WO2022068551A1 - Method, Apparatus, Device, and Storage Medium for Cropping Video - Google Patents

Method, Apparatus, Device, and Storage Medium for Cropping Video

Info

Publication number
WO2022068551A1
WO2022068551A1 (application PCT/CN2021/117458; CN2021117458W)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
detection frame
detection
cost
cropping
Prior art date
Application number
PCT/CN2021/117458
Other languages
English (en)
French (fr)
Inventor
吴昊
马云涛
王长虎
Original Assignee
北京字节跳动网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Priority to EP21874212.0A (EP4224869A4)
Priority to US18/001,067 (US11881007B2)
Publication of WO2022068551A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06T5/70
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/16Image acquisition using multiple overlapping images; Image stitching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/34Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/148Segmentation of character regions
    • G06V30/153Segmentation of character regions using recognition of characters or words
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region

Definitions

  • the embodiments of the present application relate to the field of computer vision, and more particularly, to a method, apparatus, device, and storage medium for cropping video.
  • the aspect ratio of an ad slot used to place an advertising video is fixed, for example 9:16, whereas the sizes of original videos vary; for example, different original videos have different aspect ratios. As a result, many original videos do not match the size required by the ad slot, so original videos of other sizes cannot be placed directly in the ad slot. Based on this, the original video needs to be cropped to fit the size of the ad slot.
  • at present, the original video is generally cropped by center cropping. However, since the position of important information in the original video is random, center cropping may lose important information in the video, resulting in low practicality and a poor user experience.
  • provided are a method, apparatus, device, and storage medium for cropping video, which can improve practicality and user experience while keeping the video cropping process simple.
  • a method for cropping a video, including: acquiring at least one detection frame of a first image frame; determining the cost of any detection frame in the at least one detection frame according to at least one of an importance score, a coverage area, and a smooth distance of the detection frame;
  • the importance score is used to characterize the importance of the detection frame in the first image frame;
  • the coverage area is used to characterize the overlapping area of the detection frame and a text box in the first image frame;
  • the smooth distance is used to characterize the distance between the detection frame and the cropping frame of the image frame preceding the first image frame;
  • determining the first detection frame with the smallest cost among the at least one detection frame as a cropping frame; and cropping the first image frame based on the cropping frame.
  • an apparatus for cropping a video, including: an acquiring unit, configured to acquire at least one detection frame of a first image frame; a determining unit, configured to determine the cost of any detection frame in the at least one detection frame according to at least one of an importance score, a coverage area, and a smooth distance of the detection frame;
  • the importance score is used to characterize the importance of the detection frame in the first image frame;
  • the coverage area is used to characterize the overlapping area of the detection frame and a text box in the first image frame;
  • the smooth distance is used to characterize the distance between the detection frame and the cropping frame of the image frame preceding the first image frame;
  • the determining unit is further configured to determine the first detection frame with the smallest cost among the at least one detection frame as a cropping frame; and a cropping unit, configured to crop the first image frame based on the cropping frame.
  • an electronic device comprising:
  • a processor and a memory, where the memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory to perform the method of the first aspect.
  • a computer-readable storage medium for storing a computer program, the computer program causing a computer to perform the method of the first aspect.
  • the first image frame is divided into at least one detection frame, and the first detection frame with the smallest cost among the at least one detection frame is then determined as the cropping frame based on the cost of each detection frame;
  • on the one hand, dividing the first image frame into at least one detection frame and determining the cropping frame among the at least one detection frame not only makes it possible to crop the video but also avoids a fixed cropping-frame position, improving the flexibility of cropping the video; on the other hand,
  • determining the cost of a detection frame according to its importance score helps avoid losing or cropping out important information in the first image frame, improving the cropping effect; determining the cost of a detection frame by its coverage area avoids phenomena such as partially cut-off text in the cropped image, improving the viewing experience and, correspondingly, the cropping effect;
  • determining the cost of a detection frame by its smooth distance reduces the positional movement of the cropped image frames across the video, avoiding frequent camera movement and further improving the cropping effect.
  • directly determining a detection frame as the cropping frame is also beneficial for simplifying the video cropping process.
  • in summary, determining the first detection frame with the smallest cost among the at least one detection frame as the cropping frame, so as to crop the first image frame, improves both the flexibility of cropping the video and the cropping effect while keeping the video cropping process simple.
  • FIG. 1 is a schematic block diagram of a system framework provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a method for cropping a video provided by an embodiment of the present application.
  • FIG. 3 is a schematic block diagram of an apparatus for cropping a video provided by an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an electronic device provided by an embodiment of the present application.
  • Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to using cameras, computers, and other computing equipment in place of human eyes to identify, track, and measure target objects in images, and to further process images so that the processed images are better suited to human observation or easier to transmit to other devices for detection.
  • Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and can also include common biometric technologies such as face recognition and fingerprint recognition.
  • FIG. 1 is a schematic block diagram of a system framework 100 provided by an embodiment of the present application.
  • the system framework 100 may include a frame extraction unit 110 , an image understanding unit 120 , a path planning unit 130 and a post-processing unit 140 .
  • the frame extracting unit 110 is configured to receive the video to be cropped, and extract image frames to be processed based on the video to be cropped.
  • the image understanding unit 120 receives the to-be-processed image frame sent by the frame extraction unit 110, and processes the to-be-processed image frame.
  • the image understanding unit 120 may process the to-be-processed image frame through border detection, so as to remove useless borders such as black borders and Gaussian-blurred borders from the to-be-processed image frame.
  • the image understanding unit 120 may process the to-be-processed image frame through saliency detection, so as to detect the position of the main subject of the frame, for example by computing a saliency score for each pixel in the to-be-processed image frame.
  • the image understanding unit 120 may process the to-be-processed image frame through face detection to detect the location of the face.
  • the image understanding unit 120 may process the to-be-processed image frame through text detection, so as to detect the position of the text and the text content.
  • the image understanding unit 120 may process the to-be-processed image frame through trademark detection, so as to detect the location of the trademark, watermark, and the like. After processing, the image understanding unit 120 may send the processed image frame to be planned and the image information of the to-be-planned image frame to the path planning unit 130 .
  • the image understanding unit 120 may also have a preprocessing function; for example, border removal may be performed on the to-be-processed image frames sent by the frame extraction unit 110.
  • the path planning unit 130 receives the to-be-planned image frame sent by the image understanding unit 120 and, based on the image information detected by the image understanding unit 120, determines the planned path of the to-be-planned image frame; it can then crop the to-be-planned image frame based on the planned path and output the cropped image frame.
  • the post-processing unit 140 may be configured to perform post-processing operations on the cropped image frames, for example interpolation or smoothing.
  • interpolation can be understood as inserting multiple image frames, via interpolation, between multiple cropped image frames to generate the cropped video.
  • smoothing can be performed to keep the positions of the cropped image frames in the video stable.
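  • The patent does not spell out these post-processing algorithms. As a minimal illustrative sketch (in Python; the function names are hypothetical, and per-frame crop positions are assumed to be stored as horizontal pixel offsets), interpolation and smoothing might look like this:

```python
import numpy as np

def interpolate_crop_positions(key_xs, key_frame_idx, n_frames):
    """Linearly interpolate crop x-offsets for the frames that lie
    between the sampled (extracted) key frames."""
    return np.interp(np.arange(n_frames), key_frame_idx, key_xs)

def smooth_crop_positions(xs, window=5):
    """Moving-average smoothing of per-frame crop x-offsets, so the
    virtual camera stays steady instead of jittering frame to frame."""
    kernel = np.ones(window) / window
    # mode="same" keeps one offset per frame; the first and last few
    # offsets are damped because np.convolve zero-pads at the edges.
    return np.convolve(xs, kernel, mode="same")
```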
  • system framework 100 may be a terminal or a server.
  • the terminal may be a device such as a smart phone, a tablet computer, or a portable computer.
  • the terminal installs and runs an application program that supports video cropping technology.
  • the application may be a photography application, a video processing application, or the like.
  • the terminal is a terminal used by a user, and a user account is logged in an application program running in the terminal.
  • the terminal can be connected to the server through a wireless network or a wired network.
  • the server may be a cloud computing platform, a virtualization center, or the like.
  • the server is used to provide background services for applications that support video cropping technology.
  • the server undertakes the main video cropping work and the terminal undertakes secondary video cropping work; in another example, the server undertakes the secondary work and the terminal the main work; in yet another example, the server or the terminal can each undertake the video cropping work alone.
  • the servers may include access servers, video recognition servers, and databases.
  • FIG. 2 is a flowchart of a video cropping method 200 provided by an embodiment of the present application.
  • the method 200 can be applied to the above-mentioned terminal or server; both the terminal and the server can be regarded as a kind of computer device.
  • for example, the method can be applied to the system framework 100 shown in FIG. 1.
  • the method 200 may include:
  • S210: acquire at least one detection frame of a first image frame;
  • S220: determine the cost of any detection frame in the at least one detection frame according to at least one of the importance score, coverage area, and smooth distance of the detection frame; where the importance score is used to characterize the importance of the detection frame in the first image frame, the coverage area is used to characterize the overlapping area of the detection frame and a text box in the first image frame, and the smooth distance is used to characterize the distance between the detection frame and the cropping frame of the image frame preceding the first image frame;
  • S230: determine the first detection frame with the smallest cost among the at least one detection frame as a cropping frame;
  • S240: crop the first image frame based on the cropping frame.
  • for example, the first image frame may be obtained by performing frame extraction on the video to be cropped, from which the at least one detection frame of the first image frame can be acquired in order to determine the first detection frame among them. On this basis, the cost of each detection frame is determined according to at least one of its importance score, coverage area, and smooth distance, and the first detection frame with the smallest cost is then determined as the cropping frame, so that the first image frame is cropped based on the cropping frame.
  • the at least one detection frame may be a preset detection frame.
  • the at least one detection frame may also be set by a user, or the at least one detection frame may be generated based on image understanding of the first image frame.
  • by dividing the first image frame into at least one detection frame and then, based on the cost of each detection frame, determining the first detection frame with the smallest cost as the cropping frame: on the one hand, determining the cropping frame among the at least one detection frame not only makes it possible to crop the video but also avoids a fixed cropping-frame position, improving the flexibility of cropping the video;
  • on the other hand, determining the cost of a detection frame by its importance score helps avoid losing or cropping out important information in the first image frame, improving the cropping effect; determining the cost by the coverage area avoids phenomena such as partially cut-off text in the cropped image, improving the viewing experience;
  • determining the cost of the detection frame by its smooth distance reduces the positional movement of the cropped image frames across the video, avoiding frequent camera movement, which in turn improves the cropping effect. In effect, determining the first detection frame with the smallest cost as the cropping frame based on the cost of each detection frame, and cropping the first image frame accordingly, improves both the flexibility of cropping the video and the cropping effect.
  • directly determining a detection frame as the cropping frame is also beneficial for simplifying the video cropping process.
  • in summary, determining the first detection frame with the smallest cost among the at least one detection frame as the cropping frame, so as to crop the first image frame, improves both the flexibility of cropping the video and the cropping effect while keeping the video cropping process simple.
  • the at least one detection frame is a plurality of detection frames, and the detection frames in the plurality of detection frames may partially overlap.
  • the size of the at least one detection frame may be determined with the granularity of pixels. For example, a detection box every 20 pixels.
  • the size of the at least one detection frame may be determined based on the size of the video to be cropped.
  • the size of the at least one cropping frame may be determined based on the cropping size.
  • the cropping size may be understood as the expected size of the cropped video, and may also be understood as the expected aspect ratio of the cropped video.
  • the at least one detection frame can also be understood as at least one state or at least one cropping state. In other words, one of the at least one state may be determined, so that the first image frame is cropped based on that state.
  • for example, suppose the size of the video to be cropped is 1280*720 and the cropping size is 1:1, so that the size of the cropped video is 720*720.
  • in that case, the size of each of the at least one cropping frames may be 720*720.
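  • As an illustration of how such candidate detection frames might be enumerated, here is a sketch under the assumptions above (the crop uses the full frame height and slides horizontally; all names are hypothetical):

```python
def candidate_crops(frame_w, frame_h, crop_aspect=1.0, stride=20):
    """Enumerate the 'at least one detection frame': fixed-size windows
    of the target aspect ratio placed every `stride` pixels.
    For a 1280x720 source and a 1:1 crop this yields 720x720 windows
    at x = 0, 20, 40, ..., 560; neighbouring boxes partially overlap."""
    crop_h = frame_h                            # use the full frame height
    crop_w = int(round(crop_h * crop_aspect))   # e.g. 720 * 1.0 = 720
    return [(x, 0, crop_w, crop_h)              # boxes as (x, y, w, h)
            for x in range(0, frame_w - crop_w + 1, stride)]
```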
  • the present application does not specifically limit the first image frame.
  • the first image frame is an image frame that has undergone border removal or blurring.
  • it may also be an image frame obtained by directly performing frame extraction processing on the original video.
  • the S220 may include:
  • the importance cost of the detection frame decreases as the importance score of the detection frame increases; the cost of the detection frame includes the importance cost of the detection frame.
  • the cost of the detection box can be determined by the importance score of the detection box.
  • the cropping frame is determined among the at least one detection frame by the importance score of the detection frame, so that the cropping frame can retain the position of the important information in the first image frame; correspondingly, losing the important information in the first image frame can be avoided, improving the look and feel of the cropped image.
  • the first detection frame may be determined based on at least one importance score, the at least one importance score characterizing the degree of importance of the at least one detection frame in the first image frame, respectively.
  • the at least one importance score is obtained by means of saliency detection or face detection.
  • the detection frame corresponding to the largest of the at least one importance score may be determined as the first detection frame.
  • the importance score of the detection box can be the sum of the importance scores of all pixels in the detection box.
  • the importance score of each pixel may include a saliency score obtained by saliency detection and a face score obtained by face detection.
  • the at least one importance cost may be determined based only on the first image frame.
  • determine a first ratio of the detection frame, where the first ratio is the ratio of the importance score of the detection frame to the importance score of the first image frame; determine the importance cost of the detection frame based on its first ratio,
  • where the importance cost of the detection frame decreases as the first ratio of the detection frame increases.
  • each detection frame in the at least one detection frame may correspond to a first ratio.
  • the first ratio of a given detection frame among the at least one detection frame is the ratio of that detection frame's importance score to the importance score of the first image frame.
  • the importance cost corresponding to a given detection frame can be determined by the following formula:
  • S_i1 = 1 - I(C_i) / I(C)
  • where S_i1 represents the importance cost of the i-th detection frame among the at least one detection frame, C_i represents the i-th detection frame in the C-th image frame, I(C_i) represents the importance score of detection frame C_i, and I(C) represents the importance score of the C-th image frame.
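  • As a sketch of these two steps (assuming the saliency and face scores are available as H*W NumPy maps from the image-understanding stage; the helper names are hypothetical):

```python
import numpy as np

def box_importance(saliency, face, box):
    """Importance score of a detection frame: the sum, over the pixels
    it covers, of the per-pixel importance (saliency score + face score)."""
    x, y, w, h = box
    per_pixel = saliency + face
    return float(per_pixel[y:y + h, x:x + w].sum())

def importance_cost(saliency, face, box):
    """First-ratio importance cost S_i1 = 1 - I(C_i) / I(C): it decreases
    as the box captures a larger share of the frame's total importance."""
    i_box = box_importance(saliency, face, box)
    i_frame = float((saliency + face).sum())
    return 1.0 - i_box / i_frame
```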
  • the at least one importance cost may be determined based on the second image frame.
  • determining at least one ratio of the detection frame, where the at least one ratio includes the ratios of the importance score of the detection frame to the importance score of each detection frame in the previous image frame; and, based on the at least one ratio of the detection frame, determining the importance cost of the detection frame, which decreases as the ratios among the at least one ratio increase.
  • each detection frame of the at least one detection frame may correspond to at least one ratio.
  • at least one ratio of the same detection frame in the at least one detection frame includes the ratio of the importance score of the same detection frame to the importance score of each detection frame in the second image frame, the second image The frame precedes the first image frame in time domain.
  • the importance cost of each detection frame can be determined by the following formula, which is given in the original filing only as an image (Figure PCTCN2021117458-appb-000001):
  • where S_1i represents the importance cost of the i-th detection frame among the at least one detection frame, C_i represents the i-th detection frame in the C-th image frame, I(C_i) represents the importance score of detection frame C_i, D_j represents the j-th detection frame of the D-th image frame, I(D_j) represents the importance score of detection frame D_j, and n represents the number of detection frames in the D-th image frame.
  • the S220 may include:
  • the coverage cost corresponding to the detection frame decreases first and then increases with the increase of the coverage area of the detection frame; the cost of the detection frame includes the coverage cost of the detection frame.
  • the cost of the detection box is determined based on the overlap of the detection box and the text box in the first image frame.
  • At least one coverage cost respectively corresponding to the at least one detection frame may be determined based on the overlap between the at least one detection frame and the text box, where the coverage cost of a given detection frame first decreases and then increases as the coverage area increases, the coverage area being the overlapping area of that detection frame and the text box; the cropping frame is determined based on the at least one coverage cost.
  • the coverage cost of each detection frame can be determined by the following formula, which is given in the original filing only as an image (Figure PCTCN2021117458-appb-000002):
  • where S_2i represents the coverage cost of the i-th detection frame among the at least one detection frame, C_i represents the i-th detection frame in the C-th image frame, T_k represents the k-th text box in the C-th image frame, m represents the number of text boxes in the C-th image frame, B(C_i, T_k) represents the coverage cost of detection frame C_i and text box T_k, and λ_1 represents the coverage coefficient of detection frame C_i and text box T_k.
  • λ_1 is greater than or equal to 0 and less than 1.
  • each per-text-box coverage cost B(C_i, T_k) can be determined by the following formula:
  • x(1 - x)
  • where x represents the overlapping area of the given detection frame and the text box.
  • the text box includes an area where the text or trademark in the first image is located.
  • the text in the first image may be the subtitle of the first image frame.
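  • A sketch of the coverage term follows. The per-text-box cost x(1 - x) is taken from the formula above; normalising x as overlap area divided by text-box area, and combining the m per-box terms as a λ_1-weighted sum, are assumptions, since the combined formula survives only as an image:

```python
def coverage_cost(box, text_boxes, lam1=0.5):
    """Coverage cost of a detection frame against the frame's text boxes.
    The per-box term B = x * (1 - x) is lowest when a text box is fully
    inside (x = 1) or fully outside (x = 0) the candidate crop, and
    highest when the text box is half cut off."""
    bx, by, bw, bh = box
    total = 0.0
    for tx, ty, tw, th in text_boxes:
        iw = max(0, min(bx + bw, tx + tw) - max(bx, tx))  # overlap width
        ih = max(0, min(by + bh, ty + th) - max(by, ty))  # overlap height
        x = (iw * ih) / float(tw * th)   # assumed normalisation to [0, 1]
        total += x * (1 - x)
    return lam1 * total
```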
  • the S220 may include:
  • the distance ratio of the detection frame is the ratio of the smooth distance of the detection frame to a first length, where the first length is the side length of the first image frame parallel to a first connecting line,
  • the first connecting line is the line formed between the detection frame and the cropping frame of the previous image frame, and the distance cost of the detection frame increases as the distance ratio of the detection frame increases;
  • the cost of the detection frame includes the distance cost of the detection frame.
  • the cost of the detection frame can be determined based on the smooth distance of the detection frame.
  • Determining the cropping frame according to the distance between the detection frame and the cropping frame of the second image frame amounts to minimizing, during the determination of the cropping frame, the positional movement of the cropped image frames across the video, so as to avoid frequent camera movement and thereby improve the cropping effect.
  • At least one distance cost respectively corresponding to the at least one detection frame may be determined based on at least one distance ratio, where each distance ratio is the ratio of the corresponding smooth distance to the length of the first side of the first image frame;
  • the at least one smooth distance is the smooth distance of each of the at least one detection frame,
  • the first side is parallel to the distribution direction of the at least one detection frame,
  • and the distance cost of a given detection frame increases with the distance between that detection frame and the second cropping frame; the cropping frame is determined based on the at least one distance cost.
  • the distance cost of each detection frame can be determined by the following formula:
  • S_3i = λ_2 |(L(C_i) - L(D_t)) / A|
  • where S_3i represents the distance cost of the i-th detection frame among the at least one detection frame, C_i represents the i-th detection frame in the C-th image frame, λ_2 represents the smoothing coefficient of detection frame C_i relative to detection frame D_j, L(C_i) represents the position of detection frame C_i, D_t represents the cropping frame of the D-th image frame, L(D_t) represents the position of the cropping frame of the D-th image frame, and A represents the length of the first side of the first image frame.
  • the first side runs along the arrangement direction of the at least one detection frame.
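  • The recovered formula translates directly into code (a sketch, assuming horizontal sliding, so that A is the frame width and L(.) is an x-offset):

```python
def distance_cost(box_x, prev_crop_x, frame_w, lam2=1.0):
    """Smooth-distance cost S_3i = lambda_2 * |(L(C_i) - L(D_t)) / A|:
    penalises candidates that sit far from the previous frame's cropping
    frame, so the crop window does not jump back and forth between frames."""
    return lam2 * abs((box_x - prev_crop_x) / float(frame_w))
```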
  • the method 200 may further include:
  • the first detection frame may be determined based on a total cost corresponding to each detection frame of the at least one detection frame.
  • the total cost may include at least one of the importance cost, coverage cost, and distance cost discussed above.
  • the total cost of each detection frame can be determined by the following formula, which is given in the original filing only as an image (Figure PCTCN2021117458-appb-000003):
  • where S_i represents the total cost of the i-th detection frame among the at least one detection frame, C_i represents the i-th detection frame in the C-th image frame, I(C_i) represents the importance score of detection frame C_i, D_j represents the j-th detection frame of the D-th image frame, I(D_j) represents the importance score of detection frame D_j, n represents the number of detection frames in the D-th image frame, T_k represents the k-th text box in the C-th image frame, m represents the number of text boxes in the C-th image frame, B(C_i, T_k) represents the coverage cost of detection frame C_i and text box T_k, λ_1 represents the coverage coefficient of detection frame C_i and the text box, λ_2 represents the smoothing coefficient of detection frame C_i relative to detection frame D_j, L(C_i) represents the position of detection frame C_i, D_t represents the cropping frame of the D-th image frame, and L(D_t) represents the position of the cropping frame of the D-th image frame.
  • the detection frame with the smallest total cost can be determined as the first detection frame.
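  • Putting the pieces together, selecting the cropping frame reduces to an argmin over the candidates. The sketch below composes the helper functions from the earlier sketches; summing the three per-frame costs is one plausible combination, since the total-cost formula survives only as an image:

```python
def pick_crop_frame(boxes, saliency, face, text_boxes,
                    prev_crop_x, frame_w, lam1=0.5, lam2=1.0):
    """Return the detection frame with the smallest total cost; the method
    then uses this first detection frame as the cropping frame."""
    def total_cost(box):
        return (importance_cost(saliency, face, box)
                + coverage_cost(box, text_boxes, lam1)
                + distance_cost(box[0], prev_crop_x, frame_w, lam2))
    return min(boxes, key=total_cost)
```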
  • FIG. 3 is a schematic block diagram of an apparatus 300 for cropping a video provided by an embodiment of the present application.
  • an acquiring unit 310 configured to acquire at least one detection frame of the first image frame
  • a determining unit 320 configured to determine a first detection frame in the at least one detection frame
  • a cropping unit 330 configured to crop the first image frame by using the first detection frame as a cropping frame.
  • the at least one detection frame is a preset detection frame.
  • the determining unit 320 is specifically configured to:
  • the first detection frame is determined based on at least one importance score, the at least one importance score characterizing the degree of importance of the at least one detection frame in the first image frame, respectively.
  • the obtaining unit 310 is further configured to:
  • the at least one importance score is obtained by means of saliency detection or face detection.
  • the determining unit 320 is specifically configured to:
  • At least one importance cost respectively corresponding to the at least one detection frame is determined based on the at least one importance score, where the importance cost of a given detection frame decreases as that detection frame's importance score increases;
  • the first detection frame is determined based on the at least one importance cost.
  • the determining unit 320 is specifically configured to:
  • determine a first ratio of each detection frame in the at least one detection frame, where the first ratio of a given detection frame is the ratio of its importance score to the importance score of the first image frame; and determine the importance cost of the given detection frame based on its first ratio, the importance cost decreasing as the first ratio increases.
  • the determining unit 320 is specifically configured to:
  • At least one ratio of each detection frame in the at least one detection frame is determined, where the at least one ratio of a given detection frame includes the ratios of that detection frame's importance score to
  • the importance score of each detection frame in a second image frame, the second image frame preceding the first image frame in the time domain;
  • the importance cost of the given detection frame is determined based on its at least one ratio, the importance cost decreasing as the ratios among the at least one ratio increase.
  • the determining unit 320 is specifically configured to:
  • the first detection box is determined based on an overlap of the at least one detection box and a text box in the first image frame.
  • the determining unit 320 is specifically configured to:
  • At least one coverage cost respectively corresponding to the at least one detection frame is determined based on the overlap between the at least one detection frame and the text box, where the coverage cost of a given detection frame
  • first decreases and then increases as the coverage area increases, the coverage area being the overlapping area of that detection frame and the text box;
  • the first detection frame is determined based on the at least one coverage cost.
  • the text box includes an area where the text or trademark in the first image is located.
  • the determining unit 320 is specifically configured to:
  • the first detection frame is determined based on the distance of each detection frame in the at least one detection frame relative to a second cropping frame in a second image frame, where the second image frame precedes the first image frame in the time domain.
  • the determining unit 320 is specifically configured to:
  • At least one distance cost respectively corresponding to the at least one detection frame is determined based on at least one distance ratio, where each distance ratio is the ratio of the corresponding distance to the length of the first side of the first image frame;
  • each distance is the distance of the corresponding cropping frame relative to the second cropping frame, the first side is parallel to the distribution direction of the at least one detection frame, and the distance cost of a given detection frame increases with the distance between that detection frame and the second cropping frame;
  • the first detection frame is determined based on the at least one distance cost.
  • the first image frame is an image frame that has undergone border removal or blurring.
  • the cropping unit 330 is further configured to: smooth or interpolate the cropped image frames.
  • the apparatus embodiments and the method embodiments may correspond to each other, and similar descriptions may refer to the method embodiments; to avoid repetition, details are not repeated here.
  • specifically, the apparatus 300 shown in FIG. 3 may correspond to the corresponding subject executing the method 200 of the embodiments of the present application, and the foregoing and other operations and/or functions of the modules in the apparatus 300 are respectively intended to implement the corresponding processes of the method in FIG. 2; for brevity, they are not repeated here.
  • the functional modules can be implemented in the form of hardware, can also be implemented by instructions in the form of software, and can also be implemented by a combination of hardware and software modules.
  • the steps of the method embodiments in the embodiments of the present application may be completed by integrated logic circuits of hardware in the processor and/or instructions in the form of software; the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, and other storage media mature in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps in the above method embodiments in combination with its hardware.
  • FIG. 4 is a schematic block diagram of an electronic device 400 provided by an embodiment of the present application.
  • the electronic device 400 may include:
  • a memory 410 and a processor 420, where the memory 410 is used to store a computer program 411; in other words, the processor 420 can call and run the computer program 411 from the memory 410 to implement the method in the embodiments of the present application.
  • the processor 420 may be configured to perform the steps of the method 200 described above according to the instructions in the computer program 411 .
  • the processor 420 may include, but is not limited to:
  • a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on.
  • the memory 410 includes but is not limited to:
  • volatile memory and/or non-volatile memory. The non-volatile memory may be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory. The volatile memory may be random access memory (Random Access Memory, RAM), which acts as an external cache.
  • By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synch-link DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
  • the computer program 411 may be divided into one or more modules, which are stored in the memory 410 and executed by the processor 420 to complete the method provided by the present application.
  • the one or more modules may be a series of computer program instruction segments capable of accomplishing specific functions, and the instruction segments are used to describe the execution process of the computer program 411 in the electronic device 400 .
  • the electronic device 400 may further include:
  • a transceiver 440 which can be connected to the processor 420 or the memory 410 .
  • the processor 420 may control the transceiver 440 to communicate with other devices, specifically, may send information or data to other devices, or receive information or data sent by other devices.
  • Transceiver 440 may include a transmitter and a receiver.
  • the transceiver 440 may further include antennas, and the number of the antennas may be one or more.
  • the components of the electronic device 400 are connected by a bus system, which, in addition to a data bus, includes a power bus, a control bus, and a status signal bus.
  • the present application also provides a computer storage medium on which a computer program is stored, and when the computer program is executed by a computer, enables the computer to execute the methods of the above method embodiments.
  • the embodiments of the present application further provide a computer program product including instructions which, when executed by a computer, cause the computer to execute the methods of the above method embodiments.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored on a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium can be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital video disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)), and the like.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative; for example, the division into modules is only a division by logical function, and in actual implementation there may be other division methods:
  • multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or modules, and may be in electrical, mechanical or other forms.
  • Modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical modules, that is, may be located in one place, or may be distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. For example, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module.

Abstract

Provided are a method, apparatus, device, and storage medium for cropping video. The method includes: acquiring at least one detection frame of a first image frame; determining the cost of any detection frame in the at least one detection frame according to at least one of an importance score, a coverage area, and a smooth distance of the detection frame; determining the first detection frame with the smallest cost among the at least one detection frame as a cropping frame; and cropping the first image frame based on the cropping frame. Determining the first detection frame with the smallest cost among the at least one detection frame as the cropping frame based on the cost of each detection frame, so as to crop the first image frame, improves both the flexibility of cropping the video and the cropping effect while keeping the video cropping process simple.

Description

Method, Apparatus, Device, and Storage Medium for Cropping Video
This application claims priority to Chinese Patent Application No. 202011061772.8, filed with the Chinese Patent Office on September 30, 2020 and entitled "Method, Apparatus, Device, and Storage Medium for Cropping Video", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of computer vision, and more particularly, to a method, apparatus, device, and storage medium for cropping video.
Background
Typically, the aspect ratio of an ad slot used to place an advertising video is fixed, for example 9:16, whereas the sizes of original videos vary; for example, different original videos have different aspect ratios. As a result, many original videos do not match the size required by the ad slot, so original videos of other sizes cannot be placed directly in the ad slot. Based on this, the original video needs to be cropped to fit the size of the ad slot.
To date, the original video is generally cropped by center cropping.
However, since the position of the important information in the original video is random, cropping the original video by center cropping may lose important information in the video, resulting in low practicality and a poor user experience.
Summary
Provided are a method, apparatus, device, and storage medium for cropping video, which can improve practicality and user experience while keeping the video cropping process simple.
In a first aspect, a method for cropping a video is provided, including:
acquiring at least one detection frame of a first image frame;
determining the cost of any detection frame in the at least one detection frame according to at least one of an importance score, a coverage area, and a smooth distance of the detection frame;
wherein the importance score is used to characterize the importance of the detection frame in the first image frame, the coverage area is used to characterize the overlapping area of the detection frame and a text box in the first image frame, and the smooth distance is used to characterize the distance between the detection frame and the cropping frame of the image frame preceding the first image frame;
determining the first detection frame with the smallest cost among the at least one detection frame as a cropping frame;
cropping the first image frame based on the cropping frame.
In a second aspect, an apparatus for cropping a video is provided, including:
an acquiring unit, configured to acquire at least one detection frame of a first image frame;
a determining unit, configured to:
determine the cost of any detection frame in the at least one detection frame according to at least one of an importance score, a coverage area, and a smooth distance of the detection frame;
wherein the importance score is used to characterize the importance of the detection frame in the first image frame, the coverage area is used to characterize the overlapping area of the detection frame and a text box in the first image frame, and the smooth distance is used to characterize the distance between the detection frame and the cropping frame of the image frame preceding the first image frame;
determine the first detection frame with the smallest cost among the at least one detection frame as a cropping frame;
a cropping unit, configured to crop the first image frame based on the cropping frame.
In a third aspect, an electronic device is provided, including:
a processor and a memory, where the memory is used to store a computer program, and the processor is used to call and run the computer program stored in the memory to perform the method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, which is used to store a computer program, the computer program causing a computer to perform the method of the first aspect.
Based on the above solutions, the first image frame is divided into at least one detection frame, and the first detection frame with the smallest cost among the at least one detection frame is then determined as the cropping frame based on the cost of each detection frame. On the one hand, dividing the first image frame into at least one detection frame and determining the cropping frame among the at least one detection frame not only makes it possible to crop the video but also avoids a fixed cropping-frame position, improving the flexibility of cropping the video. On the other hand, determining the cost of a detection frame according to its importance score helps avoid losing or cropping out important information in the first image frame, improving the cropping effect; determining the cost of a detection frame according to its coverage area avoids phenomena such as partially cut-off text in the cropped image, improving the viewing experience and, correspondingly, the cropping effect; and determining the cost of a detection frame according to its smooth distance reduces the positional movement of the cropped image frames across the video, avoiding frequent camera movement and further improving the cropping effect. In effect, determining the first detection frame with the smallest cost among the at least one detection frame as the cropping frame based on the cost of each detection frame, and cropping the first image frame accordingly, improves both the flexibility of cropping the video and the cropping effect.
In addition, directly determining a detection frame as the cropping frame also helps simplify the video cropping process.
In summary, determining the first detection frame with the smallest cost among the at least one detection frame as the cropping frame based on the cost of each detection frame, so as to crop the first image frame, improves both the flexibility of cropping the video and the cropping effect while keeping the video cropping process simple.
Brief Description of the Drawings
FIG. 1 is a schematic block diagram of a system framework provided by an embodiment of this application.
FIG. 2 is a schematic flowchart of a method for cropping a video provided by an embodiment of this application.
FIG. 3 is a schematic block diagram of an apparatus for cropping a video provided by an embodiment of this application.
FIG. 4 is a schematic block diagram of an electronic device provided by an embodiment of this application.
Detailed Description
The solutions provided by the embodiments of this application mainly relate to the technical field of computer vision (CV).
Computer vision (CV) is a science that studies how to make machines "see"; more specifically, it refers to using cameras, computers, and other computing equipment in place of human eyes to identify, track, and measure target objects in images, and to further process images so that the processed images are better suited to human observation or easier to transmit to other devices for detection. Computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can obtain information from images or multidimensional data. Computer vision technology usually includes image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (OCR), video processing, video semantic understanding, video content/behavior recognition, 3D object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and can also include common biometric technologies such as face recognition and fingerprint recognition.
FIG. 1 is a schematic block diagram of a system framework 100 provided by an embodiment of this application.
As shown in FIG. 1, the system framework 100 may include a frame extraction unit 110, an image understanding unit 120, a path planning unit 130, and a post-processing unit 140.
The frame extraction unit 110 is configured to receive the video to be cropped and to extract to-be-processed image frames from the video to be cropped.
The image understanding unit 120 receives the to-be-processed image frames sent by the frame extraction unit 110 and processes them.
For example, the image understanding unit 120 may process a to-be-processed image frame through border detection, so as to remove useless borders such as black borders and Gaussian-blurred borders. As another example, the image understanding unit 120 may process the frame through saliency detection, so as to detect the position of the frame's main subject, for example by computing a saliency score for each pixel in the frame. As another example, the image understanding unit 120 may process the frame through face detection, so as to detect where faces are located. As another example, the image understanding unit 120 may process the frame through text detection, so as to detect where text is located and what it says. As another example, the image understanding unit 120 may process the frame through logo detection, so as to detect where trademarks, watermarks, and the like are located. After processing, the image understanding unit 120 may send the processed to-be-planned image frame and its image information to the path planning unit 130.
Of course, the image understanding unit 120 may also have a preprocessing function, for example performing border removal on the to-be-processed image frames sent by the frame extraction unit 110.
The path planning unit 130 receives the to-be-planned image frame sent by the image understanding unit 120 and, based on the image information detected by the image understanding unit 120, determines the planned path of the to-be-planned image frame; it can then crop the to-be-planned image frame based on the planned path and output the cropped image frame.
The post-processing unit 140 may be configured to perform post-processing operations on the cropped image frames, for example interpolation or smoothing. Interpolation can be understood as inserting multiple image frames, via interpolation, between multiple cropped image frames to generate the cropped video. As another example, smoothing can be applied to keep the positions of the cropped image frames in the video stable.
It should be noted that the system framework 100 may be a terminal or a server.
The terminal may be a device such as a smartphone, tablet computer, or portable computer. The terminal installs and runs an application that supports video cropping technology; the application may be a photography application, a video processing application, or the like. Illustratively, the terminal is a terminal used by a user, and a user account is logged into the application running on the terminal.
The terminal can be connected to the server through a wireless or wired network.
The server may be a cloud computing platform, a virtualization center, or the like. The server is used to provide background services for applications that support video cropping technology. For example, the server undertakes the main video cropping work and the terminal undertakes secondary video cropping work; in another example, the server undertakes the secondary work and the terminal the main work; in yet another example, the server or the terminal can each undertake the video cropping work alone.
The server may include an access server, a video recognition server, and a database.
FIG. 2 is a flowchart of a video cropping method 200 provided by an embodiment of this application. The method 200 can be applied to the above-mentioned terminal or server, both of which can be regarded as a kind of computer device, for example the system framework 100 shown in FIG. 1.
As shown in FIG. 2, the method 200 may include:
S210: acquire at least one detection frame of a first image frame;
S220: determine the cost of any detection frame in the at least one detection frame according to at least one of the importance score, coverage area, and smooth distance of the detection frame; wherein the importance score is used to characterize the importance of the detection frame in the first image frame, the coverage area is used to characterize the overlapping area of the detection frame and a text box in the first image frame, and the smooth distance is used to characterize the distance between the detection frame and the cropping frame of the image frame preceding the first image frame;
S230: determine the first detection frame with the smallest cost among the at least one detection frame as a cropping frame;
S240: crop the first image frame based on the cropping frame.
For example, the first image frame may be obtained by performing frame extraction on the video to be cropped, from which the at least one detection frame of the first image frame can be acquired in order to determine the first detection frame among them. On this basis, the cost of each detection frame is determined according to at least one of its importance score, coverage area, and smooth distance, and the first detection frame with the smallest cost is then determined as the cropping frame, so that the first image frame is cropped based on the cropping frame. Optionally, the at least one detection frame may be a preset detection frame. Of course, the at least one detection frame may also be set by a user, or may be generated based on image understanding of the first image frame.
By dividing the first image frame into at least one detection frame and then, based on the cost of each detection frame, determining the first detection frame with the smallest cost as the cropping frame: on the one hand, determining the cropping frame among the at least one detection frame not only makes it possible to crop the video but also avoids a fixed cropping-frame position, improving the flexibility of cropping the video; on the other hand, determining the cost of a detection frame according to its importance score helps avoid losing or cropping out important information in the first image frame, improving the cropping effect; determining the cost according to the coverage area avoids phenomena such as partially cut-off text in the cropped image, improving the viewing experience and, correspondingly, the cropping effect; and determining the cost according to the smooth distance reduces the positional movement of the cropped image frames across the video, avoiding frequent camera movement and further improving the cropping effect. In effect, determining the first detection frame with the smallest cost as the cropping frame based on the cost of each detection frame, and cropping the first image frame accordingly, improves both the flexibility of cropping the video and the cropping effect.
In addition, directly determining a detection frame as the cropping frame also helps simplify the video cropping process.
In summary, determining the first detection frame with the smallest cost among the at least one detection frame as the cropping frame based on the cost of each detection frame, so as to crop the first image frame, improves both the flexibility of cropping the video and the cropping effect while keeping the video cropping process simple.
It should be noted that this application does not limit the specific implementation of the at least one detection frame.
For example, the at least one detection frame may be multiple detection frames, and detection frames among the multiple detection frames may partially overlap. As another example, the size of the at least one detection frame may be determined at pixel granularity, for example one detection frame every 20 pixels. As another example, the size of the at least one detection frame may be determined based on the size of the video to be cropped. As another example, the size of the at least one cropping frame may be determined based on the cropping size, where the cropping size can be understood as the expected size, or the expected aspect ratio, of the cropped video. The at least one detection frame can also be understood as at least one state or at least one cropping state; in other words, one state may be determined from the at least one state, so that the first image frame is cropped based on that state.
Suppose the size of the video to be cropped is 1280*720 and the cropping size is 1:1, so the size of the cropped video is 720*720. On this basis, the size of each of the at least one cropping frames may be 720*720.
In addition, this application does not specifically limit the first image frame either.
For example, the first image frame is an image frame that has undergone border removal or blurring; of course, it may also be an image frame obtained by directly performing frame extraction on the original video.
In some embodiments of this application, S220 may include:
determining the importance cost of the detection frame based on the importance score of the detection frame;
wherein the importance cost of the detection frame decreases as the importance score of the detection frame increases, and the cost of the detection frame includes the importance cost of the detection frame.
In short, the cost of the detection frame can be determined by the importance score of the detection frame.
Determining the cropping frame among the at least one detection frame by the importance score of the detection frame allows the cropping frame to retain the position of the important information in the first image frame; correspondingly, losing the important information in the first image frame can be avoided, improving the look and feel of the cropped image.
In other words, the first detection frame may be determined based on at least one importance score, the at least one importance score characterizing the importance of each of the at least one detection frame in the first image frame.
For example, the at least one importance score is obtained through saliency detection or face detection. For example, the detection frame corresponding to the largest of the at least one importance score may be determined as the first detection frame. The importance score of a detection frame may be the sum of the importance scores of all pixels in the detection frame, and the importance score of each pixel may include a saliency score obtained by saliency detection and a face score obtained by face detection.
In some embodiments of this application, the at least one importance cost may be determined based only on the first image frame.
For example, a first ratio of the detection frame is determined, where the first ratio is the ratio of the importance score of the detection frame to the importance score of the first image frame; the importance cost of the detection frame is determined based on the first ratio of the detection frame, and the importance cost of the detection frame decreases as the first ratio of the detection frame increases.
In other words, each of the at least one detection frame may correspond to a first ratio. For example, the first ratio of a given detection frame among the at least one detection frame is the ratio of that detection frame's importance score to the importance score of the first image frame.
As another example, the importance cost corresponding to a given detection frame can be determined by the following formula:
S_i1 = 1 - I(C_i) / I(C);
where S_i1 denotes the importance cost of the i-th detection frame among the at least one detection frame, C_i denotes the i-th detection frame in the C-th image frame, I(C_i) denotes the importance score of detection frame C_i, and I(C) denotes the importance score of the C-th image frame.
In some embodiments of this application, the at least one importance cost may be determined based on a second image frame.
For example, at least one ratio of the detection frame is determined, where the at least one ratio of the detection frame includes the ratios of the importance score of the detection frame to the importance score of each detection frame in the previous image frame; based on the at least one ratio of the detection frame, the importance cost of the detection frame is determined, and the importance cost of the detection frame decreases as the ratios among the at least one ratio increase.
In other words, each of the at least one detection frame may correspond to at least one ratio. For example, the at least one ratio of a given detection frame includes the ratios of that detection frame's importance score to the importance score of each detection frame in the second image frame, where the second image frame precedes the first image frame in the time domain.
As another example, the importance cost of each detection frame can be determined by the following formula (given in the original filing as an image: Figure PCTCN2021117458-appb-000001):
where S_1i denotes the importance cost of the i-th detection frame among the at least one detection frame, C_i denotes the i-th detection frame in the C-th image frame, I(C_i) denotes the importance score of detection frame C_i, D_j denotes the j-th detection frame of the D-th image frame, I(D_j) denotes the importance score of detection frame D_j, and n denotes the number of detection frames in the D-th image frame.
In some embodiments of this application, S220 may include:
determining the coverage cost of the detection frame based on the overlapping area of the detection frame and the text box;
wherein the coverage cost corresponding to the detection frame first decreases and then increases as the coverage area of the detection frame increases, and the cost of the detection frame includes the coverage cost of the detection frame.
In short, the cost of the detection frame is determined based on the overlap between the detection frame and the text box in the first image frame.
Using the overlap between the detection frame and the text box amounts to determining the cropping frame while taking the text box into account, which avoids phenomena such as partially cut-off text in the cropped video, preventing a degraded viewing experience and improving the cropping effect.
In other words, at least one coverage cost respectively corresponding to the at least one detection frame may be determined based on the overlap between the at least one detection frame and the text box, where the coverage cost of a given detection frame first decreases and then increases as the coverage area increases, the coverage area being the overlapping area of that detection frame and the text box; the cropping frame is determined based on the at least one coverage cost.
As another example, the coverage cost of each detection frame can be determined by the following formula (given in the original filing as an image: Figure PCTCN2021117458-appb-000002):
where S_2i denotes the coverage cost of the i-th detection frame among the at least one detection frame, C_i denotes the i-th detection frame in the C-th image frame, T_k denotes the k-th text box in the C-th image frame, m denotes the number of text boxes in the C-th image frame, B(C_i, T_k) denotes the coverage cost of detection frame C_i and text box T_k, and λ_1 denotes the coverage coefficient of detection frame C_i and text box T_k. For example, λ_1 is greater than or equal to 0 and less than 1.
For example, each per-text-box coverage cost can be determined by the following formula:
x(1 - x);
where x denotes the overlapping area of the given detection frame and the text box.
In some embodiments of this application, the text box includes the region where text or a trademark in the first image is located.
For example, the text in the first image may be the subtitles of the first image frame.
In some embodiments of this application, S220 may include:
determining the distance cost of the detection frame based on the distance ratio of the detection frame;
wherein the distance ratio of the detection frame is the ratio of the smooth distance of the detection frame to a first length, the first length being the side length of the first image frame parallel to a first connecting line, and the first connecting line being the line formed between the detection frame and the cropping frame of the previous image frame; the distance cost of the detection frame increases as the distance ratio of the detection frame increases, and the cost of the detection frame includes the distance cost of the detection frame.
In short, the cost of the detection frame can be determined based on the smooth distance of the detection frame.
Determining the cropping frame according to the distance between the detection frame and the cropping frame of the second image frame amounts to minimizing, during the determination of the cropping frame, the positional movement of the cropped image frames across the video, so as to avoid frequent camera movement and thereby improve the cropping effect.
In other words, at least one distance cost respectively corresponding to the at least one detection frame may be determined based on at least one distance ratio, where each distance ratio is the ratio of the corresponding smooth distance to the length of the first side of the first image frame; the at least one smooth distance is the smooth distance of each of the at least one detection frame, and the first side is parallel to the distribution direction of the at least one detection frame; the distance cost of a given detection frame increases with the distance between that detection frame and the second cropping frame; the cropping frame is determined based on the at least one distance cost.
As another example, the distance cost of each detection frame can be determined by the following formula:
S_3i = λ_2 |(L(C_i) - L(D_t)) / A|;
where S_3i denotes the distance cost of the i-th detection frame among the at least one detection frame, C_i denotes the i-th detection frame in the C-th image frame, λ_2 denotes the smoothing coefficient of detection frame C_i relative to detection frame D_j, L(C_i) denotes the position of detection frame C_i, D_t denotes the cropping frame of the D-th image frame, L(D_t) denotes the position of the cropping frame of the D-th image frame, and A denotes the length of the first side of the first image frame. For example, the first side runs along the arrangement direction of the at least one detection frame.
In some embodiments of this application, the method 200 may further include:
smoothing or interpolating the cropped image frames.
The preferred implementations of this application have been described in detail above with reference to the accompanying drawings; however, this application is not limited to the specific details of the above implementations. Within the scope of the technical concept of this application, a variety of simple variations of the technical solutions of this application are possible, and these simple variations all fall within the protection scope of this application. For example, the specific technical features described in the above specific implementations may be combined in any suitable manner without contradiction; to avoid unnecessary repetition, this application does not separately describe the various possible combinations. As another example, the various implementations of this application may also be combined arbitrarily, and as long as such combinations do not violate the idea of this application, they should likewise be regarded as content disclosed by this application.
For example, the first detection frame may be determined based on the total cost corresponding to each of the at least one detection frame. The total cost may include at least one of the importance cost, coverage cost, and distance cost discussed above.
As another example, the total cost of each detection frame can be determined by the following formula (given in the original filing as an image: Figure PCTCN2021117458-appb-000003):
where S_i denotes the total cost of the i-th detection frame among the at least one detection frame, C_i denotes the i-th detection frame in the C-th image frame, I(C_i) denotes the importance score of detection frame C_i, D_j denotes the j-th detection frame of the D-th image frame, I(D_j) denotes the importance score of detection frame D_j, n denotes the number of detection frames in the D-th image frame, T_k denotes the k-th text box in the C-th image frame, m denotes the number of text boxes in the C-th image frame, B(C_i, T_k) denotes the coverage cost of detection frame C_i and text box T_k, λ_1 denotes the coverage coefficient of detection frame C_i and the text box, λ_2 denotes the smoothing coefficient of detection frame C_i relative to detection frame D_j, L(C_i) denotes the position of detection frame C_i, D_t denotes the cropping frame of the D-th image frame, and L(D_t) denotes the position of the cropping frame of the D-th image frame.
On this basis, the detection frame with the smallest total cost can be determined as the first detection frame.
上文结合图2详细描述了本申请的方法实施例,下文结合图3至图4,详细描述本申请的装置实施例。
图3是本申请实施例提供的裁剪视频的装置300的示意性框图。
获取单元310,用于获取第一图像帧的至少一个检测框;
确定单元320,用于确定所述至少一个检测框中的第一检测框;
裁剪单元330,用于将所述第一检测框作为裁剪框裁剪所述第一图像帧。
其特征在于,所述至少一个检测框为预设的检测框。
在本申请的一些实施例中,所述确定单元320具体用于:
基于至少一个重要性得分确定所述第一检测框,所述至少一个重要性得分表征所述至少一个检测框分别在所述第一图像帧中的重要程度。
在本申请的一些实施例中,所述获取单元310还用于:
通过显著性检测或人脸检测的方式,获取所述至少一个重要性得分。
在本申请的一些实施例中,所述确定单元320具体用于:
基于所述至少一个重要性得分确定所述至少一个检测框分别对应的至少一个重要性代价,所述至少一个检测框中的同一检测框对应的重要性代价随所述同一检测框的重要性得分的增加而减小;
基于所述至少一个重要性代价确定所述第一检测框。
在本申请的一些实施例中,所述确定单元320具体用于:
确定所述至少一个检测框中的每一个检测框的第一比值,所述至少一个检测框中的同一检测框的第一比值为所述同一检测框的重要性得分和所述第一图像帧的重要性得分的比值;
基于所述同一检测框的第一比值所述同一检测框对应的重要性代价,所述同一个检测框的重要性代价随所述同一检测框的第一比值的增加而减小。
在本申请的一些实施例中,所述确定单元320具体用于:
确定所述至少一个检测框中的每一个检测框的至少一个比值,所述至少一个检测框中同一检测框的至少一个比值包括所述同一检测框的重要性得分分别相对第二图像帧中的每一个检测框的重要性得分的比值,所述第二图像帧在时域上位于所述第一图像帧之前;
基于所述同一检测框的至少一个比值确定所述同一检测框对应的重要性代价,所述同一个检测框的重要性代价随所述至少一个比值中的比值的增加而减小。
在本申请的一些实施例中,所述确定单元320具体用于:
基于所述至少一个检测框和所述第一图像帧中的文本框的重叠情况,确 定所述第一检测框。
在本申请的一些实施例中,所述确定单元320具体用于:
基于所述至少一个检测框和所述文本框的重叠情况,确定所述至少一个检测框分别对应的至少一个覆盖代价,所述至少一个检测框中的同一检测框对应的覆盖代价随覆盖面积的增加先减小后增加,所述覆盖面积为所述同一检测框和所述文本框的重叠面积;
基于至少一个覆盖代价确定所述第一检测框。
在本申请的一些实施例中,所述文本框中包括所述第一图像中的文字或商标所在的区域。
在本申请的一些实施例中,所述确定单元320具体用于:
基于所述至少一个检测框中的每一个检测框相对第二图像帧中的第二裁剪框的距离,确定所述第一检测框,所述第二图像帧在时域上位于所述第一图像帧之前。
在本申请的一些实施例中,所述确定单元320具体用于:
基于至少一个距离比值确定所述至少一个检测框分别对应的至少一个距离代价,所述至少一个距离比值为至少一个距离分别与所述第一图像帧第一边长的长度的比值,所述至少一个距离为所述至少一个裁剪框分别相对所述第二裁剪框的距离,所述第一边长平行于所述至少一个检测框的分布方向,所述至少一个检测框中的同一检测框对应的距离代价随所述同一检测框和所述第二裁剪框之间的距离的增加而增加;
基于所述至少一个距离代价确定所述第一检测框。
在本申请的一些实施例中,所述第一图像帧为经过去除边框或经过模糊处理的图像帧。
在本申请的一些实施例中,所述裁剪单元330还用于:
平滑或插值处理裁剪后的图像帧。
应理解,装置实施例与方法实施例可以相互对应,类似的描述可以参照方法实施例。为避免重复,此处不再赘述。具体地,图3所示的装置300可以对应于执行本申请实施例的方法200中的相应主体,并且装置300中的各个模块的前述和其它操作和/或功能分别为了实现图2中的各个方法中的相应流程,为了简洁,在此不再赘述。
上文中结合附图从功能模块的角度描述了本申请实施例的装置和系统。 应理解,该功能模块可以通过硬件形式实现,也可以通过软件形式的指令实现,还可以通过硬件和软件模块组合实现。具体地,本申请实施例中的方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路和/或软件形式的指令完成,结合本申请实施例公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。可选地,软件模块可以位于随机存储器,闪存、只读存储器、可编程只读存储器、电可擦写可编程存储器、寄存器等本领域的成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法实施例中的步骤。
图4是本申请实施例提供的电子设备400的示意性框图。
如图4所示,该电子设备400可包括:
存储器410和处理器420,该存储器410用于存储计算机程序411,并将该程序代码411传输给该处理器420。换言之,该处理器420可以从存储器410中调用并运行计算机程序411,以实现本申请实施例中的方法。
例如,该处理器420可用于根据该计算机程序411中的指令执行上述方法200中的步骤。
在本申请的一些实施例中,该处理器420可以包括但不限于:
通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等等。
在本申请的一些实施例中,该存储器410包括但不限于:
易失性存储器和/或非易失性存储器。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR  SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。
在本申请的一些实施例中,该计算机程序411可以被分割成一个或多个模块,该一个或者多个模块被存储在该存储器410中,并由该处理器420执行,以完成本申请提供的录制页面的方法。该一个或多个模块可以是能够完成特定功能的一系列计算机程序指令段,该指令段用于描述该计算机程序411在该电子设备400中的执行过程。
As shown in FIG. 4, the electronic device 400 may further include:
a transceiver 440, which may be connected to the processor 420 or the memory 410.
The processor 420 may control the transceiver 440 to communicate with other devices; specifically, the transceiver 440 may send information or data to other devices, or receive information or data sent by other devices. The transceiver 440 may include a transmitter and a receiver, and may further include one or more antennas.
It should be understood that the various components of the electronic device 400 are connected through a bus system, where the bus system includes not only a data bus but also a power bus, a control bus, and a status signal bus.
The present application further provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer is enabled to perform the methods of the above method embodiments. In other words, an embodiment of the present application further provides a computer program product containing instructions; when the instructions are executed by a computer, the computer is caused to perform the methods of the above method embodiments.
When implemented in software, the method may be implemented in whole or in part in the form of a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)).
A person of ordinary skill in the art will appreciate that the modules and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the modules is merely a division by logical function, and there may be other divisions in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. For example, the functional modules in the embodiments of the present application may be integrated in one processing module, each module may exist physically alone, or two or more modules may be integrated in one module.
The above is merely the specific implementation of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in the present application, which shall all be covered within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

  1. A method for cropping a video, characterized by comprising:
    obtaining at least one detection frame of a first image frame;
    determining a cost of any one detection frame among the at least one detection frame according to at least one of an importance score, a coverage area, and a smoothing distance of the detection frame;
    wherein the importance score is used to represent a degree of importance of the detection frame in the first image frame, the coverage area is used to represent an overlap area between the detection frame and a text frame in the first image frame, and the smoothing distance is used to represent a distance between the detection frame and a cropping frame of a previous image frame of the first image frame;
    determining a first detection frame with the smallest cost among the at least one detection frame as a cropping frame; and
    cropping the first image frame based on the cropping frame.
  2. The method according to claim 1, wherein the determining the cost of the detection frame according to at least one of the importance score, the coverage area, and the smoothing distance of any one detection frame among the at least one detection frame comprises:
    determining an importance cost of the detection frame based on the importance score of the detection frame;
    wherein the importance cost of the detection frame decreases as the importance score of the detection frame increases, and the cost of the detection frame comprises the importance cost of the detection frame.
  3. The method according to claim 2, wherein the determining the importance cost of the detection frame based on the importance score of the detection frame comprises:
    determining a first ratio of the detection frame, the first ratio being the ratio of the importance score of the detection frame to the importance score of the first image frame; and
    determining the importance cost of the detection frame based on the first ratio of the detection frame, the importance cost of the detection frame decreasing as the first ratio of the detection frame increases.
  4. The method according to claim 2, wherein the determining the importance cost of the detection frame based on the importance score of the detection frame comprises:
    determining at least one ratio of the detection frame, the at least one ratio of the detection frame comprising the ratios of the importance score of the detection frame to the importance score of each detection frame in the previous image frame respectively; and
    determining the importance cost of the detection frame based on the at least one ratio of the detection frame, the importance cost of the detection frame decreasing as the ratios among the at least one ratio increase.
  5. The method according to claim 2, wherein the method further comprises:
    obtaining the importance score of the detection frame by means of saliency detection and/or face detection.
  6. The method according to claim 1, wherein the determining the cost of the detection frame according to at least one of the importance score, the coverage area, and the smoothing distance of any one detection frame among the at least one detection frame comprises:
    determining a coverage cost of the detection frame based on the overlap area between the detection frame and the text frame;
    wherein the coverage cost corresponding to the detection frame first decreases and then increases as the coverage area of the detection frame increases, and the cost of the detection frame comprises the coverage cost of the detection frame.
  7. The method according to claim 6, wherein the text frame includes a region in which text or a trademark in the first image frame is located.
  8. The method according to claim 1, wherein the determining the cost of the detection frame according to at least one of the importance score, the coverage area, and the smoothing distance of any one detection frame among the at least one detection frame comprises:
    determining a distance cost of the detection frame based on a distance ratio of the detection frame;
    wherein the distance ratio of the detection frame is the ratio of the smoothing distance of the detection frame to a first length, the first length being the side length of the first image frame parallel to a first connecting line, and the first connecting line being the line formed between the detection frame and the cropping frame of the previous image frame; the distance cost of the detection frame increases as the distance ratio of the detection frame increases; and the cost of the detection frame comprises the distance cost of the detection frame.
  9. The method according to any one of claims 1 to 8, wherein the first image frame is an image frame from which a border has been removed or which has undergone blurring processing.
  10. The method according to any one of claims 1 to 8, wherein the method further comprises:
    performing smoothing or interpolation processing on the cropped image frames.
  11. An apparatus for cropping a video, characterized by comprising:
    an obtaining unit, configured to obtain at least one detection frame of a first image frame;
    a determining unit, configured to:
    determine a cost of any one detection frame among the at least one detection frame according to at least one of an importance score, a coverage area, and a smoothing distance of the detection frame;
    wherein the importance score is used to represent a degree of importance of the detection frame in the first image frame, the coverage area is used to represent an overlap area between the detection frame and a text frame in the first image frame, and the smoothing distance is used to represent a distance between the detection frame and a cropping frame of a previous image frame of the first image frame; and
    determine a first detection frame with the smallest cost among the at least one detection frame as a cropping frame; and
    a cropping unit, configured to crop the first image frame based on the cropping frame.
  12. An electronic device, characterized by comprising:
    a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program stored in the memory to perform the method according to any one of claims 1 to 10.
  13. A computer-readable storage medium, characterized by being configured to store a computer program, wherein the computer program causes a computer to perform the method according to any one of claims 1 to 10.
PCT/CN2021/117458 2020-09-30 2021-09-09 Video cropping method and apparatus, device, and storage medium WO2022068551A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21874212.0A EP4224869A4 (en) 2020-09-30 2021-09-09 VIDEO CUTTING METHOD AND APPARATUS AND APPARATUS AND STORAGE MEDIUM
US18/001,067 US11881007B2 (en) 2020-09-30 2021-09-09 Video cropping method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011061772.8 2020-09-30
CN202011061772.8A CN112188283B (zh) 2020-09-30 2020-09-30 Video cropping method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022068551A1 true WO2022068551A1 (zh) 2022-04-07

Family

ID=73947708

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117458 WO2022068551A1 (zh) 2020-09-30 2021-09-09 Video cropping method and apparatus, device, and storage medium

Country Status (4)

Country Link
US (1) US11881007B2 (zh)
EP (1) EP4224869A4 (zh)
CN (1) CN112188283B (zh)
WO (1) WO2022068551A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112188283B (zh) * 2020-09-30 2022-11-15 Beijing Bytedance Network Technology Co., Ltd. Video cropping method and apparatus, device, and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130182134A1 (en) * 2012-01-16 2013-07-18 Google Inc. Methods and Systems for Processing a Video for Stabilization Using Dynamic Crop
US20140307112A1 (en) * 2013-04-16 2014-10-16 Nokia Corporation Motion Adaptive Cropping for Video Stabilization
CN109640138A (zh) * 2013-07-23 2019-04-16 Microsoft Technology Licensing, LLC Adaptive path smoothing for video stabilization
CN110189378A (zh) * 2019-05-23 2019-08-30 Beijing QIYI Century Science & Technology Co., Ltd. Video processing method and apparatus, and electronic device
CN111356016A (zh) * 2020-03-11 2020-06-30 Beijing Xiaomi Pinecone Electronics Co., Ltd. Video processing method, video processing apparatus, and storage medium
CN111695540A (zh) * 2020-06-17 2020-09-22 Beijing Bytedance Network Technology Co., Ltd. Video border recognition method, cropping method, apparatus, electronic device, and medium
US20200304754A1 (en) * 2019-03-20 2020-09-24 Adobe Inc. Intelligent video reframing
CN112188283A (zh) * 2020-09-30 2021-01-05 Beijing Bytedance Network Technology Co., Ltd. Video cropping method and apparatus, device, and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8891009B2 (en) * 2011-08-29 2014-11-18 Futurewei Technologies, Inc. System and method for retargeting video sequences
WO2015044947A1 (en) * 2013-09-30 2015-04-02 Yanai Danielle Image and video processing and optimization
US10366497B2 (en) * 2016-06-10 2019-07-30 Apple Inc. Image/video editor with automatic occlusion detection and cropping
WO2018106213A1 (en) * 2016-12-05 2018-06-14 Google Llc Method for converting landscape video to portrait mobile layout
US20190130189A1 (en) * 2017-10-30 2019-05-02 Qualcomm Incorporated Suppressing duplicated bounding boxes from object detection in a video analytics system
US11282163B2 (en) * 2017-12-05 2022-03-22 Google Llc Method for converting landscape video to portrait mobile layout using a selection interface
US10929979B1 (en) * 2018-12-28 2021-02-23 Facebook, Inc. Systems and methods for processing content
US10997692B2 (en) * 2019-08-22 2021-05-04 Adobe Inc. Automatic image cropping based on ensembles of regions of interest
EP3895065A1 (en) * 2019-12-13 2021-10-20 Google LLC Personalized automatic video cropping
US11184558B1 (en) * 2020-06-12 2021-11-23 Adobe Inc. System for automatic video reframing

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4224869A4

Also Published As

Publication number Publication date
EP4224869A4 (en) 2023-12-20
US20230206591A1 (en) 2023-06-29
CN112188283A (zh) 2021-01-05
EP4224869A1 (en) 2023-08-09
US11881007B2 (en) 2024-01-23
CN112188283B (zh) 2022-11-15

Similar Documents

Publication Publication Date Title
WO2021017261A1 (zh) Recognition model training method, image recognition method, apparatus, device, and medium
CN110473137B (zh) Image processing method and apparatus
KR102348636B1 (ko) Image processing method, terminal, and storage medium
US9699380B2 (en) Fusion of panoramic background images using color and depth data
WO2019201042A1 (zh) Image object recognition method and apparatus, storage medium, and electronic apparatus
JP2014229317A (ja) Method and system for automatic selection of one or more image processing algorithms
US11790584B2 (en) Image and text typesetting method and related apparatus thereof
CN110489951A (zh) Risk identification method and apparatus, computer device, and storage medium
CN112991180B (zh) Image stitching method and apparatus, device, and storage medium
CN111666960A (zh) Image recognition method and apparatus, electronic device, and readable storage medium
US11625819B2 (en) Method and device for verifying image and video
CN110889824A (zh) Sample generation method and apparatus, electronic device, and computer-readable storage medium
US20240078680A1 (en) Image segmentation method, network training method, electronic equipment and storage medium
CN111062426A (zh) Method and apparatus for establishing a training set, electronic device, and medium
WO2021189770A1 (zh) Artificial-intelligence-based image enhancement processing method, apparatus, device, and medium
WO2022068551A1 (zh) Video cropping method and apparatus, device, and storage medium
CN114445651A (zh) Training set construction method and apparatus for a semantic segmentation model, and electronic device
CN113902932A (zh) Feature extraction method, visual positioning method and apparatus, medium, and electronic device
CN111428740A (zh) Method and apparatus for detecting recaptured photos on a network, computer device, and storage medium
CN113158773B (zh) Training method and training apparatus for a liveness detection model
WO2020029181A1 (zh) Three-dimensional convolutional neural network computing apparatus and related products
CN112101479B (zh) Hairstyle recognition method and apparatus
WO2021164329A1 (zh) Image processing method and apparatus, communication device, and readable storage medium
CN113538269A (zh) Image processing method and apparatus, computer-readable storage medium, and electronic device
CN109784226B (zh) Face snapshot method and related apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21874212

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021874212

Country of ref document: EP

Effective date: 20230502