WO2022116990A1 - Video cropping method and apparatus, storage medium, and electronic device (视频裁剪方法、装置、存储介质及电子设备) - Google Patents

Video cropping method and apparatus, storage medium, and electronic device

Info

Publication number
WO2022116990A1
WO2022116990A1 PCT/CN2021/134720 CN2021134720W WO2022116990A1 WO 2022116990 A1 WO2022116990 A1 WO 2022116990A1 CN 2021134720 W CN2021134720 W CN 2021134720W WO 2022116990 A1 WO2022116990 A1 WO 2022116990A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
cropping
target
target video
candidate
Application number
PCT/CN2021/134720
Other languages
English (en)
French (fr)
Inventor
吴昊 (Wu Hao)
王长虎 (Wang Changhu)
Original Assignee
北京有竹居网络技术有限公司 (Beijing Youzhuju Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by 北京有竹居网络技术有限公司 (Beijing Youzhuju Network Technology Co., Ltd.)
Priority to US 18/255,473 (published as US20240112299A1)
Publication of WO2022116990A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving reformatting operations of video signals for household redistribution, storage or real-time display, the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007: Scaling of whole images or parts thereof, e.g. expanding or contracting, based on interpolation, e.g. bilinear interpolation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47: End-user applications
    • H04N21/472: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205: End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20112: Image segmentation details
    • G06T2207/20132: Image cropping

Definitions

  • the present disclosure relates to the technical field of video processing, and in particular, to a video cropping method, apparatus, storage medium and electronic device.
  • Intelligent video cropping is a technique needed in scenarios where the video playback size is inconsistent with that of the original video.
  • The intelligent video cropping algorithm in the related art usually uses a cropping frame of the same size to crop every frame of the video and then recombines the cropped frames into a video.
  • However, the content contained in different frames of a video may differ considerably. If a cropping frame of the same size is used to crop every frame, a large part of the picture content may be lost, degrading the quality of the cropped video.
  • In a first aspect, the present disclosure provides a video cropping method, the method comprising:
  • obtaining an original video to be cropped; performing frame extraction processing on the original video to obtain a plurality of target video frames; for each target video frame, determining, according to the main content in the target video frame, a target candidate cropping frame corresponding to the target video frame; performing interpolation processing according to the target candidate cropping frame corresponding to each target video frame to determine a target cropping frame corresponding to each frame of the original video; and cropping the original video according to the target cropping frame corresponding to each frame.
  • In a second aspect, the present disclosure provides a video cropping device, the device comprising:
  • an acquisition module configured to acquire an original video to be cropped;
  • a frame extraction module configured to perform frame extraction processing on the original video to obtain a plurality of target video frames;
  • a determination module configured to determine, for each target video frame, a target candidate cropping frame corresponding to the target video frame according to the main content in the target video frame;
  • an interpolation module configured to perform interpolation processing according to the target candidate cropping frame corresponding to each target video frame and determine a target cropping frame corresponding to each frame of the original video;
  • a cropping module configured to crop the original video according to the target cropping frame corresponding to each frame.
  • In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements the steps of the method described in the first aspect.
  • In a fourth aspect, the present disclosure provides an electronic device, comprising:
  • a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement the steps of the method in the first aspect.
  • FIG. 1 is a schematic diagram of cropping in a video cropping method in the related art;
  • FIG. 2 is a flowchart of a video cropping method according to an exemplary embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of cropping in a video cropping method according to an exemplary embodiment of the present disclosure
  • FIG. 4 is a block diagram of a video cropping apparatus according to an exemplary embodiment of the present disclosure.
  • FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • the term “including” and variations thereof are open-ended inclusions, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • As mentioned in the background, the video cropping algorithm in the related art usually uses a cropping frame of the same size to crop every frame of the video and then reassembles the cropped frames into a video.
  • However, the content contained in different frames of a video may differ considerably. If a cropping frame of the same size is used to crop every frame, a large part of the picture content may be lost, degrading the quality of the cropped video.
  • For example, the two pictures shown in FIG. 1 are two frames of the same video (with a size of 16:9), both cropped with a cropping frame of 9:16 size.
  • For the left picture, a cropping frame of this size can cover most of the main content in the frame.
  • For the right picture, however, no matter where in the picture a cropping frame of this size is placed, most of the main content will be lost, affecting the quality of the cropped video.
  • In view of this, the present disclosure provides a video cropping method, device, storage medium, and electronic equipment, so as to solve the above-mentioned problems in the video cropping process in the related art and to dynamically adjust the size of the corresponding cropping frame according to the main content in different target video frames,
  • thereby retaining most of the main content of each frame of the original video and improving the quality of the cropped video.
  • FIG. 2 is a flowchart of a video cropping method according to an exemplary embodiment of the present disclosure.
  • the video cropping method may include the following steps:
  • Step 201 Obtain the original video to be cropped.
  • For example, the user can input a URL (Uniform Resource Locator) corresponding to the original video into the electronic device, and the electronic device can then download the original video from the corresponding resource server according to the URL for video cropping.
  • Alternatively, the electronic device may, in response to a video cropping request triggered by the user, acquire a stored video from its memory as the original video for cropping, and so on.
  • The embodiment of the present disclosure does not limit the manner in which the original video is acquired.
  • Step 202: Perform frame extraction processing on the original video to obtain multiple target video frames.
  • For example, the frame extraction process may extract some of the video frames of the original video as target video frames. In this way, the amount of computation in subsequent processing can be reduced and the video cropping efficiency improved. Of course, where computation and efficiency are not a concern, the frame extraction process may also take all video frames of the original video as target video frames, which is not limited in this embodiment of the present disclosure.
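  • As an illustration of this step, the following is a minimal sketch of frame extraction using OpenCV; the fixed sampling stride and the function name are assumptions for illustration, since the disclosure does not prescribe a particular sampling scheme.

```python
import cv2

def extract_target_frames(video_path: str, stride: int = 10):
    """Sample every `stride`-th frame of the original video as a target frame.

    Returns a list of (frame_index, frame) pairs. The stride is an assumed
    parameter: the disclosure only says that some frames are extracted to
    reduce the amount of subsequent computation.
    """
    capture = cv2.VideoCapture(video_path)
    target_frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % stride == 0:
            target_frames.append((index, frame))
        index += 1
    capture.release()
    return target_frames
```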
  • Step 203: For each target video frame, determine a target candidate cropping frame corresponding to the target video frame according to the main content in the target video frame.
  • the main content may be the main picture content occupying most of the image area, for example, the vehicle in FIG. 1 is the main content of the video frame.
  • at least one of the following detection methods may be performed to determine the main content: saliency detection, face detection, text detection, and logo detection.
  • saliency detection is used to detect the position of the main component of the target video frame.
  • Face detection is used to detect the location of the face in the target video frame.
  • Text detection is used to detect the position and content of text in the target video frame.
  • logo detection is used to detect the location of the logo, watermark, etc. in the target video frame.
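  • The detectors above are named but not otherwise specified; the sketch below shows, under that caveat, how detection results might be fused into a single subject-content mask, using OpenCV's spectral-residual saliency (opencv-contrib) and a Haar face detector as stand-in detectors. The specific models, the threshold, and the helper name are assumptions.

```python
import cv2
import numpy as np

def subject_content_mask(frame):
    """Fuse saliency and face detections into one binary subject-content mask."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = np.zeros(gray.shape, dtype=bool)

    # Static saliency (spectral residual); requires opencv-contrib-python.
    saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency_map = saliency.computeSaliency(frame)
    if ok:
        mask |= saliency_map > 0.25  # assumed threshold on the [0, 1] saliency map

    # Mark detected face regions as subject content as well.
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
        mask[y:y + h, x:x + w] = True

    # Text and logo/watermark detections would be OR-ed into the mask the same way.
    return mask
```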
  • For example, each target video frame may correspond to a plurality of candidate cropping frames, and a target candidate cropping frame that encloses the main content of the target video frame may then be selected from these candidates according to that main content; this mitigates the loss of most of the main content in the cropped video and improves the video cropping quality.
  • Step 204 Perform interpolation processing according to the target candidate cropping frame corresponding to each target video frame to determine the target cropping frame corresponding to each frame in the original video.
  • the interpolation process may be to perform interpolation calculation according to the position coordinates of the target candidate cropping frame corresponding to each target video frame to obtain the position coordinates of the target cropping frames corresponding to other frames in the original video.
  • the specific interpolation processing method is similar to that in the related art, and details are not repeated here.
  • the size of the target cropping frame corresponding to other frames may be determined according to the size of the target candidate cropping frame, that is, the size of the target cropping frame corresponding to other frames is the same as the target candidate cropping frame.
  • In a possible implementation, before the interpolation, the target candidate cropping frames can also be smoothed and denoised to improve the accuracy of the result. That is, smoothing can be performed on the target candidate cropping frame corresponding to each target video frame to obtain a smooth candidate cropping frame for each target video frame, and interpolation can then be performed on the smooth candidate cropping frames to obtain the target cropping frame corresponding to each frame of the original video.
  • the target candidate cropping frame may be smoothed by any smoothing and denoising method in the related art, for example, the target candidate cropping frame may be smoothed by a Gaussian filter, etc. This is not limited in this embodiment of the present disclosure.
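  • A minimal sketch of the smoothing and interpolation described above, assuming each target candidate cropping frame is stored as (x, y, w, h): Gaussian smoothing (via SciPy) and linear interpolation are used as one concrete choice, and all four box coordinates are interpolated here, whereas the text above interpolates positions and carries the candidate-box size over to in-between frames; the helper name and these choices are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def boxes_for_all_frames(sampled_indices, candidate_boxes, total_frames, sigma=2.0):
    """Smooth the candidate boxes of the sampled frames, then linearly
    interpolate them to every frame of the original video.

    `candidate_boxes` is an (n, 4) array of (x, y, w, h) for the sampled frames,
    in the same order as the increasing `sampled_indices`.
    """
    boxes = np.asarray(candidate_boxes, dtype=float)
    # Smooth each coordinate track over time to denoise the candidate boxes.
    smoothed = gaussian_filter1d(boxes, sigma=sigma, axis=0)

    all_frames = np.arange(total_frames)
    out = np.empty((total_frames, 4))
    for k in range(4):  # interpolate x, y, w, h independently
        out[:, k] = np.interp(all_frames, sampled_indices, smoothed[:, k])
    return out.round().astype(int)
```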
  • Step 205 Crop the original video according to the target cropping frame corresponding to each frame of the picture.
  • the length and width of the corresponding frame picture in the original video may be cropped according to the target cropping frame, respectively.
  • the length or width of the corresponding frame picture may be cropped according to the size of the target cropping frame and the size of the original video. For example, if the size of the target cropping frame is 1:1, and the size of the original video is 720 ⁇ 1280 pixels, then the length of the corresponding frame picture (along the y-axis direction) can be cropped, and the size of the cropped video is 720 ⁇ 720 pixels.
  • After the original video is cropped according to the target cropping frame corresponding to each frame, the cropped frames can be re-spliced into a video to obtain the cropped video, which can then be displayed to the user.
  • the target candidate cropping frame corresponding to each target video frame can be dynamically determined according to the main content in each target video frame, so that different target cropping frames for each frame of picture can be determined through interpolation processing.
  • For example, the two pictures shown in FIG. 3 are two frames of the same video (with a size of 16:9).
  • According to the video cropping method of the embodiments of the present disclosure, the left picture is cropped with a cropping frame of 9:16 size, which can enclose the main content of the frame,
  • while the right picture is cropped with a cropping frame of 16:9 size, which likewise encloses the main content of the frame.
  • Compared with the video cropping method in the related art, this mitigates the loss of most of the main content in the cropped video and thereby improves the video cropping quality.
  • In a possible implementation, determining the target candidate cropping frame corresponding to each target video frame according to its main content may be done as follows: for each target video frame,
  • a cost function is calculated according to the main content in the target video frame and the multiple candidate cropping frames corresponding to the target video frame, and the target candidate cropping frame that minimizes the calculation result of the cost function is then determined among the multiple candidate cropping frames.
  • The cost function includes a first function used to characterize the importance of the main content in the target video frame and a second function used to characterize the size difference of the candidate cropping frames in two target video frames.
  • For example, the multiple candidate cropping frames corresponding to a target video frame may be determined as follows: for each target video frame, a preset cropping frame is used as the initial candidate cropping frame, and the candidate cropping frame is moved by a preset position offset to obtain a new candidate cropping frame position, until the boundary of the candidate cropping frame coincides with or exceeds the target video frame.
  • The preset position offset may be set according to the actual situation, which is not limited in this embodiment of the present disclosure. For example, the preset position offset may be set to 20 pixels; in this case, the initial candidate cropping frame is moved horizontally (or vertically) by 20 pixels to obtain a new candidate cropping frame, that frame is moved by another 20 pixels to obtain the next one, and so on, until the boundary of the candidate cropping frame coincides with or exceeds the target video frame.
  • In this way, the multiple candidate cropping frames corresponding to the target video frame can be obtained.
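  • A sketch of the candidate-box enumeration just described: a preset crop box is slid across the frame in steps of the preset offset (20 pixels in the example above) until it would cross the frame boundary. Enumerating several preset box sizes the same way, and the function name itself, are assumptions.

```python
def enumerate_candidate_boxes(frame_w, frame_h, crop_w, crop_h, offset=20):
    """Slide a preset crop box of size (crop_w, crop_h) across the frame in
    steps of `offset` pixels; every reachable position is one candidate box.

    Returns a list of (x, y, crop_w, crop_h) tuples. In practice this could be
    repeated for several preset crop sizes / aspect ratios.
    """
    candidates = []
    y = 0
    while y + crop_h <= frame_h:
        x = 0
        while x + crop_w <= frame_w:
            candidates.append((x, y, crop_w, crop_h))
            x += offset
        y += offset
    return candidates
```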
  • In a possible implementation, the first function can be calculated as follows: for each candidate cropping frame, determine the content inclusion degree of the candidate cropping frame with respect to the main content in the target video frame, determine the content proportion between the main content contained in the candidate cropping frame and the complete main content in the target video frame, and then calculate the first function according to the content inclusion degree and the content proportion.
  • For example, the content inclusion degree may be the ratio between the main content included in the candidate cropping frame and the area of the candidate cropping frame, i.e., the amount of main content contained in the candidate cropping frame per unit area.
  • The content proportion may be the ratio between the main content contained in the candidate cropping frame and the complete main content in the target video frame, i.e., the result of dividing the main content contained in the candidate cropping frame by the complete main content in the target video frame.
  • In a possible implementation, the first function is expressed in terms of the following quantities: f represents the calculation result of the first function;
  • β1 and β2 represent preset weight values;
  • A(C_i) represents the main content contained in the candidate cropping frame C_i corresponding to the i-th target video frame;
  • S(C_i) represents the area of the candidate cropping frame C_i corresponding to the i-th target video frame;
  • A(I_i) represents the complete main content in the i-th target video frame; the ratio A(C_i)/S(C_i) gives the content inclusion degree of the candidate cropping frame C_i with respect to the main content of the i-th target video frame, and the ratio A(C_i)/A(I_i) gives the content proportion between the main content contained in C_i and the complete main content of the i-th target video frame.
  • The preset weight values may be determined according to the actual situation, which is not limited in this embodiment of the present disclosure, as long as each preset weight value is greater than 0 and less than 1.
  • The preset weight values adjust the relative contribution of the content inclusion degree and the content proportion in the calculation of the first function, so that the two are balanced against each other.
  • In a possible implementation, the first function may also be calculated without the preset weight values, i.e., directly from the content inclusion degree and the content proportion.
  • When the first function is calculated in either of the above ways, the smaller the area of the candidate cropping frame, the greater the content inclusion degree and thus the smaller the first function and, in turn, the cost function; likewise, the larger the content proportion, the smaller the first function and thus the cost function.
  • Therefore, calculating the first function from the content inclusion degree and the content proportion makes the target candidate cropping frame enclose the main content while remaining as small as possible, alleviating the problem that the target candidate cropping frame degenerates into a frame enclosing the entire target video frame, which would make video cropping impossible.
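  • The exact expression of the first function is given as an equation image in the original filing, so the sketch below uses one plausible form consistent with the surrounding text: it combines the content inclusion degree A(C_i)/S(C_i) and the content proportion A(C_i)/A(I_i) with weights β1 and β2, negated so that a larger inclusion degree and proportion yield a smaller value, as the statements above require. The sign and the way the two terms are combined are assumptions.

```python
def first_function(subject_area_in_box, box_area, subject_area_total,
                   beta1=0.5, beta2=0.5):
    """Importance term f of the cost function (assumed form).

    subject_area_in_box   -- A(C_i): main content contained in candidate box C_i
    box_area              -- S(C_i): area of candidate box C_i
    subject_area_total    -- A(I_i): complete main content of the i-th frame
    beta1, beta2          -- preset weights, each in (0, 1) per the description
    """
    inclusion_degree = subject_area_in_box / box_area              # A(C_i) / S(C_i)
    content_proportion = subject_area_in_box / subject_area_total  # A(C_i) / A(I_i)
    # Negative sign (assumed): higher inclusion degree / proportion should lower
    # the cost, matching "the larger the content proportion, the smaller f".
    return -(beta1 * inclusion_degree + beta2 * content_proportion)
```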
  • In a possible implementation, the second function may be calculated as follows: determine the width difference and the length difference between the candidate cropping frame in a first target video frame and the candidate cropping frame in a second target video frame, where the first target video frame is the adjacent previous video frame of the second target video frame, and then calculate the second function according to the width difference and the length difference.
  • For example, the expression of the second function may be: |W_i − W_{i−1}| + |H_i − H_{i−1}|, where
  • W_i represents the width of the candidate cropping frame corresponding to the i-th target video frame,
  • W_{i−1} represents the width of the candidate cropping frame corresponding to the (i−1)-th target video frame,
  • H_i represents the length of the candidate cropping frame corresponding to the i-th target video frame, and
  • H_{i−1} represents the length of the candidate cropping frame corresponding to the (i−1)-th target video frame.
  • Alternatively, a weight can be added in the calculation of the second function to facilitate computation, i.e., the expression of the second function may also be: λ1·(|W_i − W_{i−1}| + |H_i − H_{i−1}|), where
  • λ1 represents a preset weight value.
  • The value of λ1 can be set according to the actual situation, which is not limited in this embodiment of the present disclosure, as long as the preset weight value is greater than 0 and less than 1.
  • The second function constrains the target candidate cropping frames of two adjacent frames to be as close in size as possible, which keeps the change of the target candidate cropping frame smooth and prevents the cropped video picture from suddenly becoming larger or smaller, thereby improving the quality of the cropped video.
  • the cost function may also include a text energy function and a shot smoothing penalty function, where the text energy function and shot smoothing penalty function are consistent with those in the related art, and are briefly described here.
  • For example, the text energy function can be used to characterize how the candidate cropping frame covers the text in the target video frame, and its expression may be x(1 − x), where x represents the coverage rate of the candidate cropping frame over the text detection box,
  • i.e., the area of the text detection box covered by the candidate cropping frame divided by the area of the text detection box.
  • Alternatively, a weight can be added to the text energy function to facilitate computation, i.e., the expression of the text energy function may also be: λ2·(x(1 − x)).
  • ⁇ 2 represents a preset weight value
  • the value of ⁇ 2 can be set according to the actual situation, which is not limited in this embodiment of the present disclosure, as long as the preset weight value is a value greater than 0 and less than 1.
  • The shot smoothing penalty function can be used to characterize the degree of position offset of the candidate cropping frame between two target video frames, and its expression may be: |L(C_i) − L(C_{i−1})|, where L(C_i) represents the position of the candidate cropping frame C_i corresponding to the i-th target video frame and L(C_{i−1}) represents the position of the candidate cropping frame C_{i−1} corresponding to the (i−1)-th target video frame.
  • Alternatively, a weight can be added to the shot smoothing penalty function to facilitate computation, i.e., its expression may also be: λ3·|L(C_i) − L(C_{i−1})|, where
  • λ3 represents a preset weight value, and the value of λ3 can be set according to the actual situation, which is not limited in this embodiment of the present disclosure, as long as the preset weight value is greater than 0 and less than 1.
  • In summary, the cost function in the embodiment of the present disclosure combines, over the target video frames, the first function, the second function, the text energy function, and the shot smoothing penalty function, where
  • F represents the calculation result of the cost function and
  • n represents the number of target video frames obtained by frame extraction.
  • a target candidate cropping frame that minimizes the calculation result of the cost function may be determined in multiple candidate cropping frames corresponding to the target video frame.
  • This process can be understood as dynamic programming: each target video frame has multiple states (i.e., each target video frame corresponds to multiple candidate cropping frames), each state has a score (i.e., the calculation result of the cost function), and transitions between states of different frames incur a transition penalty.
  • In a specific application, the above cost function can be evaluated for every candidate cropping frame of every target video frame to determine the state of each candidate cropping frame; the state transition penalties between adjacent frames are then calculated, and for every state the corresponding state of the previous frame that minimizes the calculation result of the cost function is recorded.
  • The state transition penalty can be understood as the position offset and size difference of the candidate cropping frames between different frames. That is, for each candidate cropping frame of the second target video frame, the candidate cropping frame in the first target video frame that minimizes the calculation result of its cost function is determined,
  • where the first target video frame is the adjacent previous video frame of the second target video frame.
  • Thus, each candidate cropping frame has a corresponding candidate cropping frame in the previous frame with the smallest cost. Then, starting from the n-th target video frame, the candidate cropping frame that minimizes the calculation result of the cost function is determined at the (n−1)-th target video frame; for the (n−1)-th target video frame, the minimizing candidate cropping frame is determined at the (n−2)-th target video frame; and so on. In this way, each target video frame obtains a target candidate cropping frame that minimizes the calculation result of the cost function, and these target candidate cropping frames minimize the total cost function, so that each target candidate cropping frame encloses the main content and the cropping frames transition smoothly between target video frames, improving the quality of video cropping.
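  • A compact sketch of the dynamic-programming selection described above: every target video frame has one state per candidate cropping frame, each state has a per-frame score, transitions between consecutive frames incur a penalty, and the minimizing predecessor is recorded so the optimal sequence can be read back from the last frame. The concrete per-frame score and transition penalty are passed in as callables, since their exact numeric form (see the hedged sketches above) remains an assumption.

```python
import numpy as np

def select_target_boxes(per_frame_scores, transition_penalty):
    """Choose one candidate box per target frame minimizing the total cost.

    per_frame_scores   -- list of length n; element i is an array of scores,
                          one per candidate box of frame i.
    transition_penalty -- callable (frame_i, prev_idx, cur_idx) -> float giving
                          the size/position penalty between candidate prev_idx
                          of frame i-1 and candidate cur_idx of frame i.
    Returns the list of chosen candidate indices, one per frame.
    """
    n = len(per_frame_scores)
    best = [np.asarray(per_frame_scores[0], dtype=float)]
    back = [None]
    for i in range(1, n):
        scores_i = np.asarray(per_frame_scores[i], dtype=float)
        best_i = np.empty_like(scores_i)
        back_i = np.empty(len(scores_i), dtype=int)
        for cur in range(len(scores_i)):
            totals = [best[i - 1][prev] + transition_penalty(i, prev, cur)
                      for prev in range(len(best[i - 1]))]
            back_i[cur] = int(np.argmin(totals))
            best_i[cur] = scores_i[cur] + min(totals)
        best.append(best_i)
        back.append(back_i)

    # Trace back from the best final state to recover the chosen boxes.
    chosen = [int(np.argmin(best[-1]))]
    for i in range(n - 1, 0, -1):
        chosen.append(int(back[i][chosen[-1]]))
    return chosen[::-1]
```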
  • In a possible implementation, after the original video is cropped according to the target cropping frame corresponding to each frame, the cropping width and cropping length corresponding to the original video may also be obtained, and each cropped frame may then be padded with borders so that
  • the width of each cropped frame equals the cropping width
  • and the length of each cropped frame equals the cropping length.
  • For example, the cropping width may be the desired cropping width input by the user,
  • and the cropping length may be the desired cropping length input by the user, which is not limited in this embodiment of the present disclosure.
  • Since the target cropping frame corresponding to each frame is determined according to the main content, and the size of the main content may differ from frame to frame, the sizes of the target cropping frames will differ and may also differ from the desired cropping width and/or cropping length. Therefore, in order to obtain a cropped video of uniform size, border padding can be performed on each cropped frame so that the width of each cropped frame equals the cropping width and the length of each cropped frame equals the cropping length.
  • For example, each cropped frame can be padded with black borders: when the width of a cropped frame is smaller than the cropping width, black borders are added on the left and right sides of the frame so that
  • the sum of the widths of the black borders and the cropped frame equals the cropping width.
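  • A sketch of the final cropping and padding step, assuming each frame's target cropping frame is given as (x, y, w, h) and the desired output size (cropping width and length) is supplied by the user; black borders are used as in the example above, while centering the content (rather than padding only left and right) and scaling oversized crops first are assumptions.

```python
import cv2
import numpy as np

def crop_and_pad(frame, box, crop_w, crop_h):
    """Crop `frame` to its target box, then pad (and, if needed, scale) the
    result with black borders so every output frame is crop_w x crop_h."""
    x, y, w, h = box
    cropped = frame[y:y + h, x:x + w]

    # Scale down first if the crop is larger than the requested output size.
    scale = min(crop_w / cropped.shape[1], crop_h / cropped.shape[0], 1.0)
    if scale < 1.0:
        cropped = cv2.resize(cropped, (int(cropped.shape[1] * scale),
                                       int(cropped.shape[0] * scale)))

    out = np.zeros((crop_h, crop_w, 3), dtype=frame.dtype)
    # Center the cropped content; the remaining area stays black (the padding).
    x0 = (crop_w - cropped.shape[1]) // 2
    y0 = (crop_h - cropped.shape[0]) // 2
    out[y0:y0 + cropped.shape[0], x0:x0 + cropped.shape[1]] = cropped
    return out
```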
  • Through the above video cropping method, the target candidate cropping frame corresponding to each target video frame can be dynamically determined according to the main content in each target video frame, so that a different target cropping frame for each frame can be determined through interpolation; the target cropping frame corresponding to each frame thus encloses most of the main content of that frame, mitigating the loss of most of the main content in the cropped video and improving the video cropping quality.
  • border filling can be performed on each frame after cropping to obtain a cropped video of the target size.
  • Based on the same inventive concept, an embodiment of the present disclosure also provides a video cropping apparatus, which can become part or all of an electronic device through software, hardware, or a combination of the two. Referring to FIG. 4, the video cropping apparatus includes:
  • an acquisition module 401 configured to acquire an original video to be cropped;
  • a frame extraction module 402 configured to perform frame extraction processing on the original video to obtain multiple target video frames;
  • a determination module 403 configured to determine, for each target video frame, a target candidate cropping frame corresponding to the target video frame according to the main content in the target video frame;
  • an interpolation module 404 configured to perform interpolation processing according to the target candidate cropping frame corresponding to each target video frame and determine a target cropping frame corresponding to each frame of the original video;
  • a cropping module 405 configured to crop the original video according to the target cropping frame corresponding to each frame.
  • the determining module 403 is used for:
  • for each target video frame, calculate a cost function according to the main content in the target video frame and the plurality of candidate cropping frames corresponding to the target video frame, where the cost function includes a first function used to characterize the importance of the main content in the target video frame and a second function used to characterize the size difference of the candidate cropping frames in two target video frames; and
  • a target candidate cropping frame that minimizes the calculation result of the cost function is determined among the plurality of candidate cropping frames.
  • the first function is calculated by the following modules:
  • a content determination module configured to determine, for each candidate cropping frame, the content inclusion degree of the candidate cropping frame with respect to the main content in the target video frame, and to determine the content proportion between the main content contained in the candidate cropping frame and the complete main content in the target video frame;
  • a first calculation module configured to calculate the first function according to the content inclusion degree and the content proportion.
  • Optionally, the first function is expressed in terms of the following quantities:
  • f represents the calculation result of the first function;
  • β1 and β2 represent preset weight values;
  • A(C_i) represents the main content contained in the candidate cropping frame C_i corresponding to the i-th target video frame;
  • S(C_i) represents the area of the candidate cropping frame C_i corresponding to the i-th target video frame;
  • A(I_i) represents the complete main content in the i-th target video frame; A(C_i)/S(C_i) gives the content inclusion degree, and A(C_i)/A(I_i) gives the content proportion.
  • the second function is calculated by the following modules:
  • a difference determination module configured to determine the width difference and the length difference between the candidate cropping frame in a first target video frame and the candidate cropping frame in a second target video frame, where the first target video frame is the adjacent previous video frame of the second target video frame; and
  • a second calculation module configured to calculate the second function according to the width difference and the length difference.
  • Optionally, the interpolation module 404 is configured to: perform smoothing processing according to the target candidate cropping frame corresponding to each target video frame to obtain a smooth candidate cropping frame corresponding to each target video frame; and perform interpolation processing according to the smooth candidate cropping frame corresponding to each target video frame to obtain the target cropping frame corresponding to each frame of the original video.
  • the apparatus 400 further includes:
  • a length and width obtaining module configured to obtain the cropping width and cropping length corresponding to the original video; and
  • a padding module configured to pad each cropped frame with borders so that the width of each cropped frame equals the cropping width and the length of each cropped frame equals the cropping length.
  • an embodiment of the present disclosure also provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processing apparatus, implements the steps of any of the foregoing video cropping methods.
  • Based on the same inventive concept, an embodiment of the present disclosure also provides an electronic device, including:
  • a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement the steps of any of the above video cropping methods.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (eg, mobile terminals such as in-vehicle navigation terminals), etc., and stationary terminals such as digital TVs, desktop computers, and the like.
  • the electronic device shown in FIG. 5 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • As shown in FIG. 5, the electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500.
  • the processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to bus 504 .
  • Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.;
  • output devices 507 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.;
  • storage devices 508 including, for example, a magnetic tape, a hard disk, etc.;
  • and communication devices 509. The communication devices 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data.
  • Although FIG. 5 shows the electronic device 500 with various devices, it should be understood that it is not required to implement or provide all of the illustrated devices; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 509, or from the storage device 508, or from the ROM 502.
  • When the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • In some implementations, communication may be performed using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • The above-mentioned computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: obtain an original video to be cropped; perform frame extraction processing on the original video to obtain a plurality of target video frames; for each target video frame, determine, according to the main content in the target video frame, a target candidate cropping frame corresponding to the target video frame; perform interpolation processing according to the target candidate cropping frame corresponding to each target video frame to determine a target cropping frame corresponding to each frame of the original video; and crop the original video according to the target cropping frame corresponding to each frame.
  • Computer program code for performing operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or can be implemented by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments of the present disclosure may be implemented in software or hardware. Among them, the name of the module does not constitute a limitation of the module itself under certain circumstances.
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides a video cropping method, the method comprising:
  • obtaining an original video to be cropped; performing frame extraction processing on the original video to obtain a plurality of target video frames; for each target video frame, determining, according to the main content in the target video frame, a target candidate cropping frame corresponding to the target video frame; performing interpolation processing according to the target candidate cropping frame corresponding to each target video frame to determine a target cropping frame corresponding to each frame of the original video; and cropping the original video according to the target cropping frame corresponding to each frame.
  • Example 2 provides the method of Example 1, wherein determining, for each target video frame, the target candidate cropping frame corresponding to the target video frame according to the main content in the target video frame includes:
  • calculating a cost function according to the main content in the target video frame and a plurality of candidate cropping frames corresponding to the target video frame, the cost function including a first function used to characterize the importance of the main content in the target video frame and a second function used to characterize the size difference of the candidate cropping frames in two target video frames; and
  • a target candidate cropping frame that minimizes the calculation result of the cost function is determined among the plurality of candidate cropping frames.
  • Example 3 provides the method of Example 2, wherein the first function is calculated as follows:
  • for each candidate cropping frame, determining the content inclusion degree of the candidate cropping frame with respect to the main content in the target video frame, and determining the content proportion between the main content contained in the candidate cropping frame and the complete main content in the target video frame; and
  • the first function is calculated according to the content inclusion degree and the content proportion.
  • Example 4 provides the method of Example 3, wherein the first function is expressed in terms of the following quantities:
  • f represents the calculation result of the first function;
  • β1 and β2 represent preset weight values;
  • A(C_i) represents the main content contained in the candidate cropping frame C_i corresponding to the i-th target video frame;
  • S(C_i) represents the area of the candidate cropping frame C_i corresponding to the i-th target video frame;
  • A(I_i) represents the complete main content in the i-th target video frame; A(C_i)/S(C_i) gives the content inclusion degree, and A(C_i)/A(I_i) gives the content proportion.
  • Example 5 provides the method of Example 2, wherein the second function is calculated as follows:
  • determining the width difference and the length difference between the candidate cropping frame in a first target video frame and the candidate cropping frame in a second target video frame, the first target video frame being the adjacent previous video frame of the second target video frame; and calculating the second function according to the width difference and the length difference.
  • Example 6 provides the method of any one of Examples 1 to 5, wherein performing interpolation processing according to the target candidate cropping frame corresponding to each target video frame to determine the target cropping frame corresponding to each frame of the original video includes: performing smoothing processing according to the target candidate cropping frame corresponding to each target video frame to obtain a smooth candidate cropping frame corresponding to each target video frame; and performing interpolation processing according to the smooth candidate cropping frame corresponding to each target video frame to obtain the target cropping frame corresponding to each frame of the original video.
  • Example 7 provides the method of any one of Examples 1 to 5, the method further comprising:
  • obtaining the cropping width and cropping length corresponding to the original video; and performing border padding on each cropped frame so that the width of each cropped frame equals the cropping width and the length of each cropped frame equals the cropping length.
  • Example 8 provides a video cropping apparatus, the apparatus comprising:
  • an acquisition module configured to acquire an original video to be cropped;
  • a frame extraction module configured to perform frame extraction processing on the original video to obtain a plurality of target video frames;
  • a determination module configured to determine, for each target video frame, a target candidate cropping frame corresponding to the target video frame according to the main content in the target video frame;
  • an interpolation module configured to perform interpolation processing according to the target candidate cropping frame corresponding to each target video frame and determine a target cropping frame corresponding to each frame of the original video;
  • a cropping module configured to crop the original video according to the target cropping frame corresponding to each frame.
  • Example 9 provides the apparatus of Example 8, the determining module for:
  • calculate, for each target video frame, a cost function according to the main content in the target video frame and the plurality of candidate cropping frames corresponding to the target video frame, where the cost function includes a first function used to characterize the importance of the main content in the target video frame and a second function used to characterize the size difference of the candidate cropping frames in two target video frames; and
  • a target candidate cropping frame that minimizes the calculation result of the cost function is determined among the plurality of candidate cropping frames.
  • Example 10 provides the apparatus of Example 9, wherein the first function is calculated by:
  • a content determination module configured to determine, for each candidate cropping frame, the content inclusion degree of the candidate cropping frame with respect to the main content in the target video frame, and to determine the content proportion between the main content contained in the candidate cropping frame and the complete main content in the target video frame;
  • a first calculation module configured to calculate the first function according to the content inclusion degree and the content proportion.
  • Example 11 provides the apparatus of Example 10, wherein the first function is expressed in terms of the following quantities:
  • f represents the calculation result of the first function;
  • β1 and β2 represent preset weight values;
  • A(C_i) represents the main content contained in the candidate cropping frame C_i corresponding to the i-th target video frame;
  • S(C_i) represents the area of the candidate cropping frame C_i corresponding to the i-th target video frame;
  • A(I_i) represents the complete main content in the i-th target video frame; A(C_i)/S(C_i) gives the content inclusion degree, and A(C_i)/A(I_i) gives the content proportion.
  • Example 12 provides the apparatus of Example 9, the second function is computed by:
  • a difference determination module configured to determine the width difference and the length difference between the candidate cropping frame in a first target video frame and the candidate cropping frame in a second target video frame, where the first target video frame is the adjacent previous video frame of the second target video frame; and
  • a second calculation module configured to calculate the second function according to the width difference and the length difference.
  • Example 13 provides the apparatus of any one of Examples 8 to 12, wherein the interpolation module is configured to: perform smoothing processing according to the target candidate cropping frame corresponding to each target video frame to obtain a smooth candidate cropping frame corresponding to each target video frame; and perform interpolation processing according to the smooth candidate cropping frame corresponding to each target video frame to obtain the target cropping frame corresponding to each frame of the original video.
  • Example 14 provides the apparatus of any of Examples 8 to 12, further comprising:
  • a length and width obtaining module configured to obtain the cropping width and cropping length corresponding to the original video; and
  • a padding module configured to pad each cropped frame with borders so that the width of each cropped frame equals the cropping width and the length of each cropped frame equals the cropping length.
  • Example 15 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1 to 7.
  • Example 16 provides an electronic device comprising:
  • a processing device for executing the computer program in the storage device to implement the steps of the method in any one of Examples 1 to 7.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present disclosure relates to a video cropping method and apparatus, a storage medium, and an electronic device, so as to dynamically adjust the size of the corresponding cropping frame according to the main content in different target video frames, retain most of the main content of every frame of the original video, and improve the quality of the cropped video. The video cropping method includes: obtaining an original video to be cropped; performing frame extraction processing on the original video to obtain a plurality of target video frames; for each target video frame, determining, according to the main content in the target video frame, a target candidate cropping frame corresponding to the target video frame; performing interpolation processing according to the target candidate cropping frame corresponding to each target video frame to determine a target cropping frame corresponding to each frame of the original video; and cropping the original video according to the target cropping frame corresponding to each frame.

Description

Video cropping method and apparatus, storage medium, and electronic device
Cross-Reference to Related Application
This application is based on, and claims priority to, the Chinese application with application No. 202011401449.0 filed on December 2, 2020, the disclosure of which is incorporated herein by reference in its entirety.
Technical Field
The present disclosure relates to the technical field of video processing, and in particular to a video cropping method and apparatus, a storage medium, and an electronic device.
Background
Intelligent video cropping is a technique needed in scenarios where the video playback size is inconsistent with that of the original video. The intelligent video cropping algorithm in the related art usually uses a cropping frame of the same size to crop every frame of the video and then recombines the cropped frames into a video. However, the content contained in different frames of a video may differ considerably; if a cropping frame of the same size is used to crop every frame, a large part of the picture content may be lost, degrading the quality of the cropped video.
Summary
This Summary is provided to introduce concepts in a brief form that are described in detail in the Detailed Description below. This Summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a video cropping method, the method comprising:
obtaining an original video to be cropped;
performing frame extraction processing on the original video to obtain a plurality of target video frames;
for each target video frame, determining, according to the main content in the target video frame, a target candidate cropping frame corresponding to the target video frame;
performing interpolation processing according to the target candidate cropping frame corresponding to each target video frame to determine a target cropping frame corresponding to each frame of the original video; and
cropping the original video according to the target cropping frame corresponding to each frame.
In a second aspect, the present disclosure provides a video cropping apparatus, the apparatus comprising:
an acquisition module configured to acquire an original video to be cropped;
a frame extraction module configured to perform frame extraction processing on the original video to obtain a plurality of target video frames;
a determination module configured to determine, for each target video frame, a target candidate cropping frame corresponding to the target video frame according to the main content in the target video frame;
an interpolation module configured to perform interpolation processing according to the target candidate cropping frame corresponding to each target video frame and determine a target cropping frame corresponding to each frame of the original video;
a cropping module configured to crop the original video according to the target cropping frame corresponding to each frame.
In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements the steps of the method of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device, comprising:
a storage device on which a computer program is stored; and
a processing device configured to execute the computer program in the storage device to implement the steps of the method of the first aspect.
Other features and advantages of the present disclosure will be described in detail in the Detailed Description that follows.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that parts and elements are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic diagram of cropping in a video cropping method in the related art;
FIG. 2 is a flowchart of a video cropping method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram of cropping in a video cropping method according to an exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram of a video cropping apparatus according to an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as being limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "including" and variations thereof are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order of, or interdependence between, the functions performed by these apparatuses, modules, or units. It should also be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of those messages or information.
As stated in the Background, the video cropping algorithm in the related art usually uses a cropping frame of the same size to crop every frame of the video and then recombines the cropped frames into a video. However, the content contained in different frames of a video may differ considerably; if a cropping frame of the same size is used to crop every frame, a large part of the picture content may be lost, degrading the quality of the cropped video. For example, the two pictures shown in FIG. 1 are two frames of the same video (with a size of 16:9), both cropped with a cropping frame of 9:16 size. For the left picture, a cropping frame of this size can enclose most of the main content of the frame. For the right picture, however, no matter where in the picture a cropping frame of this size is placed, most of the main content will be lost, affecting the quality of the cropped video.
In view of this, the present disclosure provides a video cropping method and apparatus, a storage medium, and an electronic device to solve the above problems in the video cropping process in the related art, dynamically adjust the size of the corresponding cropping frame according to the main content in different target video frames, retain most of the main content of every frame of the original video, and improve the quality of the cropped video.
FIG. 2 is a flowchart of a video cropping method according to an exemplary embodiment of the present disclosure. Referring to FIG. 2, the video cropping method may include the following steps:
Step 201: Obtain an original video to be cropped.
For example, the user may input a URL (Uniform Resource Locator) corresponding to the original video into the electronic device, and the electronic device may then download the original video from the corresponding resource server according to the URL for video cropping. Alternatively, the electronic device may, in response to a video cropping request triggered by the user, acquire a stored video from its memory as the original video for cropping, and so on; the embodiment of the present disclosure does not limit the manner in which the original video is acquired.
Step 202: Perform frame extraction processing on the original video to obtain a plurality of target video frames.
For example, the frame extraction processing may extract some of the video frames of the original video as target video frames. In this way, the amount of computation in subsequent processing can be reduced and the video cropping efficiency improved. Of course, where computation and efficiency are not a concern, the frame extraction processing may also take all video frames of the original video as target video frames, which is not limited in this embodiment of the present disclosure.
Step 203: For each target video frame, determine, according to the main content in the target video frame, a target candidate cropping frame corresponding to the target video frame.
For example, the main content may be the main picture content occupying most of the image area; for instance, the vehicle in FIG. 1 is the main content of that video frame. For each target video frame, at least one of the following detection methods may be performed to determine the main content: saliency detection, face detection, text detection, and logo detection. Saliency detection is used to detect the position of the main component of the target video frame. Face detection is used to detect the location of faces in the target video frame. Text detection is used to detect the position and content of text in the target video frame. Logo detection is used to detect the location of logos, watermarks, and the like in the target video frame. In addition, before the main content is detected, border detection may be performed on the target video frame, and useless borders such as detected black borders and Gaussian-blurred borders may be removed, so as to improve the accuracy of the subsequent main-content detection.
For example, each target video frame may correspond to a plurality of candidate cropping frames, and a target candidate cropping frame that encloses the main content of the target video frame may then be selected from these candidates according to that main content; this mitigates the loss of most of the main content in the cropped video and improves the video cropping quality.
Step 204: Perform interpolation processing according to the target candidate cropping frame corresponding to each target video frame to determine a target cropping frame corresponding to each frame of the original video.
For example, the interpolation processing may perform interpolation calculation according to the position coordinates of the target candidate cropping frame corresponding to each target video frame to obtain the position coordinates of the target cropping frames corresponding to the other frames of the original video. The specific interpolation method is similar to that in the related art and is not repeated here. The size of the target cropping frame corresponding to the other frames may be determined according to the size of the target candidate cropping frame, i.e., it is the same as that of the target candidate cropping frame.
In a possible implementation, before the interpolation, the target candidate cropping frames may first be smoothed and denoised to improve the accuracy of the result. That is, smoothing may be performed according to the target candidate cropping frame corresponding to each target video frame to obtain a smooth candidate cropping frame corresponding to each target video frame, and interpolation may then be performed according to the smooth candidate cropping frame corresponding to each target video frame to obtain the target cropping frame corresponding to each frame of the original video. The target candidate cropping frames may be smoothed by any smoothing and denoising method in the related art, for example by a Gaussian filter, which is not limited in this embodiment of the present disclosure.
Step 205: Crop the original video according to the target cropping frame corresponding to each frame.
For example, both the length and the width of the corresponding frame of the original video may be cropped according to the target cropping frame. Alternatively, in order to improve video cropping efficiency, only the length or the width of the corresponding frame may be cropped according to the size of the target cropping frame and the size of the original video. For example, if the size of the target cropping frame is 1:1 and the size of the original video is 720×1280 pixels, the length of the corresponding frame (along the y-axis direction) may be cropped, and the size of the cropped video is 720×720 pixels.
For example, after the original video is cropped according to the target cropping frame corresponding to each frame, the cropped frames may be re-spliced into a video to obtain the cropped video, which may then be displayed to the user.
In this way, the target candidate cropping frame corresponding to each target video frame can be dynamically determined according to the main content in each target video frame, so that a different target cropping frame for each frame can be determined through interpolation. For example, referring to FIG. 3, the two pictures shown in FIG. 3 are two frames of the same video (with a size of 16:9). According to the video cropping method of the embodiments of the present disclosure, the left picture is cropped with a cropping frame of 9:16 size, which can enclose the main content of the frame, and the right picture is cropped with a cropping frame of 16:9 size, which likewise encloses the main content of the frame. Compared with the video cropping method in the related art, this mitigates the loss of most of the main content in the cropped video and thereby improves the video cropping quality.
In a possible implementation, determining, for each target video frame, the target candidate cropping frame corresponding to the target video frame according to its main content may be done as follows: for each target video frame, a cost function is calculated according to the main content in the target video frame and the multiple candidate cropping frames corresponding to the target video frame, and the target candidate cropping frame that minimizes the calculation result of the cost function is then determined among the multiple candidate cropping frames. The cost function includes a first function used to characterize the importance of the main content in the target video frame and a second function used to characterize the size difference of the candidate cropping frames in two target video frames.
For example, the multiple candidate cropping frames corresponding to a target video frame may be determined as follows: for each target video frame, a preset cropping frame is used as the initial candidate cropping frame, and the candidate cropping frame is moved by a preset position offset to obtain a new candidate cropping frame position, until the boundary of the candidate cropping frame coincides with or exceeds the target video frame. The preset position offset may be set according to the actual situation, which is not limited in this embodiment of the present disclosure; for example, it may be set to 20 pixels. In that case, the initial candidate cropping frame may be moved horizontally (or vertically) by 20 pixels to obtain a new candidate cropping frame, that new candidate cropping frame may be moved by another 20 pixels to obtain the next one, and so on, until the boundary of the candidate cropping frame coincides with or exceeds the target video frame. In this way, the multiple candidate cropping frames corresponding to the target video frame can be obtained.
In a possible implementation, the first function may be calculated as follows: for each candidate cropping frame, determine the content inclusion degree of the candidate cropping frame with respect to the main content in the target video frame, determine the content proportion between the main content contained in the candidate cropping frame and the complete main content in the target video frame, and then calculate the first function according to the content inclusion degree and the content proportion.
For example, the content inclusion degree may be the ratio between the main content included in the candidate cropping frame and the area of the candidate cropping frame, i.e., the amount of main content contained in the candidate cropping frame per unit area. The content proportion may be the ratio between the main content contained in the candidate cropping frame and the complete main content in the target video frame, i.e., the result of dividing the main content contained in the candidate cropping frame by the complete main content in the target video frame.
In a possible implementation, the expression of the first function is:
[equation image PCTCN2021134720-appb-000001]
where f denotes the calculation result of the first function, β1 and β2 denote preset weight values, A(C_i) denotes the main content contained in the candidate cropping frame C_i corresponding to the i-th target video frame, S(C_i) denotes the area of the candidate cropping frame C_i corresponding to the i-th target video frame, A(I_i) denotes the complete main content in the i-th target video frame, A(C_i)/S(C_i) denotes the content inclusion degree of the candidate cropping frame C_i with respect to the main content of the i-th target video frame, and A(C_i)/A(I_i) denotes the content proportion between the main content contained in the candidate cropping frame C_i and the complete main content of the i-th target video frame.
For example, A(C_i) may be calculated as A(C_i) = S(C_i)·G(C_i)·F(C_i), where S(C_i) denotes a static saliency score obtained from the static saliency detection result, G(C_i) denotes a dynamic saliency score obtained from the dynamic saliency detection result, and F(C_i) denotes a face score obtained from the face detection result.
For example, the preset weight values may be determined according to the actual situation, which is not limited in this embodiment of the present disclosure, as long as each preset weight value is greater than 0 and less than 1. The preset weight values adjust the relative contribution of the content inclusion degree and the content proportion in the calculation of the first function, so that the two are balanced against each other. Of course, in a possible implementation, the first function may also be calculated without the preset weight values, i.e., the expression of the first function may also be:
[equation image PCTCN2021134720-appb-000004]
When the first function is calculated in either of the above ways, the smaller the area of the candidate cropping frame, the greater the content inclusion degree and thus the smaller the first function and, in turn, the smaller the cost function; likewise, the larger the content proportion, the smaller the first function and thus the smaller the cost function. Therefore, calculating the first function from the content inclusion degree and the content proportion according to the above formulas makes the target candidate cropping frame enclose the main content while remaining as small as possible, alleviating the problem that the target candidate cropping frame degenerates into a frame enclosing the entire target video frame, which would make video cropping impossible.
In a possible implementation, the second function may be calculated as follows: determine the width difference and the length difference between the candidate cropping frame in a first target video frame and the candidate cropping frame in a second target video frame, where the first target video frame is the adjacent previous video frame of the second target video frame, and then calculate the second function according to the width difference and the length difference.
For example, the expression of the second function may be |W_i − W_{i−1}| + |H_i − H_{i−1}|, where W_i denotes the width of the candidate cropping frame corresponding to the i-th target video frame, W_{i−1} denotes the width of the candidate cropping frame corresponding to the (i−1)-th target video frame, H_i denotes the length of the candidate cropping frame corresponding to the i-th target video frame, and H_{i−1} denotes the length of the candidate cropping frame corresponding to the (i−1)-th target video frame. Alternatively, a weight may be added in the calculation of the second function to facilitate computation, i.e., the expression of the second function may also be λ1·(|W_i − W_{i−1}| + |H_i − H_{i−1}|), where λ1 denotes a preset weight value whose value may be set according to the actual situation, which is not limited in this embodiment of the present disclosure, as long as the preset weight value is greater than 0 and less than 1.
The second function constrains the target candidate cropping frames of two adjacent frames to be as close in size as possible, which keeps the change of the target candidate cropping frame smooth and prevents the cropped video picture from suddenly becoming larger or smaller, thereby improving the quality of the cropped video.
In other possible implementations, the cost function may further include a text energy function and a shot smoothing penalty function, which are consistent with those in the related art and are described briefly here.
For example, the text energy function may be used to characterize how the candidate cropping frame covers the text in the target video frame, and its expression may be x(1 − x), where x denotes the coverage rate of the candidate cropping frame over the text detection box, i.e., the area of the text detection box covered by the candidate cropping frame divided by the area of the text detection box. Alternatively, a weight may be added to the text energy function to facilitate computation, i.e., its expression may also be λ2·(x(1 − x)), where λ2 denotes a preset weight value whose value may be set according to the actual situation, which is not limited in this embodiment of the present disclosure, as long as the preset weight value is greater than 0 and less than 1.
For example, the shot smoothing penalty function may be used to characterize the degree of position offset of the candidate cropping frame between two target video frames, and its expression may be |L(C_i) − L(C_{i−1})|, where L(C_i) denotes the position of the candidate cropping frame C_i corresponding to the i-th target video frame and L(C_{i−1}) denotes the position of the candidate cropping frame C_{i−1} corresponding to the (i−1)-th target video frame. Alternatively, a weight may be added to the shot smoothing penalty function to facilitate computation, i.e., its expression may also be λ3·|L(C_i) − L(C_{i−1})|, where λ3 denotes a preset weight value whose value may be set according to the actual situation, which is not limited in this embodiment of the present disclosure, as long as the preset weight value is greater than 0 and less than 1.
In summary, the expression of the cost function in the embodiments of the present disclosure may be:
[equation image PCTCN2021134720-appb-000005]
where F denotes the calculation result of the cost function and n denotes the number of target video frames obtained by frame extraction.
In the embodiments of the present disclosure, the target candidate cropping frame that minimizes the calculation result of the cost function can be determined among the multiple candidate cropping frames corresponding to a target video frame. This process can be understood as dynamic programming: each target video frame has multiple states (i.e., each target video frame corresponds to multiple candidate cropping frames), each state has a score (i.e., the calculation result of the cost function), and transitions between states of different frames incur a transition penalty.
In a specific application, the above cost function can be evaluated for every candidate cropping frame of every target video frame to determine the state of each candidate cropping frame. The state transition penalties between different frames are then calculated, and for every state the corresponding state of the previous frame that minimizes the calculation result of the cost function is recorded. The state transition penalty can be understood as the position offset and size difference of the candidate cropping frames between different frames; that is, for each candidate cropping frame of the second target video frame, the candidate cropping frame in the first target video frame that minimizes the calculation result of its cost function is determined, where the first target video frame is the adjacent previous video frame of the second target video frame. Thus, each candidate cropping frame has a corresponding candidate cropping frame in the previous frame with the smallest cost. Then, starting from the n-th target video frame, the candidate cropping frame that minimizes the calculation result of the cost function is determined at the (n−1)-th target video frame; for the (n−1)-th target video frame, the minimizing candidate cropping frame is determined at the (n−2)-th target video frame; and so on. In this way, each target video frame obtains a target candidate cropping frame that minimizes the calculation result of the cost function, and these target candidate cropping frames minimize the total cost function, so that each target candidate cropping frame encloses the main content and the cropping frames transition smoothly between target video frames, improving the quality of video cropping.
In a possible implementation, after the original video is cropped according to the target cropping box corresponding to each frame picture, a cropping width and a cropping length corresponding to the original video may further be obtained, and border padding may then be performed on each cropped frame picture, so that the width of each cropped frame picture equals the cropping width and the length of each cropped frame picture equals the cropping length.
For example, the cropping width may be a desired cropping width input by a user, and the cropping length may be a desired cropping length input by a user, which is not limited in the embodiments of the present disclosure. Since the target cropping box corresponding to each frame picture is determined according to the main content, and the size of the main content may vary, the sizes of the target cropping boxes corresponding to different frame pictures may differ, and may also differ from the desired cropping width and/or cropping length. Therefore, in order to obtain a cropped video of uniform size, border padding may be performed on each cropped frame picture, so that the width of each cropped frame picture equals the cropping width and the length equals the cropping length. For example, black borders may be added to each cropped frame picture; that is, when the width of a cropped frame picture is smaller than the cropping width, black borders are added on its left and right sides so that the sum of the black borders and the width of the cropped frame picture equals the cropping width.
Alternatively, in other possible implementations, when the width of a cropped frame picture equals the cropping width but its length exceeds the cropping length, or when the length of a cropped frame picture equals the cropping length but its width exceeds the cropping width, the cropped frame picture may first be scaled so that its width does not exceed the cropping width and its length does not exceed the cropping length, and border padding may then be performed, so that the cropped video has a uniform size.
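A sketch of this scale-then-pad step with OpenCV-style calls is given below; the use of cv2.resize and cv2.copyMakeBorder and the choice of black as the padding colour are assumptions for illustration.

```python
import cv2

def fit_to_target(frame, crop_w, crop_h):
    """Scale a cropped frame to fit inside (crop_w, crop_h), then pad with
    black borders so every output frame has exactly the target size."""
    h, w = frame.shape[:2]
    scale = min(crop_w / w, crop_h / h, 1.0)        # only shrink, never enlarge
    if scale < 1.0:
        frame = cv2.resize(frame, (int(w * scale), int(h * scale)))
        h, w = frame.shape[:2]
    pad_w, pad_h = crop_w - w, crop_h - h
    return cv2.copyMakeBorder(frame,
                              pad_h // 2, pad_h - pad_h // 2,   # top, bottom
                              pad_w // 2, pad_w - pad_w // 2,   # left, right
                              cv2.BORDER_CONSTANT, value=(0, 0, 0))
```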
With the above video cropping method, the target candidate cropping box corresponding to each target video frame can be determined dynamically according to the main content of that target video frame, and different target cropping boxes for every frame picture can then be determined through interpolation processing, so that the target cropping box corresponding to each frame picture encloses most of the main content in that frame picture. This alleviates the loss of most of the main content in the cropped video and improves the quality of video cropping. In addition, border padding can be performed on each cropped frame picture to obtain a cropped video of the target size.
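Since only the extracted target video frames receive a box from the optimization, boxes for the in-between frames of the original video have to be obtained by interpolation. The linear interpolation below is one simple way to do this and is offered as an assumption; the description only requires that interpolation (optionally after smoothing) yields a target cropping box for every frame picture.

```python
def interpolate_boxes(key_indices, key_boxes, total_frames):
    """Linearly interpolate (x, y, w, h) boxes from the sampled target
    video frames to every frame picture of the original video."""
    boxes = []
    for f in range(total_frames):
        if f <= key_indices[0]:
            boxes.append(key_boxes[0]); continue
        if f >= key_indices[-1]:
            boxes.append(key_boxes[-1]); continue
        # find the surrounding target video frames
        k = max(i for i, idx in enumerate(key_indices) if idx <= f)
        i0, i1 = key_indices[k], key_indices[k + 1]
        t = (f - i0) / (i1 - i0)
        b0, b1 = key_boxes[k], key_boxes[k + 1]
        boxes.append(tuple(round((1 - t) * a + t * b) for a, b in zip(b0, b1)))
    return boxes
```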
Based on the same inventive concept, an embodiment of the present disclosure further provides a video cropping apparatus, which may become part or all of an electronic device through software, hardware, or a combination of both. Referring to FIG. 4, the video cropping apparatus includes:
an obtaining module 401, configured to obtain an original video to be cropped;
a frame extraction module 402, configured to perform frame extraction processing on the original video to obtain a plurality of target video frames;
a determining module 403, configured to, for each target video frame, determine, according to the main content of the target video frame, a target candidate cropping box corresponding to the target video frame;
an interpolation module 404, configured to perform interpolation processing according to the target candidate cropping box corresponding to each target video frame to determine a target cropping box corresponding to each frame picture of the original video; and
a cropping module 405, configured to crop the original video according to the target cropping box corresponding to each frame picture.
Optionally, the determining module 403 is configured to:
for each target video frame, calculate a cost function according to the main content of the target video frame and a plurality of candidate cropping boxes corresponding to the target video frame, where the cost function includes a first function for characterizing the importance of the main content in the target video frame and a second function for characterizing the size difference between candidate cropping boxes of two target video frames; and
determine, among the plurality of candidate cropping boxes, the target candidate cropping box that minimizes the calculation result of the cost function.
Optionally, the first function is calculated by the following modules:
a content determining module, configured to, for each candidate cropping box, determine the content inclusion degree of the candidate cropping box with respect to the main content of the target video frame, and determine the content proportion between the main content contained in the candidate cropping box and the complete main content of the target video frame; and
a first calculation module, configured to calculate the first function according to the content inclusion degree and the content proportion.
Optionally, the expression of the first function is given by the formula image Figure PCTCN2021134720-appb-000006, which combines the two quantities defined below under the preset weights β1 and β2, where f denotes the calculation result of the first function, β1 and β2 denote preset weight values, A(C_i) denotes the main content contained in the candidate cropping box C_i corresponding to the i-th target video frame, S(C_i) denotes the area of the candidate cropping box C_i corresponding to the i-th target video frame, and A(I_i) denotes the complete main content in the i-th target video frame; A(C_i)/S(C_i) denotes the content inclusion degree of the candidate cropping box C_i with respect to the main content of the i-th target video frame, and A(C_i)/A(I_i) denotes the content proportion between the main content contained in the candidate cropping box C_i and the complete main content of the i-th target video frame.
Optionally, the second function is calculated by the following modules:
a difference determining module, configured to determine the width difference and the length difference between a candidate cropping box of a first target video frame and a candidate cropping box of a second target video frame, where the first target video frame is the video frame immediately preceding the second target video frame; and
a second calculation module, configured to calculate the second function according to the width difference and the length difference.
Optionally, the interpolation module 404 is configured to:
perform smoothing processing according to the target candidate cropping box corresponding to each target video frame to obtain a smoothed candidate cropping box corresponding to each target video frame; and
perform interpolation processing according to the smoothed candidate cropping box corresponding to each target video frame to obtain the target cropping box corresponding to each frame picture of the original video.
Optionally, the apparatus 400 further includes:
a length-and-width obtaining module, configured to obtain a cropping width and a cropping length corresponding to the original video; and
a padding module, configured to perform border padding on each cropped frame picture, so that the width of each cropped frame picture equals the cropping width and the length of each cropped frame picture equals the cropping length.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method, and will not be elaborated here.
Based on the same inventive concept, an embodiment of the present disclosure further provides a computer-readable medium having a computer program stored thereon, where the program, when executed by a processing apparatus, implements the steps of any of the above video cropping methods.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device, including:
a storage apparatus having a computer program stored thereon; and
a processing apparatus, configured to execute the computer program in the storage apparatus to implement the steps of any of the above video cropping methods.
Referring now to FIG. 5, a schematic structural diagram of an electronic device 500 suitable for implementing an embodiment of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player) and a vehicle-mounted terminal (for example, a vehicle-mounted navigation terminal), and fixed terminals such as a digital TV and a desktop computer. The electronic device shown in FIG. 5 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 5, the electronic device 500 may include a processing apparatus (for example, a central processing unit, a graphics processing unit, etc.) 501, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage apparatus 508 into a random access memory (RAM) 503. The RAM 503 further stores various programs and data required for the operation of the electronic device 500. The processing apparatus 501, the ROM 502 and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Generally, the following apparatuses may be connected to the I/O interface 505: an input apparatus 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output apparatus 507 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; a storage apparatus 508 including, for example, a magnetic tape and a hard disk; and a communication apparatus 509. The communication apparatus 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 5 shows the electronic device 500 having various apparatuses, it should be understood that it is not required to implement or provide all of the illustrated apparatuses; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 509, or installed from the storage apparatus 508, or installed from the ROM 502. When the computer program is executed by the processing apparatus 501, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program which can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and the computer-readable signal medium can send, propagate or transmit a program used by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted with any appropriate medium, including but not limited to an electric wire, an optical cable, RF (radio frequency) and the like, or any suitable combination of the above.
In some implementations, communication may be performed using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and interconnection may be achieved with digital data communication (for example, a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet) and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist independently without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain an original video to be cropped; perform frame extraction processing on the original video to obtain a plurality of target video frames; for each target video frame, determine, according to the main content of the target video frame, a target candidate cropping box corresponding to the target video frame; perform interpolation processing according to the target candidate cropping box corresponding to each target video frame to determine a target cropping box corresponding to each frame picture of the original video; and crop the original video according to the target cropping box corresponding to each frame picture.
The computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk and C++, and also conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a module does not, in some cases, constitute a limitation on the module itself.
The functions described herein above may be executed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium may include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, Example 1 provides a video cropping method, the method including:
obtaining an original video to be cropped;
performing frame extraction processing on the original video to obtain a plurality of target video frames;
for each target video frame, determining, according to the main content of the target video frame, a target candidate cropping box corresponding to the target video frame;
performing interpolation processing according to the target candidate cropping box corresponding to each target video frame to determine a target cropping box corresponding to each frame picture of the original video; and
cropping the original video according to the target cropping box corresponding to each frame picture.
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, where the determining, for each target video frame and according to the main content of the target video frame, a target candidate cropping box corresponding to the target video frame includes:
for each target video frame, calculating a cost function according to the main content of the target video frame and a plurality of candidate cropping boxes corresponding to the target video frame, where the cost function includes a first function for characterizing the importance of the main content in the target video frame and a second function for characterizing the size difference between candidate cropping boxes of two target video frames; and
determining, among the plurality of candidate cropping boxes, the target candidate cropping box that minimizes the calculation result of the cost function.
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 2, where the first function is calculated in the following manner:
for each candidate cropping box, determining the content inclusion degree of the candidate cropping box with respect to the main content of the target video frame, and determining the content proportion between the main content contained in the candidate cropping box and the complete main content of the target video frame; and
calculating the first function according to the content inclusion degree and the content proportion.
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 3, where the expression of the first function is given by the formula image Figure PCTCN2021134720-appb-000009, which combines the content inclusion degree A(C_i)/S(C_i) and the content proportion A(C_i)/A(I_i) under the preset weights β1 and β2; f denotes the calculation result of the first function, β1 and β2 denote preset weight values, A(C_i) denotes the main content contained in the candidate cropping box C_i corresponding to the i-th target video frame, S(C_i) denotes the area of the candidate cropping box C_i corresponding to the i-th target video frame, and A(I_i) denotes the complete main content in the i-th target video frame; A(C_i)/S(C_i) denotes the content inclusion degree of the candidate cropping box C_i with respect to the main content of the i-th target video frame, and A(C_i)/A(I_i) denotes the content proportion between the main content contained in the candidate cropping box C_i and the complete main content of the i-th target video frame.
According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 2, where the second function is calculated in the following manner:
determining the width difference and the length difference between a candidate cropping box of a first target video frame and a candidate cropping box of a second target video frame, where the first target video frame is the video frame immediately preceding the second target video frame; and
calculating the second function according to the width difference and the length difference.
According to one or more embodiments of the present disclosure, Example 6 provides the method of any one of Examples 1 to 5, where the performing interpolation processing according to the target candidate cropping box corresponding to each target video frame to determine a target cropping box corresponding to each frame picture of the original video includes:
performing smoothing processing according to the target candidate cropping box corresponding to each target video frame to obtain a smoothed candidate cropping box corresponding to each target video frame; and
performing interpolation processing according to the smoothed candidate cropping box corresponding to each target video frame to obtain the target cropping box corresponding to each frame picture of the original video.
According to one or more embodiments of the present disclosure, Example 7 provides the method of any one of Examples 1 to 5, where the method further includes:
obtaining a cropping width and a cropping length corresponding to the original video; and
performing border padding on each cropped frame picture, so that the width of each cropped frame picture equals the cropping width and the length of each cropped frame picture equals the cropping length.
According to one or more embodiments of the present disclosure, Example 8 provides a video cropping apparatus, the apparatus including:
an obtaining module, configured to obtain an original video to be cropped;
a frame extraction module, configured to perform frame extraction processing on the original video to obtain a plurality of target video frames;
a determining module, configured to, for each target video frame, determine, according to the main content of the target video frame, a target candidate cropping box corresponding to the target video frame;
an interpolation module, configured to perform interpolation processing according to the target candidate cropping box corresponding to each target video frame to determine a target cropping box corresponding to each frame picture of the original video; and
a cropping module, configured to crop the original video according to the target cropping box corresponding to each frame picture.
According to one or more embodiments of the present disclosure, Example 9 provides the apparatus of Example 8, where the determining module is configured to:
for each target video frame, calculate a cost function according to the main content of the target video frame and a plurality of candidate cropping boxes corresponding to the target video frame, where the cost function includes a first function for characterizing the importance of the main content in the target video frame and a second function for characterizing the size difference between candidate cropping boxes of two target video frames; and
determine, among the plurality of candidate cropping boxes, the target candidate cropping box that minimizes the calculation result of the cost function.
According to one or more embodiments of the present disclosure, Example 10 provides the apparatus of Example 9, where the first function is calculated by the following modules:
a content determining module, configured to, for each candidate cropping box, determine the content inclusion degree of the candidate cropping box with respect to the main content of the target video frame, and determine the content proportion between the main content contained in the candidate cropping box and the complete main content of the target video frame; and
a first calculation module, configured to calculate the first function according to the content inclusion degree and the content proportion.
According to one or more embodiments of the present disclosure, Example 11 provides the apparatus of Example 10, where the expression of the first function is given by the formula image Figure PCTCN2021134720-appb-000012, which combines the content inclusion degree A(C_i)/S(C_i) and the content proportion A(C_i)/A(I_i) under the preset weights β1 and β2; f denotes the calculation result of the first function, β1 and β2 denote preset weight values, A(C_i) denotes the main content contained in the candidate cropping box C_i corresponding to the i-th target video frame, S(C_i) denotes the area of the candidate cropping box C_i corresponding to the i-th target video frame, and A(I_i) denotes the complete main content in the i-th target video frame; A(C_i)/S(C_i) denotes the content inclusion degree of the candidate cropping box C_i with respect to the main content of the i-th target video frame, and A(C_i)/A(I_i) denotes the content proportion between the main content contained in the candidate cropping box C_i and the complete main content of the i-th target video frame.
According to one or more embodiments of the present disclosure, Example 12 provides the apparatus of Example 9, where the second function is calculated by the following modules:
a difference determining module, configured to determine the width difference and the length difference between a candidate cropping box of a first target video frame and a candidate cropping box of a second target video frame, where the first target video frame is the video frame immediately preceding the second target video frame; and
a second calculation module, configured to calculate the second function according to the width difference and the length difference.
According to one or more embodiments of the present disclosure, Example 13 provides the apparatus of any one of Examples 8 to 12, where the interpolation module is configured to:
perform smoothing processing according to the target candidate cropping box corresponding to each target video frame to obtain a smoothed candidate cropping box corresponding to each target video frame; and
perform interpolation processing according to the smoothed candidate cropping box corresponding to each target video frame to obtain the target cropping box corresponding to each frame picture of the original video.
According to one or more embodiments of the present disclosure, Example 14 provides the apparatus of any one of Examples 8 to 12, further including:
a length-and-width obtaining module, configured to obtain a cropping width and a cropping length corresponding to the original video; and
a padding module, configured to perform border padding on each cropped frame picture, so that the width of each cropped frame picture equals the cropping width and the length of each cropped frame picture equals the cropping length.
According to one or more embodiments of the present disclosure, Example 15 provides a computer-readable medium having a computer program stored thereon, where the program, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1 to 7.
According to one or more embodiments of the present disclosure, Example 16 provides an electronic device, including:
a storage apparatus having a computer program stored thereon; and
a processing apparatus, configured to execute the computer program in the storage apparatus to implement the steps of the method of any one of Examples 1 to 7.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or logical actions of methods, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims. With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method, and will not be elaborated here.

Claims (11)

  1. A video cropping method, the method comprising:
    obtaining an original video to be cropped;
    performing frame extraction processing on the original video to obtain a plurality of target video frames;
    for each target video frame, determining, according to main content of the target video frame, a target candidate cropping box corresponding to the target video frame;
    performing interpolation processing according to the target candidate cropping box corresponding to each target video frame to determine a target cropping box corresponding to each frame picture of the original video; and
    cropping the original video according to the target cropping box corresponding to each frame picture.
  2. The method according to claim 1, wherein the determining, for each target video frame and according to the main content of the target video frame, a target candidate cropping box corresponding to the target video frame comprises:
    for each target video frame, calculating a cost function according to the main content of the target video frame and a plurality of candidate cropping boxes corresponding to the target video frame, wherein the cost function comprises a first function for characterizing importance of the main content in the target video frame and a second function for characterizing a size difference between candidate cropping boxes of two target video frames; and
    determining, among the plurality of candidate cropping boxes, the target candidate cropping box that minimizes a calculation result of the cost function.
  3. The method according to claim 2, wherein the first function is calculated in the following manner:
    for each candidate cropping box, determining a content inclusion degree of the candidate cropping box with respect to the main content of the target video frame, and determining a content proportion between the main content contained in the candidate cropping box and the complete main content of the target video frame; and
    calculating the first function according to the content inclusion degree and the content proportion.
  4. The method according to claim 3, wherein the expression of the first function is given by the formula image Figure PCTCN2021134720-appb-100001, which combines the content inclusion degree A(C_i)/S(C_i) and the content proportion A(C_i)/A(I_i) under preset weights β1 and β2,
    wherein f denotes the calculation result of the first function, β1 and β2 denote preset weight values, A(C_i) denotes the main content contained in the candidate cropping box C_i corresponding to the i-th target video frame, S(C_i) denotes the area of the candidate cropping box C_i corresponding to the i-th target video frame, and A(I_i) denotes the complete main content in the i-th target video frame; A(C_i)/S(C_i) denotes the content inclusion degree of the candidate cropping box C_i with respect to the main content of the i-th target video frame, and A(C_i)/A(I_i) denotes the content proportion between the main content contained in the candidate cropping box C_i and the complete main content of the i-th target video frame.
  5. The method according to claim 2, wherein the second function is calculated in the following manner:
    determining a width difference and a length difference between a candidate cropping box of a first target video frame and a candidate cropping box of a second target video frame, wherein the first target video frame is the video frame immediately preceding the second target video frame; and
    calculating the second function according to the width difference and the length difference.
  6. The method according to any one of claims 1-5, wherein the performing interpolation processing according to the target candidate cropping box corresponding to each target video frame to determine a target cropping box corresponding to each frame picture of the original video comprises:
    performing smoothing processing according to the target candidate cropping box corresponding to each target video frame to obtain a smoothed candidate cropping box corresponding to each target video frame; and
    performing interpolation processing according to the smoothed candidate cropping box corresponding to each target video frame to obtain the target cropping box corresponding to each frame picture of the original video.
  7. The method according to any one of claims 1-5, wherein the method further comprises:
    obtaining a cropping width and a cropping length corresponding to the original video; and
    performing border padding on each cropped frame picture, so that a width of each cropped frame picture equals the cropping width and a length of each cropped frame picture equals the cropping length.
  8. A video cropping apparatus, the apparatus comprising:
    an obtaining module, configured to obtain an original video to be cropped;
    a frame extraction module, configured to perform frame extraction processing on the original video to obtain a plurality of target video frames;
    a determining module, configured to, for each target video frame, determine, according to main content of the target video frame, a target candidate cropping box corresponding to the target video frame;
    an interpolation module, configured to perform interpolation processing according to the target candidate cropping box corresponding to each target video frame to determine a target cropping box corresponding to each frame picture of the original video; and
    a cropping module, configured to crop the original video according to the target cropping box corresponding to each frame picture.
  9. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-7.
  10. An electronic device, comprising:
    a storage apparatus having a computer program stored thereon; and
    a processing apparatus, configured to execute the computer program in the storage apparatus to implement the steps of the method according to any one of claims 1-7.
  11. A computer program product comprising a computer program, wherein the computer program, when executed by a processing apparatus, implements the steps of the method according to any one of claims 1-7.
PCT/CN2021/134720 2020-12-02 2021-12-01 Video cropping method and apparatus, storage medium and electronic device WO2022116990A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/255,473 US20240112299A1 (en) 2020-12-02 2021-12-01 Video cropping method and apparatus, storage medium and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011401449.0 2020-12-02
CN202011401449.0A CN112565890B (zh) 2020-12-02 2020-12-02 Video cropping method and apparatus, storage medium and electronic device

Publications (1)

Publication Number Publication Date
WO2022116990A1 true WO2022116990A1 (zh) 2022-06-09

Family

ID=75048107

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134720 WO2022116990A1 (zh) 2020-12-02 2021-12-01 视频裁剪方法、装置、存储介质及电子设备

Country Status (3)

Country Link
US (1) US20240112299A1 (zh)
CN (1) CN112565890B (zh)
WO (1) WO2022116990A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112565890B (zh) * 2020-12-02 2022-09-16 北京有竹居网络技术有限公司 Video cropping method and apparatus, storage medium and electronic device
CN114630058B (zh) * 2022-03-15 2024-02-09 北京达佳互联信息技术有限公司 Video processing method and apparatus, electronic device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130050574A1 (en) * 2011-08-29 2013-02-28 Futurewei Technologies Inc. System and Method for Retargeting Video Sequences
CN104836956A (zh) * 2015-05-09 2015-08-12 陈包容 Method and apparatus for processing video shot by a mobile phone
CN110189378A (zh) * 2019-05-23 2019-08-30 北京奇艺世纪科技有限公司 Video processing method and apparatus, and electronic device
CN111356016A (zh) * 2020-03-11 2020-06-30 北京松果电子有限公司 Video processing method, video processing apparatus and storage medium
CN112565890A (zh) * 2020-12-02 2021-03-26 北京有竹居网络技术有限公司 Video cropping method and apparatus, storage medium and electronic device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10019823B2 (en) * 2013-10-24 2018-07-10 Adobe Systems Incorporated Combined composition and change-based models for image cropping
CN111586473B (zh) * 2020-05-20 2023-01-17 北京字节跳动网络技术有限公司 Video cropping method and apparatus, device and storage medium
CN111815645B (zh) * 2020-06-23 2021-05-11 广州筷子信息科技有限公司 Method and system for cropping advertisement video frames


Also Published As

Publication number Publication date
CN112565890B (zh) 2022-09-16
CN112565890A (zh) 2021-03-26
US20240112299A1 (en) 2024-04-04


Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21900021; Country of ref document: EP; Kind code of ref document: A1)

WWE Wipo information: entry into national phase (Ref document number: 18255473; Country of ref document: US)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: pct application non-entry in european phase (Ref document number: 21900021; Country of ref document: EP; Kind code of ref document: A1)