WO2022127877A1 - Video editing method and system, electronic device, and storage medium - Google Patents

Video editing method and system, electronic device, and storage medium

Info

Publication number
WO2022127877A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
edited
score
dimension
occlusion
Prior art date
Application number
PCT/CN2021/138917
Other languages
French (fr)
Chinese (zh)
Inventor
龙良曲
唐小林
陈勃霖
符峥
Original Assignee
影石创新科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 影石创新科技股份有限公司 filed Critical 影石创新科技股份有限公司
Publication of WO2022127877A1 publication Critical patent/WO2022127877A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer

Definitions

  • the invention belongs to the technical field of image processing, and in particular relates to a video editing method, system, electronic device and storage medium.
  • Short videos are streamlined and focused, making them easy to read, and easier to share and disseminate on social media.
  • however, how to edit a video to obtain relatively "wonderful" short videos is often a major pain point for users.
  • the purpose of the present invention is to provide a video editing method, system, electronic device and storage medium, aiming to solve the problems of high difficulty and low efficiency in manual video editing in the prior art.
  • the present invention provides a video editing method, the method includes the following steps:
  • the step of analyzing the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension includes:
  • a score curve of the video to be edited in the texture dimension is generated.
  • the function value of the preset distance function is used as the summary score of the video sub-segment.
  • the preset distance function is a correlation coefficient distance function
  • the summary score of the video sub-segment is: s_i = Σ_{j=1}^{N} dist(v_i, v_j)
  • s_i represents the summary score of the i-th video sub-segment
  • v_i and v_j represent the i-th and j-th video sub-segments, respectively
  • N is the first number.
  • the training process of the occlusion analysis model includes:
  • the occlusion training sample set includes a clean sample set and an occlusion sample set, the clean sample does not include an occluder, and the occlusion sample includes at least one type of occluder;
  • the training process of the aesthetic evaluation model includes:
  • the aesthetic evaluation model is trained using the aesthetic training sample set to obtain a trained aesthetic evaluation model.
  • the shooting habit analysis model is defined as a weighted Gaussian of the form s_habit(t) = α · (1/(σ√(2π))) · exp(−(t − μ)²/(2σ²)), where α is a hyperparameter used to adjust the weight of the Gaussian probability density, μ represents the mean of the Gaussian distribution, σ represents the standard deviation of the Gaussian distribution, υ_duration represents the duration of the video to be edited, and β is a hyperparameter used to adjust the variance of the Gaussian probability density.
  • the integrated learning model adopts a single-layer linear dense connection layer, and the comprehensive score curve is as follows: s = Σ_{k∈M} (w_k · s'_k + b_k)
  • M is the dimension set
  • w_k and b_k are the parameters of the dense connection layer, used to weight the standardized score curve vector corresponding to each dimension
  • s'_k is the standardized score curve vector corresponding to each dimension
  • s is the comprehensive score curve vector.
  • the step of selecting the editing region from the first number of video sub-segments according to the comprehensive score curve includes:
  • the step of selecting a second number of video sub-segments according to the sorting result, and selecting the editing region from the second number of video sub-segments includes:
  • a trimming or completion operation is performed on those of the second number of video sub-segments whose video lengths do not belong to the preset video length interval, the second number of video sub-segments after the operation are obtained, and the areas corresponding to the second number of video sub-segments after the operation are used as the clip area.
  • the present invention provides a video editing system, the system comprising:
  • a video segmentation module used for segmenting the video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments
  • a dimension analysis module configured to analyze the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension
  • An integrated learning module for processing all the score curves using a preset integrated learning model to generate a comprehensive score curve of the video to be edited
  • segment search module for selecting clip regions from the first number of video sub-segments according to the comprehensive score curve
  • the editing module is used for editing the video to be edited according to the editing area to obtain the edited video.
  • the dimension analysis module includes:
  • a digest analysis module configured to generate a score curve of the video to be edited in the digest dimension by calculating the digest score of each video sub-segment;
  • An occlusion analysis module configured to analyze the occlusion probability of each video frame through an occlusion analysis model, and generate a score curve of the video to be edited in the occlusion dimension;
  • An aesthetic analysis module for analyzing the aesthetic score of each video frame through an aesthetic evaluation model, and generating a score curve of the video to be edited in the aesthetic dimension;
  • a shooting habit analysis module configured to use a preset shooting habit analysis model to generate a score curve of the video to be edited in the shooting habit dimension;
  • An image entropy analysis module for generating a score curve of the video to be edited in the image entropy dimension by counting the amount of information of each video frame;
  • the texture analysis module is configured to generate a score curve of the video to be edited in the texture dimension by analyzing the texture features of each video frame.
  • the present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method described above.
  • the present invention also provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the steps of the above-mentioned method.
  • the present invention segments the video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments, analyzes the video to be edited from preset dimensions to obtain score curves corresponding to the dimensions, processes all the score curves using a preset integrated learning model to generate a comprehensive score curve of the video to be edited, selects the editing area from the first number of video sub-segments according to the comprehensive score curve, and edits the video to be edited according to the editing area to obtain the edited video, thereby realizing automatic video editing based on multi-dimensional analysis, improving editing efficiency, and ensuring the video editing effect.
  • Fig. 1 is the realization flow chart of the video editing method provided in Embodiment 1 of the present invention.
  • FIG. 2 is an example diagram of a video segmentation result provided in Embodiment 1 of the present invention.
  • FIG. 3 is an example diagram of a score curve in the summary dimension provided by Embodiment 1 of the present invention.
  • FIG. 4 is an example diagram of a clipped sub-segment extension provided by Embodiment 1 of the present invention.
  • FIG. 5 is a schematic structural diagram of a video editing system provided in Embodiment 2 of the present invention.
  • FIG. 6 is a schematic diagram of a preferred structure of a video editing system provided by Embodiment 2 of the present invention.
  • FIG. 7 is a schematic structural diagram of an electronic device according to Embodiment 3 of the present invention.
  • FIG. 1 shows the implementation process of the video editing method provided by the first embodiment of the present invention. For the convenience of description, only the part related to the embodiment of the present invention is shown, and the details are as follows:
  • in step S101, the video to be edited is segmented according to a preset video segmentation algorithm to obtain a first number of video sub-segments.
  • the embodiment of the present invention is applicable to automatic video editing, and the video to be edited may be a video shot by a user based on a camera or a mobile phone.
  • when the video to be edited is segmented according to a preset video segmentation algorithm, the segmentation is usually performed based on the frame extraction sequence of the video to be edited.
  • in a specific implementation, the ffmpeg program can be used to decode the video to be edited; after decoding, frame extraction processing is performed on the video to be edited.
  • frame extraction processing can be performed according to a preset fps (frames per second) sampling rate, for example, fps is set to 5 or 3.
  • for convenience of description, the frame extraction sequence of the video to be edited is expressed as {I_t}, t ∈ [1, M], I_t ∈ R^{h×w}; the height and width h×w of the frame feature map can be set to 224×224, obtained from the original-resolution image by interpolation scaling, and the frame feature vector x_t ∈ R^{96} at time t is defined as the feature histogram of I_t; the video segmentation algorithm then divides the video to be edited into a plurality of video sub-segments according to different scenes, where the preset video segmentation algorithm can be the KTS (kernel temporal segmentation) algorithm or another ordered clustering algorithm, which is not limited here.
  • after the above operations, the video v to be edited can be represented by a matrix of shape [M, 96], where M represents the total number of sampled video frames; the lengths of the video sub-segments obtained after segmentation differ, and the positions within each segment are continuous.
  • FIG. 2 is an example diagram of a video segmentation result.
  • to avoid video sub-segments that are too long or too short, a video sub-segment length interval (the first video length interval) can be preset, and the length of each video sub-segment is obtained; when the length of any video sub-segment does not fall within this interval, the video sub-segment is segmented or spliced.
  • specifically, the splicing may be performed in combination with the lengths of the preceding and following video sub-segments and/or the similarity with the preceding and following video sub-segments.
  • alternatively, the length of the video sub-segment may be left unprocessed in this step and handled later in the process of determining the cropping region.
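  • as an illustrative sketch (not part of the patent disclosure), the following Python code samples frames with OpenCV, builds the 96-bin histogram features x_t described above, and splits the sequence wherever consecutive histograms differ strongly, a simplified stand-in for the KTS/ordered-clustering step; the function names, threshold, and default rates are assumptions:

```python
import cv2
import numpy as np

def frame_features(path, fps=3, size=(224, 224), bins=96):
    """Sample frames at roughly fps and return an [M, 96] histogram feature matrix."""
    cap = cv2.VideoCapture(path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(round(src_fps / fps)), 1)
    feats, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            gray = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2GRAY)
            hist = cv2.calcHist([gray], [0], None, [bins], [0, 256]).ravel()
            feats.append(hist / (hist.sum() + 1e-8))  # x_t: normalized histogram
        idx += 1
    cap.release()
    return np.asarray(feats)

def segment_scenes(feats, thresh=0.3):
    """Place a boundary wherever consecutive histograms differ strongly (L1 distance)."""
    bounds = [0]
    for t in range(1, len(feats)):
        if np.abs(feats[t] - feats[t - 1]).sum() > thresh:
            bounds.append(t)
    bounds.append(len(feats))
    return list(zip(bounds[:-1], bounds[1:]))  # (start, end) per sub-segment
```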
  • in step S102, the video to be edited is analyzed from a preset dimension to obtain a score curve corresponding to the dimension.
  • the preset dimension when analyzing the video to be edited from a preset dimension, it is usually also based on frame sampling sequence analysis to improve analysis efficiency.
  • the preset dimension may be one or more, and when there are multiple preset dimensions, correspondingly, there are multiple score curves, that is, a score curve corresponding to each dimension will be obtained.
  • the preset dimensions include a video summary dimension, an occlusion dimension, an aesthetic dimension, a shooting habit dimension, an image entropy dimension and/or a texture dimension.
  • when a user actually edits a video, the editing is usually based on the main content of the video; therefore, preferably, a score curve of the video to be edited in the summary dimension is generated by calculating the summary score of each video sub-segment, so as to analyze the video to be edited from the video summary dimension and provide a basis for subsequent cropping.
  • specifically, the main content of the video to be edited can be analyzed, and the degree to which each video frame is close to the main content can be evaluated, so as to form a curve of how well each frame represents the main content of the video.
  • to obtain the summary score s_i of each video sub-segment, the function value of the dist(·) distance function is used as the score of each video sub-segment, which simplifies the computation of the score curve.
  • the distance function is a correlation coefficient distance function
  • the summary score of the video sub-segment v_i is: s_i = Σ_{j=1}^{N} dist(v_i, v_j)
  • s_i represents the summary score of the i-th video sub-segment
  • v_i and v_j represent the i-th and j-th video sub-segments, respectively
  • N is the first number.
  • according to the above formula, the correlation coefficient matrix between each segment v_i and the other segments v_j can be calculated, and the sum of the i-th row of the matrix is used as the summary score s_i of the video sub-segment v_i.
  • a normalization operation is then performed on s_i to obtain a normalized score curve.
  • FIG. 3 is an example diagram of a video summary score curve after normalization operation.
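  • a minimal sketch of the summary score, assuming dist(·) is realized through the Pearson correlation coefficients between pooled per-segment feature vectors, with the row sum taken as s_i as described above:

```python
import numpy as np

def summary_scores(seg_feats):
    """seg_feats: [N, d] array, one pooled feature vector per video sub-segment."""
    m = np.corrcoef(seg_feats)        # pairwise correlation matrix over segments
    s = m.sum(axis=1)                 # s_i = sum of row i
    return (s - s.min()) / (s.max() - s.min() + 1e-8)  # normalized score curve
```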
  • the occlusion probability of each video frame is analyzed by an occlusion analysis model, and a score curve of the video to be edited in the occlusion dimension is generated, so as to analyze the video to be edited from the occlusion dimension and provide a basis for subsequent cropping.
  • the occlusion analysis model is used to analyze whether there is an occluder in the video frame, and output the probability of the occlusion in the video frame.
  • by calculating the occlusion probability of each video frame with the occlusion model, the score curve of the video in the occlusion dimension can be obtained.
  • the occlusion model can be implemented based on a deep neural network, which can be implemented based on lightweight mobile networks such as MobileNetv2 and EfficientNet, and can also be implemented based on other deep convolutional neural network models.
  • an occlusion training sample set is constructed, which includes a clean sample set and an occlusion sample set; the occlusion training samples are input into the occlusion analysis model, the cross-entropy loss value between the output occlusion probability and the true occlusion probability is calculated, and the parameters of the occlusion analysis model are optimized through a gradient descent algorithm until the occlusion detection accuracy of the occlusion analysis model reaches a preset value.
  • the clean samples do not contain occluders, and the occlusion samples include at least one type of occlusions such as head occlusion, hand occlusion, and hair occlusion.
  • in a specific implementation, both types of samples can be collected from pictures taken by real cameras and manually labeled as occluded or not: clean samples are labeled 0 and occluded samples are labeled 1.
  • before being input to the occlusion analysis model, the video frame can be preprocessed; the preprocessing can include random data augmentation, scaling, and normalization operations.
  • the last layer of the occlusion model passes through a softmax activation function and outputs a vector p(I_t) of length 2, representing the probabilities of no occlusion and occlusion, respectively; the two probabilities sum to 1.
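  • a hedged sketch of the occlusion classifier and its cross-entropy training step, assuming a torchvision MobileNetV2 backbone (one of the lightweight networks named above); the optimizer settings are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

model = models.mobilenet_v2(num_classes=2)   # logits for [no occlusion, occlusion]
loss_fn = nn.CrossEntropyLoss()              # softmax + cross-entropy in one step
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(frames, labels):
    # frames: [B, 3, 224, 224] preprocessed batch; labels: 0 = clean, 1 = occluded.
    opt.zero_grad()
    loss = loss_fn(model(frames), labels)
    loss.backward()
    opt.step()
    return loss.item()

def occlusion_prob(frame):
    # p(I_t): probabilities of [no occlusion, occlusion]; they sum to 1.
    model.eval()
    with torch.no_grad():
        return torch.softmax(model(frame.unsqueeze(0)), dim=1)[0]
```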
  • the aesthetic analysis model can be implemented using a deep convolutional neural network, which can be based on lightweight mobile models such as MobileNetV2/v3, MobileNeXt, GhostNet, etc.
  • to obtain a more accurate aesthetic evaluation model, the model input needs to be designed with a larger feature size, or the model performs aesthetic evaluation on randomly cropped picture regions of the video frame.
  • an aesthetic training sample set is constructed, where the true aesthetic score annotated for each aesthetic training sample is the average of the scores given by multiple users based on preset aesthetic criteria; the aesthetic evaluation model is trained using this sample set to obtain a trained aesthetic evaluation model, and preprocessing operations such as scaling and normalization can be performed on the samples before they are fed into the model.
  • when constructing the aesthetic training sample set, considering that aesthetics is a very subjective concept, multiple people score the same video frame in order to obtain more accurate aesthetic annotations, and the average score is used as the true annotation value of the video frame.
  • in a specific implementation, users can score the samples of the dataset according to the scoring standard and their personal understanding, and the aesthetic mean of each sample is computed from the users' scores.
  • shooting habits can reflect the real shooting intention of users
  • the distribution of the positions of manually clipped segments can be counted, and a machine learning model can be used to approximate this distribution; preferably, a Gaussian distribution model is used to approximate the distribution of manual clips, yielding a shooting habit model of the form s_habit(t) = α · (1/(σ√(2π))) · exp(−(t − μ)²/(2σ²)), where α is a hyperparameter used to adjust the weight of the Gaussian probability density, μ represents the mean of the Gaussian distribution, σ represents the standard deviation of the Gaussian distribution, υ_duration represents the duration of the video to be edited, and β is a hyperparameter used to adjust the variance of the Gaussian probability density.
  • the distribution curves of different users’ shooting habits can be generated.
  • the Gaussian distribution focuses more on video clips whose sampling timestamps lie near μ, that is, the central area of the video, which is in line with human shooting habits: clips captured in the middle of the video are more likely to reflect the user's real shooting intent.
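  • a small sketch of the shooting habit curve; μ = υ_duration/2 (peak at the video center, per the note above) and σ = β·υ_duration are assumptions, since the text only states that α weights the density and β adjusts its variance:

```python
import numpy as np

def habit_curve(timestamps, duration, alpha=1.0, beta=0.25):
    mu = duration / 2.0                     # assumed: peak at the video center
    sigma = beta * duration                 # assumed: spread scales with duration
    density = np.exp(-((timestamps - mu) ** 2) / (2 * sigma ** 2))
    density /= sigma * np.sqrt(2 * np.pi)   # Gaussian probability density
    return alpha * density                  # weighted by alpha

t = np.linspace(0, 60, 300)                 # e.g., a 60-second video
curve = habit_curve(t, duration=60)
```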
  • editing is usually also performed based on the amount of information in the image; therefore, preferably, by counting the amount of information in each video frame, the score curve of the video to be edited in the image entropy dimension is generated, so as to analyze the video to be edited from the image entropy dimension and provide a basis for subsequent cropping.
  • the amount of information in the video frame I_t is analyzed, and the image entropy is defined as Entropy(I_t) = −Σ_i p_i · log p_i, where p_i denotes the probability of pixel value i appearing in the frame.
  • the image entropy can be computed from the RGB pixel vector or the grayscale pixel vector of the frame feature I_t, and after averaging the statistics, the information-content score of I_t is obtained.
  • Table 1 below shows the entropy calculation results of the three example images.
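  • a minimal sketch of the per-frame entropy statistic, following the standard Shannon definition above on a grayscale histogram:

```python
import cv2
import numpy as np

def image_entropy(gray_frame):
    """gray_frame: uint8 grayscale image; returns Entropy(I_t) in bits."""
    hist = cv2.calcHist([gray_frame], [0], None, [256], [0, 256]).ravel()
    p = hist / (hist.sum() + 1e-12)   # pixel-value probabilities p_i
    p = p[p > 0]                      # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())
```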
  • since the image texture can better represent the detailed information of the picture, it is preferable to generate a score curve of the video to be edited in the texture dimension by analyzing the texture features of each video frame, so as to analyze the video to be edited from the texture dimension and provide a basis for subsequent cropping.
  • the texture of the picture can be detected by the Laplacian operator, and before the Laplacian calculation is performed, Gaussian smoothing filtering can be performed to suppress the noise pixels in the picture.
  • the calculation process can be implemented by the GaussianBlur function and the Laplacian function.
  • the size of the Gaussian kernel and the size of the Laplacian kernel can be freely set; by computing the average intensity value of the feature map of the Laplacian-of-Gaussian operator, the image texture information can be evaluated: s_texture = mean(|L(I_t)|)
  • s_texture represents the texture statistic
  • L(I_t) represents the intensity value of the feature map of the Laplacian-of-Gaussian operator.
  • Table 2 shows the texture calculation results of the three example images.
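  • a short sketch of the texture statistic using the GaussianBlur and Laplacian functions mentioned above; the kernel sizes remain free parameters:

```python
import cv2
import numpy as np

def texture_score(gray_frame, gauss_ksize=3, lap_ksize=3):
    smooth = cv2.GaussianBlur(gray_frame, (gauss_ksize, gauss_ksize), 0)
    log_map = cv2.Laplacian(smooth, cv2.CV_32F, ksize=lap_ksize)  # L(I_t)
    return float(np.mean(np.abs(log_map)))  # s_texture: mean response intensity
```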
  • in step S103, a preset integrated learning model is used to process all the score curves to generate a comprehensive score curve of the video to be edited.
  • all the score curves obtained in step S102 are combined through the integrated learning model to generate a comprehensive score curve of the video to be edited.
  • the ensemble learning model can be implemented based on a shallow neural network.
  • specifically, a single-layer linear dense connection layer can be used, combined with standardization operations to accelerate model convergence: s = Σ_{k∈M} (w_k · s'_k + b_k)
  • M is the dimension set
  • w_k and b_k are the parameters of the dense connection layer, used to weight the standardized score curve vector corresponding to each dimension
  • s'_k is the standardized score curve vector corresponding to each dimension
  • s is the comprehensive score curve vector.
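  • a minimal sketch of the single-layer linear fusion s = Σ_{k∈M} (w_k · s'_k + b_k); the weights below are placeholders, whereas the patent learns them as parameters of the dense connection layer:

```python
import numpy as np

def standardize(curve):
    return (curve - curve.mean()) / (curve.std() + 1e-8)   # s'_k

def comprehensive_curve(curves, w, b):
    # curves: dict mapping dimension name k -> score curve vector of length M.
    dims = list(curves)
    s = np.zeros(len(curves[dims[0]]))
    for k in dims:
        s += w[k] * standardize(curves[k]) + b[k]
    return s                                                # comprehensive curve

# usage with two dimensions and placeholder weights
curves = {"summary": np.random.rand(100), "entropy": np.random.rand(100)}
w = {"summary": 0.6, "entropy": 0.4}
b = {"summary": 0.0, "entropy": 0.0}
s = comprehensive_curve(curves, w, b)
```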
  • in step S104, a clip region is selected from the first number of video sub-segments according to the comprehensive score curve.
  • in a specific implementation, a score threshold may be set, and the areas in each video sub-segment whose scores are greater than the threshold are used as clipping areas.
  • preferably, the average video score of each video sub-segment is calculated according to the comprehensive score curve, the video sub-segments are sorted in descending order of average score, a second number of video sub-segments are selected according to the sorting result, and the editing area is selected from the second number of video sub-segments, so as to improve the continuity of the cropping area.
  • specifically, a trimming or completion operation is performed on the video sub-segments among the second number of video sub-segments whose lengths do not belong to the preset video length interval (the second video length interval), and the areas corresponding to the second number of video sub-segments after the operation are used as the clipping area.
  • the area corresponding to the second number of video sub-segments refers to a video area between the start time and the end time of each video sub-segment in the second number of video sub-segments.
  • sub-segment A is the 3-5 minute video clip
  • sub-segment B is the 7-8 minute video clip
  • the sub-segment C is a video segment of the 10th-12th minute
  • the regions corresponding to the above-mentioned second number of video sub-segments are the video regions of the 3rd-5th minute, the 7th-8th minute, and the 10th-12th minute.
  • when t_clip ∈ (0, T_min), i.e., the sub-segment is too short, it needs to be completed from the area to its left or right; specifically, a segment of length T_min − t_clip with the higher average score on the left or on the right is selected for completion, as shown in FIG. 4, for example.
  • when t_clip ∈ [T_max, +∞), i.e., the sub-segment is too long, a sub-segment with a higher score is selected from within it.
  • specifically, a sub-segment can be randomly sampled from the current region, with its duration guaranteed to satisfy t'_clip ∈ [T_min, T_max].
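  • a sketch of the clip-region search under the rules above: rank sub-segments by mean comprehensive score, complete short segments toward the higher-scoring side, and keep the best window of length T_max inside long ones; the greedy growth policy is an assumption consistent with the description:

```python
import numpy as np

def select_clip_regions(segments, score_curve, top_k, t_min, t_max):
    # segments: list of (start, end) frame-index pairs from step S101.
    ranked = sorted(segments, key=lambda seg: score_curve[seg[0]:seg[1]].mean(),
                    reverse=True)[:top_k]
    regions = []
    for start, end in ranked:
        while end - start < t_min:   # too short: grow toward the better side
            left = score_curve[start - 1] if start > 0 else -np.inf
            right = score_curve[end] if end < len(score_curve) else -np.inf
            if left == -np.inf and right == -np.inf:
                break                # cannot grow any further
            if left >= right:
                start -= 1
            else:
                end += 1
        if end - start > t_max:      # too long: best window of length t_max
            means = [score_curve[s:s + t_max].mean()
                     for s in range(start, end - t_max + 1)]
            start += int(np.argmax(means))
            end = start + t_max
        regions.append((start, end))
    return regions
```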
  • in step S105, the video to be edited is edited according to the editing area to obtain the edited video.
  • by splicing the plurality of editing sub-segments, the editing of the video is completed and the edited video is obtained.
  • for example, the editing area includes three editing sub-segments A, B, and C, where sub-segment A is the video segment from the 3rd to the 5th minute, sub-segment B is the video segment from the 7th to the 8th minute, and sub-segment C is the video segment from the 10th to the 12th minute.
  • the average video scores corresponding to the editing sub-segments are 0.6, 0.8, and 0.5, respectively
  • the scene categories corresponding to the editing sub-segments are outdoor, indoor, and outdoor, respectively; if the three editing sub-segments are spliced by time (the start time of each video sub-segment), they are spliced in the order A, B, C; if they are spliced by score (from high to low), they are spliced in the order B, A, C; and if they are spliced by scene and time (from outdoor to indoor, and chronologically within the same scene), they are spliced in the order A, C, B.
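  • the three splicing orders from the example above can be expressed as simple sort keys (illustrative values):

```python
# (name, start_minute, average_score, scene) for sub-segments A, B, C
segments = [("A", 3, 0.6, "outdoor"), ("B", 7, 0.8, "indoor"),
            ("C", 10, 0.5, "outdoor")]

by_time = sorted(segments, key=lambda s: s[1])                        # A, B, C
by_score = sorted(segments, key=lambda s: s[2], reverse=True)         # B, A, C
by_scene = sorted(segments, key=lambda s: (s[3] != "outdoor", s[1]))  # A, C, B
```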
  • in summary, in this embodiment the video to be edited is segmented according to a preset video segmentation algorithm to obtain a first number of video sub-segments; the video to be edited is analyzed from preset dimensions to obtain score curves corresponding to the dimensions; a preset integrated learning model is used to process all the score curves to generate a comprehensive score curve of the video to be edited; the editing area is selected from the first number of video sub-segments according to the comprehensive score curve; and the video to be edited is edited according to the editing area to obtain the edited video, thereby realizing automatic video editing based on multi-dimensional analysis and ensuring the video editing effect.
  • Embodiment 2:
  • FIGS. 5 and 6 show the structure of the video editing system provided by the second embodiment of the present invention; for convenience of description, only the parts related to the embodiment of the present invention are shown, including:
  • the video segmentation module 51 is used for segmenting the video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments;
  • a dimension analysis module 52 configured to analyze the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension
  • the integrated learning module 53 is used to process all the score curves using a preset integrated learning model, and generate a comprehensive score curve of the video to be edited;
  • segment search module 54 for selecting clip regions from the first number of video sub-segments according to the composite score curve
  • the editing module 55 is used for editing the video to be edited according to the editing area to obtain the edited video.
  • the dimension analysis module 52 includes:
  • the digest analysis module 521 is configured to generate a score curve of the video to be edited in the digest dimension by calculating the digest score of each video sub-segment; and/or
  • An occlusion analysis module 522 configured to analyze the occlusion probability of each video frame through an occlusion analysis model, and generate a score curve of the video to be edited in the occlusion dimension;
  • Aesthetic analysis module 523 for analyzing the aesthetic score of each video frame through the aesthetic evaluation model, and generating the score curve of the video to be edited in the aesthetic dimension;
  • a shooting habit analysis module 524 configured to use a preset shooting habit analysis model to generate a score curve of the video to be edited in the shooting habit dimension;
  • the image entropy analysis module 525 is used to generate a score curve of the video to be edited in the image entropy dimension by counting the amount of information of each video frame; and/or
  • the texture analysis module 526 is configured to generate a score curve of the video to be edited in the texture dimension by analyzing the texture features of each video frame.
  • the function value of the preset distance function is used as the summary score of the video sub-segment.
  • the preset distance function is a correlation coefficient distance function
  • the summary score of the video sub-segment is: s_i = Σ_{j=1}^{N} dist(v_i, v_j)
  • s_i represents the summary score of the i-th video sub-segment
  • v_i and v_j represent the i-th and j-th video sub-segments, respectively
  • N is the first number.
  • the shooting habit analysis model is defined as a weighted Gaussian of the form s_habit(t) = α · (1/(σ√(2π))) · exp(−(t − μ)²/(2σ²)), where α is a hyperparameter used to adjust the weight of the Gaussian probability density, μ represents the mean of the Gaussian distribution, σ represents the standard deviation of the Gaussian distribution, υ_duration represents the duration of the video to be edited, and β is a hyperparameter used to adjust the variance of the Gaussian probability density.
  • the ensemble learning model adopts a single-layer linear dense connection layer, and the comprehensive score curve is as follows: s = Σ_{k∈M} (w_k · s'_k + b_k)
  • M is the dimension set
  • w_k and b_k are the parameters of the dense connection layer, used to weight the standardized score curve vector corresponding to each dimension
  • s'_k is the standardized score curve vector corresponding to each dimension
  • s is the comprehensive score curve vector.
  • each module of the video editing system may be implemented by corresponding hardware or software units, and each unit may be an independent software and hardware unit, or may be integrated into a software and hardware unit, which is not intended to limit the present invention.
  • for the specific operation of each module of the video editing system, reference may be made to the description of the foregoing method embodiments, which will not be repeated here.
  • FIG. 7 shows the structure of the electronic device provided by the third embodiment of the present invention. For convenience of description, only the part related to the embodiment of the present invention is shown.
  • the electronic device 7 of the embodiment of the present invention includes a processor 70 , a memory 71 , and a computer program 72 stored in the memory 71 and executable on the processor 70 .
  • when the processor 70 executes the computer program 72, the steps in the foregoing method embodiments are implemented, for example, steps S101 to S105 shown in FIG. 1.
  • alternatively, when the processor 70 executes the computer program 72, the functions of the units in the above-mentioned apparatus embodiments, for example, the functions of the units 51 to 55 shown in FIG. 5, are realized.
  • the video to be edited is segmented according to a preset video segmentation algorithm to obtain a first number of video sub-segments; the video to be edited is analyzed from preset dimensions to obtain score curves corresponding to the dimensions; a preset integrated learning model is used to process all the score curves to generate a comprehensive score curve of the video to be edited; the editing area is selected from the first number of video sub-segments according to the comprehensive score curve; and the video to be edited is edited according to the editing area to obtain the edited video, which realizes automatic video editing based on multi-dimensional analysis, improves editing efficiency, and ensures the video editing effect.
  • Embodiment 4:
  • in Embodiment 4, a computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the foregoing method embodiments are implemented, for example, steps S101 to S105 shown in FIG. 1.
  • alternatively, the functions of the units in the above-mentioned apparatus embodiments, for example, the functions of the units 51 to 55 shown in FIG. 5, are implemented.
  • the video to be edited is segmented according to a preset video segmentation algorithm to obtain a first number of video sub-segments; the video to be edited is analyzed from preset dimensions to obtain score curves corresponding to the dimensions; a preset integrated learning model is used to process all the score curves to generate a comprehensive score curve of the video to be edited; the editing area is selected from the first number of video sub-segments according to the comprehensive score curve; and the video to be edited is edited according to the editing area to obtain the edited video, which realizes automatic video editing based on multi-dimensional analysis, improves editing efficiency, and ensures the video editing effect.
  • the computer-readable storage medium of the embodiments of the present invention may include any entity or device capable of carrying computer program code, or a recording medium, for example, a memory such as ROM/RAM, a magnetic disk, an optical disk, a flash memory, and the like.

Abstract

The present invention is applicable to the technical field of image processing, and provides a video editing method and system, an electronic device, and a storage medium. The method comprises: segmenting, according to a preset video segmentation algorithm, a video to be edited to obtain a first number of video sub-segments; analyzing said video according to preset dimensions to obtain score curves corresponding to the dimensions; processing all the score curves by using a preset ensemble learning model to generate a comprehensive score curve of said video; selecting an editing area from the first number of video sub-segments according to the comprehensive score curve; and editing said video according to the editing area to obtain an edited video. Therefore, analysis is carried out on the basis of multiple dimensions, automatic editing of the video is realized, the editing efficiency is improved, and the video editing effect is ensured.

Description

Video editing method, system, electronic device and storage medium
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a video editing method, system, electronic device and storage medium.
Background art
Short videos are concise and focused, making them easy to browse and easier to share and disseminate on social media. However, how to edit relatively "wonderful" short videos is often a major pain point for users.
Technical problem
The original videos shot by users are numerous and vary in length, containing many uninteresting segments as well as some irrelevant segments, while the "wonderful" segments require users to select and edit manually, which is difficult to operate and inefficient.
Technical solution
The purpose of the present invention is to provide a video editing method, system, electronic device and storage medium, aiming to solve the problems of high difficulty and low efficiency of manual video editing in the prior art.
In one aspect, the present invention provides a video editing method, the method comprising the following steps:
segmenting the video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments;
analyzing the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension;
processing all the score curves using a preset integrated learning model to generate a comprehensive score curve of the video to be edited;
selecting a clipping region from the first number of video sub-segments according to the comprehensive score curve; and
editing the video to be edited according to the clipping region to obtain the edited video.
Preferably, the step of analyzing the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension includes:
generating a score curve of the video to be edited in the summary dimension by calculating the summary score of each video sub-segment; and/or
analyzing the occlusion probability of each video frame through an occlusion analysis model to generate a score curve of the video to be edited in the occlusion dimension; and/or
analyzing the aesthetic score of each video frame through an aesthetic evaluation model to generate a score curve of the video to be edited in the aesthetic dimension; and/or
using a preset shooting habit analysis model to generate a score curve of the video to be edited in the shooting habit dimension; and/or
generating a score curve of the video to be edited in the image entropy dimension by counting the amount of information of each video frame; and/or
generating a score curve of the video to be edited in the texture dimension by analyzing the texture features of each video frame.
Preferably, the function value of a preset distance function is used as the summary score of the video sub-segment.
Preferably, the preset distance function is a correlation coefficient distance function, and the summary score of the video sub-segment is:
s_i = Σ_{j=1}^{N} dist(v_i, v_j)
where s_i represents the summary score of the i-th video sub-segment, v_i and v_j represent the i-th and j-th video sub-segments, respectively, and N is the first number.
Preferably, the training process of the occlusion analysis model includes:
constructing an occlusion training sample set, the occlusion training sample set including a clean sample set and an occlusion sample set, where the clean samples contain no occluders and the occlusion samples contain at least one type of occluder; and
inputting the occlusion training samples into the occlusion analysis model, calculating the cross-entropy loss value between the output occlusion probability and the true occlusion probability, and optimizing the parameters of the occlusion analysis model through a gradient descent algorithm until the occlusion detection accuracy of the occlusion analysis model reaches a preset value.
Preferably, the training process of the aesthetic evaluation model includes:
constructing an aesthetic training sample set, where the true aesthetic score annotated for each aesthetic training sample is the average of the scores given by multiple users based on preset aesthetic criteria; and
training the aesthetic evaluation model using the aesthetic training sample set to obtain a trained aesthetic evaluation model.
Preferably, the shooting habit analysis model is defined as a weighted Gaussian of the form:
s_habit(t) = α · (1/(σ√(2π))) · exp(−(t − μ)²/(2σ²))
where α is a hyperparameter used to adjust the weight of the Gaussian probability density, μ represents the mean of the Gaussian distribution, σ represents the standard deviation of the Gaussian distribution, υ_duration represents the duration of the video to be edited, and β is a hyperparameter used to adjust the variance of the Gaussian probability density.
Preferably, the integrated learning model adopts a single-layer linear dense connection layer, and the comprehensive score curve is as follows:
s = Σ_{k∈M} (w_k · s'_k + b_k)
where M is the dimension set, w_k and b_k are the parameters of the dense connection layer used to weight the standardized score curve vector corresponding to each dimension, s'_k is the standardized score curve vector corresponding to each dimension, and s is the comprehensive score curve vector.
Preferably, the step of selecting the clipping region from the first number of video sub-segments according to the comprehensive score curve includes:
calculating the average video score of each of the video sub-segments according to the comprehensive score curve; and
sorting the video sub-segments in descending order according to the average video score, selecting a second number of video sub-segments according to the sorting result, and selecting the clipping region from the second number of video sub-segments.
Preferably, the step of selecting a second number of video sub-segments according to the sorting result and selecting the clipping region from the second number of video sub-segments includes:
performing a trimming or completion operation on those of the second number of video sub-segments whose video lengths do not belong to a preset video length interval to obtain the second number of video sub-segments after the operation, and using the areas corresponding to the second number of video sub-segments after the operation as the clipping region.
In another aspect, the present invention provides a video editing system, the system comprising:
a video segmentation module, configured to segment the video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments;
a dimension analysis module, configured to analyze the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension;
an integrated learning module, configured to process all the score curves using a preset integrated learning model to generate a comprehensive score curve of the video to be edited;
a segment search module, configured to select a clipping region from the first number of video sub-segments according to the comprehensive score curve; and
an editing module, configured to edit the video to be edited according to the clipping region to obtain the edited video.
Preferably, the dimension analysis module includes:
a summary analysis module, configured to generate a score curve of the video to be edited in the summary dimension by calculating the summary score of each video sub-segment; and/or
an occlusion analysis module, configured to analyze the occlusion probability of each video frame through an occlusion analysis model and generate a score curve of the video to be edited in the occlusion dimension; and/or
an aesthetic analysis module, configured to analyze the aesthetic score of each video frame through an aesthetic evaluation model and generate a score curve of the video to be edited in the aesthetic dimension; and/or
a shooting habit analysis module, configured to use a preset shooting habit analysis model to generate a score curve of the video to be edited in the shooting habit dimension; and/or
an image entropy analysis module, configured to generate a score curve of the video to be edited in the image entropy dimension by counting the amount of information of each video frame; and/or
a texture analysis module, configured to generate a score curve of the video to be edited in the texture dimension by analyzing the texture features of each video frame.
In another aspect, the present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method described above.
In another aspect, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above.
Beneficial effects
The present invention segments the video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments, analyzes the video to be edited from preset dimensions to obtain score curves corresponding to the dimensions, processes all the score curves using a preset integrated learning model to generate a comprehensive score curve of the video to be edited, selects a clipping region from the first number of video sub-segments according to the comprehensive score curve, and edits the video to be edited according to the clipping region to obtain the edited video, thereby realizing automatic video editing based on multi-dimensional analysis, improving editing efficiency, and ensuring the video editing effect.
Description of drawings
FIG. 1 is a flowchart of the implementation of the video editing method provided in Embodiment 1 of the present invention;
FIG. 2 is an example diagram of a video segmentation result provided in Embodiment 1 of the present invention;
FIG. 3 is an example diagram of a score curve in the summary dimension provided in Embodiment 1 of the present invention;
FIG. 4 is an example diagram of clip sub-segment extension provided in Embodiment 1 of the present invention;
FIG. 5 is a schematic structural diagram of the video editing system provided in Embodiment 2 of the present invention;
FIG. 6 is a schematic diagram of a preferred structure of the video editing system provided in Embodiment 2 of the present invention; and
FIG. 7 is a schematic structural diagram of the electronic device provided in Embodiment 3 of the present invention.
Embodiments of the present invention
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below in conjunction with specific embodiments:
Embodiment 1:
FIG. 1 shows the implementation process of the video editing method provided by the first embodiment of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, detailed as follows:
In step S101, the video to be edited is segmented according to a preset video segmentation algorithm to obtain a first number of video sub-segments.
The embodiment of the present invention is applicable to automatic video editing, and the video to be edited may be a video shot by a user with a camera or a mobile phone. In the embodiment of the present invention, when the video to be edited is segmented according to a preset video segmentation algorithm, the segmentation is usually performed based on the frame extraction sequence of the video to be edited. In a specific implementation, the ffmpeg program can be used to decode the video to be edited; after decoding, frame extraction processing is performed on the video to be edited according to a preset fps (frames per second) sampling rate, for example, fps set to 5 or 3. For convenience of description, the frame extraction sequence of the video to be edited is expressed as {I_t}, t ∈ [1, M], I_t ∈ R^{h×w}, where the height and width h×w of the frame feature map can be set to 224×224, obtained from the original-resolution image by interpolation scaling; the frame feature vector x_t ∈ R^{96} at time t is defined as the feature histogram of I_t. The video segmentation algorithm then divides the video to be edited into a plurality of video sub-segments according to different scenes, where the preset video segmentation algorithm can be the KTS (kernel temporal segmentation) algorithm or another ordered clustering algorithm, which is not limited here. After the above operations, the video v to be edited can be represented by a matrix of shape [M, 96], where M represents the total number of sampled video frames; the lengths of the video sub-segments obtained after segmentation differ, and the positions within each segment are continuous. FIG. 2 is an example diagram of a video segmentation result.
To avoid video sub-segments that are too long or too short, a video sub-segment length interval (the first video length interval) can be preset; the length of each video sub-segment is obtained, and when the length of any video sub-segment does not fall within this interval, the video sub-segment is segmented or spliced. Specifically, the splicing may be performed in combination with the lengths of the preceding and following video sub-segments and/or the similarity with the preceding and following video sub-segments. Of course, in this step the length of the video sub-segment may also be left unprocessed and handled later in the process of determining the cropping region.
In step S102, the video to be edited is analyzed from a preset dimension to obtain a score curve corresponding to the dimension.
In the embodiment of the present invention, when the video to be edited is analyzed from a preset dimension, the analysis is usually also based on the frame extraction sequence to improve analysis efficiency. The preset dimension may be one or more; when there are multiple preset dimensions, there are correspondingly multiple score curves, that is, a score curve corresponding to each dimension is obtained. Preferably, the preset dimensions include a video summary dimension, an occlusion dimension, an aesthetic dimension, a shooting habit dimension, an image entropy dimension and/or a texture dimension.
Considering that when a user actually edits a video, the editing is usually based on the main content of the video, preferably, a score curve of the video to be edited in the summary dimension is generated by calculating the summary score of each video sub-segment, so as to analyze the video to be edited from the video summary dimension and provide a basis for subsequent cropping. Specifically, the main content of the video to be edited can be analyzed, and the degree to which each video frame is close to the main content can be evaluated, forming a curve of how well each frame represents the main content of the video. Generally speaking, video summarization has two main optimization objectives: a representativeness objective and a diversity objective.
To obtain the summary score s_i of each video sub-segment, preferably, the function value of the dist(·) distance function is used as the score of each video sub-segment, simplifying the computation of the score curve.
Further preferably, the distance function is a correlation coefficient distance function, and the summary score of the video sub-segment v_i is:
s_i = Σ_{j=1}^{N} dist(v_i, v_j)
where s_i represents the summary score of the i-th video sub-segment, v_i and v_j represent the i-th and j-th video sub-segments, respectively, and N is the first number. According to this formula, the correlation coefficient matrix between each segment v_i and the other segments v_j can be calculated, and the sum of the i-th row of the matrix is used as the summary score s_i of the video sub-segment v_i. Further, a normalization operation is performed on s_i to obtain a normalized score curve. FIG. 3 is an example diagram of a video summary score curve after the normalization operation.
Considering that in the process of shooting a video the user may produce segments in which the lens is blocked by a hand, the head, or hair, and that pictures blocked by such occluders are not pure and beautiful enough to serve as clip segments, preferably, the occlusion probability of each video frame is analyzed by an occlusion analysis model, and a score curve of the video to be edited in the occlusion dimension is generated, so as to analyze the video to be edited from the occlusion dimension and provide a basis for subsequent cropping. The occlusion analysis model is used to analyze whether an occluder appears in a video frame and to output the probability that an occluder appears in the frame; by calculating the occlusion probability of each video frame with the occlusion model, the score curve of the video in the occlusion dimension is obtained. The occlusion model can be implemented based on a deep neural network, which can be based on lightweight mobile networks such as MobileNetV2 and EfficientNet, or on other deep convolutional neural network models.
When training the occlusion analysis model, preferably, an occlusion training sample set is constructed, containing a clean sample set and an occluded sample set. The occlusion training samples are fed into the occlusion analysis model, the cross-entropy loss between the output occlusion probability and the true occlusion label is computed, and the model parameters are optimized with a gradient descent algorithm until the model's occlusion detection accuracy reaches a preset value. Clean samples contain no occluders, while occluded samples contain at least one class of occluder, such as head, hand, or hair occlusion. In a specific implementation, both types of samples can be collected from footage shot with real cameras and manually labeled: 0 for clean samples and 1 for occluded samples. Before being fed into the model, video frames may be preprocessed with random data augmentation, scaling, and standardization. The model's final layer passes through a softmax activation and outputs a length-2 vector p(I_t) representing the probabilities of no occlusion and occlusion, which sum to 1. Computing the cross-entropy loss between p(I_t) and the ground-truth label and applying gradient descent optimizes the model parameters, yielding an accurate occlusion model.
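As an illustrative sketch of this training loop (the backbone, dataset layout, and hyperparameters below are assumptions, not part of this disclosure), using PyTorch with a MobileNetV2 backbone:

```python
import torch
import torch.nn as nn
from torchvision import models, datasets, transforms

# Preprocessing: random augmentation, scaling, and standardization, as described above.
preprocess = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Two classes: 0 = clean, 1 = occluded; the "occlusion_data/" folder layout is assumed.
dataset = datasets.ImageFolder("occlusion_data/", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

model = models.mobilenet_v2(num_classes=2)
criterion = nn.CrossEntropyLoss()   # applies log-softmax to the logits internally
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for images, labels in loader:
    loss = criterion(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# At inference, torch.softmax(model(x), dim=1)[:, 1] is the occlusion probability p(I_t).
```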
Considering that video editing is also usually guided by aesthetics, it is preferable to analyze the aesthetic score of each video frame with an aesthetic evaluation model and generate a score curve of the video to be edited in the aesthetic dimension, so that the video is analyzed from the aesthetic dimension and a basis is provided for subsequent cropping. The aesthetic analysis model can be implemented with a deep convolutional neural network, which may be based on a lightweight mobile model such as MobileNetv2/v3, MobileNext, or GhostNet. To obtain a more accurate aesthetic evaluation model, the model input should be designed with a larger feature size, or the model should evaluate video frames on randomly cropped picture regions. Computing the aesthetic score of each video frame with the aesthetic analysis model yields the score curve of the video to be edited in the aesthetic dimension.
When training the aesthetic analysis model, preferably, an aesthetic training sample set is constructed in which the ground-truth aesthetic score of each sample is the mean of scores given by multiple users from a preset aesthetic perspective; the aesthetic evaluation model is then trained on this set to obtain the trained model. Before samples are fed into the model, preprocessing such as scaling and standardization may also be applied. When building the sample set, since aesthetics is a highly subjective notion, a relatively reliable aesthetic label is obtained by having several people score the same video frame and taking the mean score as the frame's ground-truth label. In a specific implementation, users score the dataset samples according to the scoring standard and their own judgment, and the aesthetic mean of each sample is computed from these scores.
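As an example only (the disclosure does not fix the loss function or backbone), the mean-of-annotators label and one plausible regression setup can be sketched as:

```python
import numpy as np
import torch.nn as nn
from torchvision import models

# Hypothetical annotations: one row per annotator, one column per frame.
ratings = np.array([[7.0, 3.0, 8.0],
                    [6.0, 4.0, 9.0],
                    [8.0, 2.0, 7.0]])
labels = ratings.mean(axis=0)        # ground-truth aesthetic score per frame

# One plausible setup (assumed): a MobileNetV2 backbone with a single
# output, trained by regression toward the mean labels.
model = models.mobilenet_v2(num_classes=1)
criterion = nn.MSELoss()
```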
Considering that shooting habits reflect the user's true shooting intent, it is preferable to use a preset shooting habit analysis model to generate a score curve of the video to be edited in the shooting habit dimension, so that the video is analyzed from the shooting habit dimension and a basis is provided for subsequent cropping. In a specific implementation, the positions of manually edited clips can be collected statistically and their distribution approximated with a machine learning model; preferably, a Gaussian distribution model is used to approximate the distribution of manually edited clips, giving the shooting habit model:

g(t) = α · exp(−(t − μ)² / (2σ²)) / (σ√(2π))

defined with

μ = υ_duration / 2

σ = β · υ_duration

where α is a hyperparameter that weights the probability density of the Gaussian distribution, μ is the mean of the Gaussian distribution, σ is its standard deviation, υ_duration is the duration of the video to be edited, and β is a hyperparameter that adjusts the variance of the Gaussian probability density.
By adjusting the hyperparameters α and β, distribution curves for different users' shooting habits can be generated. The Gaussian distribution weights more heavily the video clips whose timestamps lie near μ, that is, the central region of the video, which matches typical human shooting habits: clips shot in the middle of a recording are more likely to reflect the user's true shooting intent.
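A minimal sketch of generating such a shooting-habit score curve, assuming μ = υ_duration/2 and σ = β·υ_duration as above:

```python
import numpy as np

def habit_curve(duration_s: float, fps: float = 30.0,
                alpha: float = 1.0, beta: float = 0.25) -> np.ndarray:
    """Gaussian shooting-habit score for every frame timestamp."""
    t = np.arange(0.0, duration_s, 1.0 / fps)
    mu = duration_s / 2.0        # assumed: the center of the video
    sigma = beta * duration_s    # assumed: beta scales the spread
    return alpha * np.exp(-(t - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
```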
Considering that manual video editing typically takes the information content of the image into account, it is preferable to generate a score curve of the video to be edited in the image entropy dimension by measuring the information content of each video frame, so that the video is analyzed from the image entropy dimension and a basis is provided for subsequent cropping. In a specific implementation, the information content of a video frame I_t is analyzed, with the image entropy Entropy(I_t) defined as:
Entropy(I_t) = −Σ_{k=0}^{255} p_k · log₂ p_k

where p_k is the proportion of pixels in I_t taking intensity value k.
The image entropy can be computed over the RGB pixel vectors or the grayscale pixel vector of the frame I_t, and averaging the per-channel statistics gives the information score of I_t. The larger Entropy(I_t) is, the more random the pixel distribution and the more information the frame contains. Scoring the information content of every frame yields the video's score curve in the image entropy dimension.
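As an example, the per-frame entropy can be computed from a grayscale histogram as follows (a sketch; the frame is assumed to be an 8-bit grayscale array):

```python
import numpy as np

def image_entropy(gray_frame: np.ndarray) -> float:
    """Shannon entropy (in bits) of an 8-bit grayscale frame."""
    hist = np.bincount(gray_frame.ravel().astype(np.int64), minlength=256)
    p = hist / hist.sum()            # pixel-value probabilities p_k
    p = p[p > 0]                     # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())
```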
Table 1 below shows the entropy calculation results for three example images.
Table 1
Considering that image texture is a good proxy for the detail in a picture, it is preferable to generate a score curve of the video to be edited in the texture dimension by analyzing the texture features of each video frame, so that the video is analyzed from the texture dimension and a basis is provided for subsequent cropping. Specifically, the texture of a picture can be detected with the Laplacian operator; before the Laplacian is computed, Gaussian smoothing can be applied to suppress noisy pixels. This computation can be implemented with the GaussianBlur and Laplacian functions, with freely configurable Gaussian and Laplacian kernel sizes. The picture's texture information is then evaluated as the mean intensity of the Laplacian-of-Gaussian feature map:
s_texture = (1 / (H·W)) · Σ_{x,y} |L(I_t)(x, y)|
where s_texture is the texture statistic, L(I_t) is the intensity of the Laplacian-of-Gaussian feature map of frame I_t, and H and W are the frame's height and width.
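As an example, this texture statistic can be computed with OpenCV's GaussianBlur and Laplacian functions (the kernel sizes below are illustrative):

```python
import cv2
import numpy as np

def texture_score(frame_bgr: np.ndarray) -> float:
    """Mean absolute Laplacian-of-Gaussian response of one frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)             # suppress noise first
    log_map = cv2.Laplacian(smoothed, cv2.CV_64F, ksize=3)   # LoG feature map
    return float(np.abs(log_map).mean())
```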
Table 2 below shows the texture calculation results for three example images.
Table 2
In step S103, all of the score curves are processed using a preset ensemble learning model to generate a comprehensive score curve of the video to be edited.
In this embodiment of the present invention, an ensemble learning model learns from all of the score curves obtained in step S102 and generates a single comprehensive score curve for the video to be edited. The ensemble learning model can be implemented as a shallow neural network; to keep the number of model parameters small, a single linear densely connected layer can be used, combined with a standardization operation to accelerate convergence:
s = Σ_{k∈M} (w_k · s'_k + b_k)
where M is the set of dimensions, w_k and b_k are parameters of the densely connected layer that weight the standardized score curve vector of each dimension, s'_k is the standardized score curve vector corresponding to dimension k, and s is the comprehensive score curve vector.
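For illustration, with learned weights w_k and b_k taken as given (training of the dense layer is omitted here), the fusion can be sketched as:

```python
import numpy as np

def ensemble_score(curves: dict, w: dict, b: dict) -> np.ndarray:
    """Single linear dense layer over standardized per-dimension score curves."""
    total = None
    for k, curve in curves.items():
        s_k = (curve - curve.mean()) / (curve.std() + 1e-8)  # standardized s'_k
        term = w[k] * s_k + b[k]
        total = term if total is None else total + term
    return total
```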
In step S104, clip regions are selected from the first number of video sub-segments according to the comprehensive score curve.
In this embodiment of the present invention, a score threshold may be set and the regions of each video sub-segment whose scores exceed the threshold taken as clip regions. Preferably, the average video score of each sub-segment is computed from the comprehensive score curve, the sub-segments are sorted from highest to lowest average score, a second number of sub-segments is selected according to the ranking, and the clip regions are chosen from this second number of sub-segments, which improves the continuity of the cropped regions. To select sub-segments of suitable duration, further preferably, sub-segments among the second number whose length falls outside a preset video length interval (the second video length interval) are trimmed or padded, and the regions corresponding to the resulting sub-segments are taken as the clip regions. Here, the regions corresponding to the second number of video sub-segments are the video regions between the start time and end time of each of those sub-segments.
As an example, if the trimmed or padded video sub-segments are A, B, and C, where sub-segment A covers minutes 3-5, sub-segment B covers minutes 7-8, and sub-segment C covers minutes 10-12, then the regions corresponding to this second number of sub-segments are the video regions at minutes 3-5, 7-8, and 10-12.
As an example, let the clip sub-segment length interval be t_clip ∈ [T_min, T_max] = [2, 5]; sub-segments that are too long or too short require trimming or padding (a sketch follows the two cases below):
If t_clip ∈ [0, T_min), the sub-segment is too short and must be padded from the region to its left or right. Specifically, of the left and right neighboring regions of duration T_min − t_clip, the one with the higher average score is chosen for padding, as shown, for example, in FIG. 4.
If t_clip ∈ [T_max, +∞), the sub-segment is too long, and a higher-scoring sub-segment is selected from within it. A sub-segment can be sampled at random from the current region, ensuring that its duration satisfies t'_clip ∈ [T_min, T_max].
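The two cases above can be sketched as follows (the frame-index bookkeeping is illustrative, and the highest-scoring window is used for over-long segments, whereas the disclosure alternatively allows random sampling):

```python
import numpy as np

T_MIN, T_MAX = 2.0, 5.0   # preset clip length interval, in seconds

def fit_clip(scores: np.ndarray, start: int, end: int, fps: float) -> tuple:
    """Trim or pad one sub-segment [start, end) (frame indices) to [T_MIN, T_MAX]."""
    dur = (end - start) / fps
    if dur < T_MIN:
        # Too short: pad with whichever neighboring region scores higher on average.
        pad = int(round((T_MIN - dur) * fps))
        left = scores[max(0, start - pad):start]
        right = scores[end:end + pad]
        if left.size and (not right.size or left.mean() >= right.mean()):
            return max(0, start - pad), end
        return start, min(len(scores), end + pad)
    if dur > T_MAX:
        # Too long: keep the highest-scoring window of length T_MAX.
        win = int(T_MAX * fps)
        sums = np.convolve(scores[start:end], np.ones(win), mode="valid")
        off = int(sums.argmax())
        return start + off, start + off + win
    return start, end
```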
In step S105, the video to be edited is edited according to the clip regions, yielding the edited video.
In this embodiment of the present invention, the video edit is completed according to the start and end times of the time segments corresponding to the clip regions, and the edited video is obtained by splicing multiple clip sub-segments together. When splicing, the clip sub-segments can be ordered by time, score, or scene category, among others.
As an example, suppose the clip regions comprise three clip sub-segments A, B, and C, where A covers minutes 3-5, B covers minutes 7-8, and C covers minutes 10-12; their average video scores are 0.6, 0.8, and 0.5, and their scene categories are outdoor, indoor, and outdoor, respectively. Splicing by time (each sub-segment's start time) yields the order A, B, C; splicing by score (average video score from high to low) yields B, A, C; splicing by scene and time (outdoor before indoor, chronological within a scene) yields A, C, B.
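For illustration, these three splicing orders can be reproduced with plain sorting (the clip metadata below mirrors the example):

```python
# Illustrative ordering of clip sub-segments before concatenation.
clips = [
    {"name": "A", "start_min": 3, "score": 0.6, "scene": "outdoor"},
    {"name": "B", "start_min": 7, "score": 0.8, "scene": "indoor"},
    {"name": "C", "start_min": 10, "score": 0.5, "scene": "outdoor"},
]

by_time = sorted(clips, key=lambda c: c["start_min"])      # A, B, C
by_score = sorted(clips, key=lambda c: -c["score"])        # B, A, C
scene_rank = {"outdoor": 0, "indoor": 1}                   # outdoor before indoor
by_scene = sorted(clips, key=lambda c: (scene_rank[c["scene"]], c["start_min"]))
print([c["name"] for c in by_scene])                       # ['A', 'C', 'B']
```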
In this embodiment of the present invention, the video to be edited is segmented according to a preset video segmentation algorithm into a first number of video sub-segments; the video is analyzed from preset dimensions to obtain score curves corresponding to those dimensions; a preset ensemble learning model processes all of the score curves to generate a comprehensive score curve; clip regions are selected from the first number of sub-segments according to the comprehensive score curve; and the video is edited according to the clip regions to obtain the edited video. Automatic video editing is thereby achieved on the basis of multi-dimensional analysis, and the quality of the editing result is ensured.
Embodiment 2:

FIGS. 5-6 show the structure of the video editing system provided by Embodiment 2 of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown, including:
a video segmentation module 51, configured to segment the video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments;

a dimension analysis module 52, configured to analyze the video to be edited from preset dimensions to obtain score curves corresponding to the dimensions;

an ensemble learning module 53, configured to process all of the score curves using a preset ensemble learning model to generate a comprehensive score curve for the video to be edited;

a segment search module 54, configured to select clip regions from the first number of video sub-segments according to the comprehensive score curve; and

an editing module 55, configured to edit the video to be edited according to the clip regions to obtain the edited video.
Preferably, the dimension analysis module 52 includes:

a summary analysis module 521, configured to generate a score curve of the video to be edited in the summary dimension by computing the summary score of each video sub-segment; and/or

an occlusion analysis module 522, configured to analyze the occlusion probability of each video frame with an occlusion analysis model and generate a score curve of the video to be edited in the occlusion dimension; and/or

an aesthetic analysis module 523, configured to analyze the aesthetic score of each video frame with an aesthetic evaluation model and generate a score curve of the video to be edited in the aesthetic dimension; and/or

a shooting habit analysis module 524, configured to generate a score curve of the video to be edited in the shooting habit dimension using a preset shooting habit analysis model; and/or

an image entropy analysis module 525, configured to generate a score curve of the video to be edited in the image entropy dimension by measuring the information content of each video frame; and/or

a texture analysis module 526, configured to generate a score curve of the video to be edited in the texture dimension by analyzing the texture features of each video frame.
Preferably, the function value of a preset distance function is used as the summary score of a video sub-segment.
Preferably, the preset distance function is the correlation coefficient distance function, and the summary score of a video sub-segment is:

s_i = Σ_{j=1}^{N} dist(v_i, v_j)

where s_i denotes the summary score of the i-th video sub-segment, v_i and v_j denote the i-th and j-th video sub-segments, respectively, and N is the first number.
Preferably, the shooting habit analysis model is defined as:

g(t) = α · exp(−(t − μ)² / (2σ²)) / (σ√(2π))

defined with

μ = υ_duration / 2, σ = β · υ_duration

where α is a hyperparameter that weights the probability density of the Gaussian distribution, μ is the mean of the Gaussian distribution, σ is its standard deviation, υ_duration is the duration of the video to be edited, and β is a hyperparameter that adjusts the variance of the Gaussian probability density.
Preferably, the ensemble learning model uses a single linear densely connected layer, and the comprehensive score curve is:

s = Σ_{k∈M} (w_k · s'_k + b_k)

where M is the set of dimensions, w_k and b_k are parameters of the densely connected layer that weight the standardized score curve vector of each dimension, s'_k is the standardized score curve vector corresponding to dimension k, and s is the comprehensive score curve vector.
In this embodiment of the present invention, each module of the video editing system may be implemented by corresponding hardware or software units; the units may be independent software and hardware units or may be integrated into a single software and hardware unit, which is not intended to limit the present invention. For the specific implementation of each module of the video editing system, reference may be made to the description of the foregoing method embodiments, which will not be repeated here.
Embodiment 3:

FIG. 7 shows the structure of the electronic device provided by Embodiment 3 of the present invention. For ease of description, only the parts related to this embodiment of the present invention are shown.
The electronic device 7 of this embodiment of the present invention includes a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processor 70. When the processor 70 executes the computer program 72, the steps in the foregoing method embodiments are implemented, for example, steps S101 to S105 shown in FIG. 1. Alternatively, when the processor 70 executes the computer program 72, the functions of the units in the foregoing apparatus embodiments are implemented, for example, the functions of units 51 to 55 shown in FIG. 5.
In this embodiment of the present invention, the video to be edited is segmented according to a preset video segmentation algorithm into a first number of video sub-segments; the video is analyzed from preset dimensions to obtain score curves corresponding to those dimensions; a preset ensemble learning model processes all of the score curves to generate a comprehensive score curve; clip regions are selected from the first number of sub-segments according to the comprehensive score curve; and the video is edited according to the clip regions to obtain the edited video. Automatic video editing is thereby achieved on the basis of multi-dimensional analysis, improving editing efficiency and ensuring the quality of the editing result.
Embodiment 4:

In an embodiment of the present invention, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps in the foregoing method embodiments, for example, steps S101 to S105 shown in FIG. 1. Alternatively, when the computer program is executed by a processor, it implements the functions of the units in the foregoing apparatus embodiments, for example, the functions of units 51 to 55 shown in FIG. 5.
In this embodiment of the present invention, the video to be edited is segmented according to a preset video segmentation algorithm into a first number of video sub-segments; the video is analyzed from preset dimensions to obtain score curves corresponding to those dimensions; a preset ensemble learning model processes all of the score curves to generate a comprehensive score curve; clip regions are selected from the first number of sub-segments according to the comprehensive score curve; and the video is edited according to the clip regions to obtain the edited video. Automatic video editing is thereby achieved on the basis of multi-dimensional analysis, improving editing efficiency and ensuring the quality of the editing result.
The computer-readable storage medium of the embodiments of the present invention may include any entity or apparatus capable of carrying computer program code, or a recording medium, for example, a memory such as ROM/RAM, a magnetic disk, an optical disc, or a flash memory.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (14)

  1. A video editing method, characterized in that the method comprises the following steps:

    segmenting a video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments;

    analyzing the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension;

    processing all of the score curves using a preset ensemble learning model to generate a comprehensive score curve of the video to be edited;

    selecting a clip region from the first number of video sub-segments according to the comprehensive score curve; and

    editing the video to be edited according to the clip region to obtain an edited video.
  2. The method according to claim 1, characterized in that the step of analyzing the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension comprises:

    generating a score curve of the video to be edited in a summary dimension by calculating a summary score of each video sub-segment; and/or

    analyzing an occlusion probability of each video frame with an occlusion analysis model to generate a score curve of the video to be edited in an occlusion dimension; and/or

    analyzing an aesthetic score of each video frame with an aesthetic evaluation model to generate a score curve of the video to be edited in an aesthetic dimension; and/or

    generating a score curve of the video to be edited in a shooting habit dimension using a preset shooting habit analysis model; and/or

    generating a score curve of the video to be edited in an image entropy dimension by measuring the information content of each video frame; and/or

    generating a score curve of the video to be edited in a texture dimension by analyzing texture features of each video frame.

  3. The method according to claim 2, characterized in that a function value of a preset distance function is used as the summary score of the video sub-segment.
  4. The method according to claim 3, characterized in that the preset distance function is a correlation coefficient distance function, and the summary score of the video sub-segment is:

    s_i = Σ_{j=1}^{N} dist(v_i, v_j)

    wherein s_i denotes the summary score of the i-th video sub-segment, v_i and v_j denote the i-th and j-th video sub-segments, respectively, and N is the first number.
  5. The method according to claim 2, characterized in that a training process of the occlusion analysis model comprises:

    constructing an occlusion training sample set comprising a clean sample set and an occluded sample set, wherein the clean samples contain no occluders and the occluded samples contain at least one class of occluder; and

    feeding the occlusion training samples into the occlusion analysis model, calculating a cross-entropy loss value between the output occlusion probability and the true occlusion probability, and optimizing parameters of the occlusion analysis model by a gradient descent algorithm until an occlusion detection accuracy of the occlusion analysis model reaches a preset value.

  6. The method according to claim 2, characterized in that a training process of the aesthetic evaluation model comprises:

    constructing an aesthetic training sample set, wherein the true aesthetic score labeled for each aesthetic training sample is a mean of scores given by multiple users from a preset aesthetic perspective; and

    training the aesthetic evaluation model using the aesthetic training sample set to obtain a trained aesthetic evaluation model.
  7. The method according to claim 2, characterized in that the shooting habit analysis model is defined as:

    g(t) = α · exp(−(t − μ)² / (2σ²)) / (σ√(2π))

    defined with

    μ = υ_duration / 2

    σ = β · υ_duration

    wherein α is a hyperparameter for weighting the probability density of the Gaussian distribution, μ denotes the mean of the Gaussian distribution, σ denotes the standard deviation of the Gaussian distribution, υ_duration denotes the duration of the video to be edited, and β is a hyperparameter for adjusting the variance of the probability density of the Gaussian distribution.
  8. The method according to claim 1, characterized in that the ensemble learning model uses a single linear densely connected layer, and the comprehensive score curve is:

    s = Σ_{k∈M} (w_k · s'_k + b_k)

    wherein M is the set of dimensions, w_k and b_k are parameters of the densely connected layer for weighting the standardized score curve vector corresponding to each dimension, s'_k is the standardized score curve vector corresponding to each dimension, and s is the comprehensive score curve vector.
  9. The method according to claim 1, characterized in that the step of selecting a clip region from the first number of video sub-segments according to the comprehensive score curve comprises:

    calculating an average video score of each of the video sub-segments according to the comprehensive score curve; and

    sorting the video sub-segments from highest to lowest average video score, selecting a second number of video sub-segments according to the sorting result, and selecting the clip region from the second number of video sub-segments.

  10. The method according to claim 9, characterized in that the step of selecting a second number of video sub-segments according to the sorting result and selecting the clip region from the second number of video sub-segments comprises:

    trimming or padding those of the second number of video sub-segments whose video length falls outside a preset video length interval to obtain a processed second number of video sub-segments, and using the regions corresponding to the processed second number of video sub-segments as the clip region.
  11. A video editing system, characterized in that the system comprises:

    a video segmentation module, configured to segment a video to be edited according to a preset video segmentation algorithm to obtain a first number of video sub-segments;

    a dimension analysis module, configured to analyze the video to be edited from a preset dimension to obtain a score curve corresponding to the dimension;

    an ensemble learning module, configured to process all of the score curves using a preset ensemble learning model to generate a comprehensive score curve of the video to be edited;

    a segment search module, configured to select a clip region from the first number of video sub-segments according to the comprehensive score curve; and

    an editing module, configured to edit the video to be edited according to the clip region to obtain an edited video.
  12. The system according to claim 11, characterized in that the dimension analysis module comprises:

    a summary analysis module, configured to generate a score curve of the video to be edited in a summary dimension by calculating a summary score of each video sub-segment; and/or

    an occlusion analysis module, configured to analyze an occlusion probability of each video frame with an occlusion analysis model and generate a score curve of the video to be edited in an occlusion dimension; and/or

    an aesthetic analysis module, configured to analyze an aesthetic score of each video frame with an aesthetic evaluation model and generate a score curve of the video to be edited in an aesthetic dimension; and/or

    a shooting habit analysis module, configured to generate a score curve of the video to be edited in a shooting habit dimension using a preset shooting habit analysis model; and/or

    an image entropy analysis module, configured to generate a score curve of the video to be edited in an image entropy dimension by measuring the information content of each video frame; and/or

    a texture analysis module, configured to generate a score curve of the video to be edited in a texture dimension by analyzing texture features of each video frame.
  13. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that, when the processor executes the computer program, the steps of the method according to any one of claims 1 to 10 are implemented.

  14. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 10 are implemented.
PCT/CN2021/138917 2020-12-16 2021-12-16 Video editing method and system, electronic device, and storage medium WO2022127877A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011489310.6 2020-12-16
CN202011489310.6A CN112770061A (en) 2020-12-16 2020-12-16 Video editing method, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2022127877A1 true WO2022127877A1 (en) 2022-06-23

Family

ID=75695010

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/138917 WO2022127877A1 (en) 2020-12-16 2021-12-16 Video editing method and system, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112770061A (en)
WO (1) WO2022127877A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112770061A (en) * 2020-12-16 2021-05-07 影石创新科技股份有限公司 Video editing method, system, electronic device and storage medium
CN113411666A (en) * 2021-06-18 2021-09-17 影石创新科技股份有限公司 Automatic clipping method, apparatus, camera, and computer-readable storage medium
CN113301430B (en) * 2021-07-27 2021-12-07 腾讯科技(深圳)有限公司 Video clipping method, video clipping device, electronic equipment and storage medium
CN115734032A (en) * 2021-08-31 2023-03-03 华为技术有限公司 Video editing method, electronic device and storage medium
CN113992975B (en) * 2021-10-13 2023-10-17 咪咕视讯科技有限公司 Video playing method, device, equipment and computer storage medium
CN116366785A (en) * 2021-12-22 2023-06-30 华为技术有限公司 Video generation system, method and related device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030234803A1 (en) * 2002-06-19 2003-12-25 Kentaro Toyama System and method for automatically generating video cliplets from digital video
CN108288475A (en) * 2018-02-12 2018-07-17 成都睿码科技有限责任公司 A kind of sports video collection of choice specimens clipping method based on deep learning
CN110166827A (en) * 2018-11-27 2019-08-23 深圳市腾讯信息技术有限公司 Determination method, apparatus, storage medium and the electronic device of video clip
CN110798735A (en) * 2019-08-28 2020-02-14 腾讯科技(深圳)有限公司 Video processing method and device and electronic equipment
CN111698575A (en) * 2020-06-19 2020-09-22 广州华多网络科技有限公司 Live highlight video editing method, device, equipment and storage medium
CN112532897A (en) * 2020-11-25 2021-03-19 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and computer readable storage medium
CN112770061A (en) * 2020-12-16 2021-05-07 影石创新科技股份有限公司 Video editing method, system, electronic device and storage medium
CN113709560A (en) * 2021-03-31 2021-11-26 腾讯科技(深圳)有限公司 Video editing method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108234995B (en) * 2016-12-21 2019-08-13 中国移动通信有限公司研究院 A kind of method and testing service device of video quality evaluation and test
CN109819338B (en) * 2019-02-22 2021-09-14 影石创新科技股份有限公司 Automatic video editing method and device and portable terminal
CN110996169B (en) * 2019-07-12 2022-03-01 北京达佳互联信息技术有限公司 Method, device, electronic equipment and computer-readable storage medium for clipping video


Also Published As

Publication number Publication date
CN112770061A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
WO2022127877A1 (en) Video editing method and system, electronic device, and storage medium
Yang et al. LRW-1000: A naturally-distributed large-scale benchmark for lip reading in the wild
Chen et al. Localizing visual sounds the hard way
CN110189378B (en) Video processing method and device and electronic equipment
US11321385B2 (en) Visualization of image themes based on image content
CN110263659B (en) Finger vein recognition method and system based on triplet loss and lightweight network
JP4381310B2 (en) Media processing system
CN108804578B (en) Unsupervised video abstraction method based on consistency segment generation
KR101731771B1 (en) Automated selection of keeper images from a burst photo captured set
US8879788B2 (en) Video processing apparatus, method and system
JP5355422B2 (en) Method and system for video indexing and video synopsis
KR101615254B1 (en) Detecting facial expressions in digital images
JP5036580B2 (en) How to adapt the boost classifier to new samples
CN109918539B (en) Audio and video mutual retrieval method based on user click behavior
Zhu et al. Targeting accurate object extraction from an image: A comprehensive study of natural image matting
US20040088723A1 (en) Systems and methods for generating a video summary
US7904815B2 (en) Content-based dynamic photo-to-video methods and apparatuses
JP2006508463A (en) Face detection
JP2006508461A (en) Face detection and face tracking
JP2006508601A5 (en)
JP2004199669A (en) Face detection
JP2006508601A (en) Video camera
JP2006508462A (en) Face detection
CN102750964A (en) Method and device used for controlling background music and based on facial expression
CN111160134A (en) Human-subject video scene analysis method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21905803

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21905803

Country of ref document: EP

Kind code of ref document: A1