WO2023160515A1 - Video processing method, apparatus, device and medium - Google Patents
Video processing method, apparatus, device and medium
- Publication number
- WO2023160515A1 (PCT/CN2023/077309)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- target
- recommended
- original
- audio
- Prior art date
Links
- 238000003672 processing method Methods 0.000 title claims abstract description 50
- 239000000463 material Substances 0.000 claims abstract description 249
- 238000012545 processing Methods 0.000 claims abstract description 84
- 238000000034 method Methods 0.000 claims abstract description 31
- 238000004458 analytical method Methods 0.000 claims abstract description 24
- 238000004590 computer program Methods 0.000 claims description 19
- 238000001228 spectrum Methods 0.000 claims description 19
- 238000000605 extraction Methods 0.000 claims description 16
- 238000004364 calculation method Methods 0.000 claims description 11
- 238000001514 detection method Methods 0.000 claims description 8
- 230000000694 effects Effects 0.000 description 22
- 238000010586 diagram Methods 0.000 description 19
- 230000008569 process Effects 0.000 description 14
- 230000006870 function Effects 0.000 description 8
- 238000004891 communication Methods 0.000 description 7
- 238000004880 explosion Methods 0.000 description 7
- 230000003287 optical effect Effects 0.000 description 6
- 238000005516 engineering process Methods 0.000 description 4
- 238000003058 natural language processing Methods 0.000 description 3
- 238000005034 decoration Methods 0.000 description 2
- 238000013136 deep learning model Methods 0.000 description 2
- 239000013307 optical fiber Substances 0.000 description 2
- 230000000644 propagated effect Effects 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 230000003595 spectral effect Effects 0.000 description 2
- 238000012546 transfer Methods 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000000903 blocking effect Effects 0.000 description 1
- 238000004883 computer application Methods 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 230000002996 emotional effect Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000001151 other effect Effects 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
- 230000001755 vocal effect Effects 0.000 description 1
Classifications
- H04N21/4668—Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies
- G10L15/26—Speech to text systems
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
- H04N21/233—Processing of audio elementary streams (server side)
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs (server side)
- H04N21/439—Processing of audio elementary streams (client side)
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs (client side)
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/44016—Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
Definitions
- the present disclosure relates to the technical field of computer applications, and in particular to a video processing method, apparatus, device and medium.
- typically, the user selects appropriate materials through video editing software and adds them to the video for decoration.
- selecting and adding materials one by one increases the time cost and reduces processing efficiency.
- relevant video editing software has therefore launched video templates or one-click video decoration solutions, which insert the captured video or picture into a selected video template and automatically edit a beautified video carrying the template's effects.
- a video processing method comprising: extracting video content features based on an analysis of an original video; acquiring at least one recommended material matching the video content features; and performing video processing on the original video according to the recommended material to generate a target video, wherein the target video is a video generated by adding the recommended material to the original video.
- a video processing device comprising: an extraction module, configured to extract video content features based on an analysis of the original video; an acquisition module, configured to acquire at least one recommended material matching the video content features; and a processing module, configured to perform video processing on the original video according to the recommended material to generate a target video, wherein the target video is a video generated by adding the recommended material to the original video.
- an electronic device comprising: a processor; and a memory for storing instructions executable by the processor; the processor is configured to read the executable instructions from the memory and execute the instructions to implement the video processing method provided by any embodiment of the present disclosure.
- a computer-readable storage medium stores a computer program, and the computer program is used to execute the video processing method provided in any embodiment of the present disclosure.
- a computer program is also provided, the computer program includes instructions, and when the instructions are executed by a processor, the processor is enabled to implement the video processing method provided in any embodiment of the present disclosure.
- FIG. 1 is a schematic flowchart of a video processing method provided by an embodiment of the present disclosure
- FIG. 2 is a schematic flowchart of another video processing method provided by an embodiment of the present disclosure
- FIG. 3 is a schematic diagram of a video processing scene provided by an embodiment of the present disclosure.
- FIG. 4 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 5 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 6 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 7 is a schematic flowchart of another video processing method provided by an embodiment of the present disclosure.
- FIG. 8 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 9 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 10 is a schematic flowchart of another video processing method provided by an embodiment of the present disclosure.
- FIG. 11 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 12 is a schematic flowchart of another video processing method provided by an embodiment of the present disclosure.
- FIG. 13 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 14 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 15 is a schematic flowchart of another video processing method provided by an embodiment of the present disclosure.
- FIG. 16 is a schematic diagram of another video processing scenario provided by an embodiment of the present disclosure.
- FIG. 17 is a schematic structural diagram of a video processing device provided by an embodiment of the present disclosure.
- FIG. 18 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- the term “comprise” and its variations are open-ended, i.e. “including but not limited to”.
- the term “based on” is “based at least in part on”.
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.”
- Relevant definitions of other terms will be given in the description below. It should be noted that concepts such as “first” and “second” mentioned in this disclosure are only used to distinguish different devices, modules or units, and are not used to limit the sequence of functions performed by these devices, modules or units, or their interdependence.
- the embodiments of the present disclosure provide a video processing method in which materials for effect processing are recommended based on the content of the video, so that a video processed with the recommended materials has a high degree of matching between the processing effect and the video content, and videos with different content receive visibly different processing effects, achieving a "thousand videos, thousand faces" processing effect that can meet individual needs for video processing.
- FIG. 1 is a schematic flow chart of a video processing method provided by an embodiment of the present disclosure.
- the method can be executed by a video processing device, where the device can be implemented by software and/or hardware and can generally be integrated into an electronic device. As shown in FIG. 1, the method includes steps 101-103.
- in step 101, video content features are extracted based on an analysis of the original video.
- in order to adapt video effect processing to the personalized characteristics of the video content, the video content features are extracted based on an analysis of the original video.
- the original video is an uploaded video to which effects are to be applied.
- the video content features include, but are not limited to, one or more of the audio features of the video, the text features of the video, the image features of the video, the filter features of the video, and the features of the shooting objects included in the video.
- in step 102, at least one recommended material matching the video content features is acquired.
- at least one recommended material matching the video content features is acquired, where the recommended material includes, but is not limited to, one or more of audio material, texture material, animation material, filter material, and the like.
- the manner of acquiring at least one recommended material matching the video content features may vary across scenarios; specific acquisition manners are illustrated in subsequent embodiments and are not repeated here.
- in step 103, video processing is performed on the original video according to the recommended material to generate a target video, where the target video is a video generated by adding the recommended material to the original video.
- each material has a corresponding adding track; therefore, each material can be added based on its own track.
- the track of each material is defined by its corresponding field name, type, and description information.
- Table 1 is an example of a track of a material.
- each recommended material also carries corresponding parameters, which facilitate personalized adjustments to the display effect when the material is added; for example, in subsequent embodiments, the size of the material is adjusted after its area is determined.
- the parameters of the text_template material shown in Table 2 below may include scaling factors, rotation angles, and the like.
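- as an illustration only (not the patent's actual schema), a material track and its display parameters could be modeled as follows; all field names here are assumptions:
```python
# A minimal sketch of how a material track and its display parameters might be
# modeled; the field names (material_id, start_ms, scale, rotation) are
# illustrative assumptions, not the patent's actual schema.
from dataclasses import dataclass, field

@dataclass
class MaterialParams:
    scale: float = 1.0      # scaling factor applied when the material is rendered
    rotation: float = 0.0   # rotation angle in degrees

@dataclass
class MaterialTrack:
    material_id: str        # which recommended material this track carries
    material_type: str      # e.g. "sticker", "text_template", "sound_effect"
    start_ms: int           # material addition time on the timeline
    duration_ms: int        # how long the material stays visible/audible
    params: MaterialParams = field(default_factory=MaterialParams)

# Example: a "text_template" material added 3 s into the video for 2 s.
track = MaterialTrack("applause_sticker", "text_template", 3000, 2000,
                      MaterialParams(scale=0.8, rotation=15.0))
```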
- in the video processing method of the embodiments of the present disclosure, after the video content features of the original video are extracted, at least one recommended material matching the video content features is obtained, and the target video is generated by adding the recommended material to the original video.
- in this way, video materials adapted to the video content are added to the video, which improves the matching degree between the video content and the video materials and realizes personalized effect processing of the video.
- in some embodiments, extraction is performed based on the text content of the original video.
- extracting video content features includes steps 201-202.
- in step 201, speech recognition processing is performed on the target audio data of the original video to obtain the corresponding text content.
- the preset video editing application can also identify each audio track contained in the original video, where each audio track corresponds to a sound source. For example, for an original video A that contains the speaking voices of users a and b, the audio track corresponding to the voice of a and the audio track corresponding to the voice of b can be identified.
- all audio tracks displayed in the video editing application for the video file of the original video are obtained. It is easy to understand that the audio source corresponding to each audio track has an occurrence time; therefore, in some embodiments, the corresponding audio tracks are also displayed along the time axis.
- for example, the file of the original video is split into a video track video and two audio tracks audio1 and audio2, and the corresponding audio tracks can be displayed in the video editing application.
- all audio tracks will be merged based on the time axis to generate total audio data.
- audio1 and audio2 are merged based on the time axis to generate total audio data complex-audio, which includes all audio data in the original video.
- the total audio data is also tied to time. Therefore, if the first duration of the total audio data is longer than the second duration of the original video, some audio data has no corresponding video content; in order to keep the project length consistent, the first duration of the total audio data is cut to obtain the target audio data, where the duration of the target audio data is consistent with the second duration.
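- as a rough illustration of the merging and cutting described above, the following sketch mixes per-source tracks on a shared time axis with numpy and trims the mix to the video duration; mono float tracks at a single sample rate are a simplifying assumption:
```python
# A minimal sketch of merging per-source audio tracks on a shared time axis and
# trimming the mix to the video's duration; assumes each track is a mono float
# array at the same sample rate, which is an illustrative simplification.
import numpy as np

def merge_tracks(tracks, offsets_s, sample_rate=44100):
    """Mix tracks (list of 1-D float arrays) placed at offsets_s seconds."""
    ends = [int(off * sample_rate) + len(t) for t, off in zip(tracks, offsets_s)]
    mix = np.zeros(max(ends), dtype=np.float32)
    for t, off in zip(tracks, offsets_s):
        start = int(off * sample_rate)
        mix[start:start + len(t)] += t          # overlay on the time axis
    return np.clip(mix, -1.0, 1.0)

def trim_to_video(total_audio, video_duration_s, sample_rate=44100):
    """Cut the total audio so its duration matches the video's second duration."""
    return total_audio[:int(video_duration_s * sample_rate)]

audio1 = np.zeros(44100 * 5, dtype=np.float32)   # 5 s track
audio2 = np.zeros(44100 * 8, dtype=np.float32)   # 8 s track
complex_audio = merge_tracks([audio1, audio2], [0.0, 1.0])
target_audio = trim_to_video(complex_audio, video_duration_s=6.0)
```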
- the audio file corresponding to the original video may include background sound in addition to the audio data of the interaction between the shooting objects.
- background sound includes the sound of music played in the environment, or the sound of vehicles passing by on the road in the environment.
- background sound is usually irrelevant to the video content. Therefore, in order to facilitate the subsequent extraction of video content features and avoid interference from background sounds (for example, when extracting video text features, text content in the background sound may be mistakenly recognized as text content of the video), in some embodiments the background sound in the original video can also be removed.
- the audio identifier of each audio track is detected; that is, sound features such as the sound spectrum of the audio corresponding to each audio track are identified, the sound features of each audio track are matched against the preset sound features corresponding to each audio identifier, and the audio identifier of each audio track is determined based on the matching result. If a target audio track whose identifier represents background music is detected, all audio tracks other than the target audio track are merged based on the time axis to generate the total audio data.
- the target audio data can be obtained by merging all audio tracks corresponding to the original video, or by merging only audio tracks that meet certain preset sound characteristics; this can be set according to the needs of the scenario and is not limited here.
- speech recognition processing is performed on the target audio data of the original video to obtain the corresponding text content; the text content can be obtained through speech recognition technology.
- in step 202, semantic analysis is performed on the text content to obtain a first keyword.
- the first keyword makes it possible to match recommended materials to the video in the dimension of content.
- for example, the first keyword can be an emotional keyword such as "haha, so funny", so that materials rendering that emotion can be recommended for the video based on the first keyword, such as laughing texture materials or firework animation materials.
- alternatively, the first keyword can be vocabulary from a professional field such as "basin", so that professional texture materials in the corresponding field can be recommended for the video based on the first keyword, making the vocabulary of the corresponding professional field easier to understand.
- semantic analysis is performed on the text content, the analyzed semantic result is matched against preset keyword semantics, and the successfully matched first keyword is determined.
- for example, the text sentences of the target audio data can be recognized using automatic speech recognition (ASR) technology, and the semantics of the corresponding text sentences can then be understood through natural language processing (NLP) technology to obtain the corresponding first keyword.
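- a minimal sketch of the keyword step is shown below; a real system would use an NLP model for semantic understanding, and the trigger table here is an invented example:
```python
# A minimal sketch of the first-keyword step: in practice the recognized text
# would go through an NLP model, but a lexicon lookup illustrates the matching
# idea; the keyword table below is an invented example, not the patent's data.
EMOTION_KEYWORDS = {
    "haha": "laughter",
    "so funny": "laughter",
    "applause": "applause",
}

def first_keyword(asr_text: str):
    """Return the first preset keyword whose trigger appears in the ASR text."""
    text = asr_text.lower()
    for trigger, keyword in EMOTION_KEYWORDS.items():
        if trigger in text:
            return keyword
    return None

print(first_keyword("Haha, that was great"))  # -> "laughter"
```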
- the relevance between the recommended material and the video content can be ensured in the content dimension, so as to better render the corresponding video content.
- for example, when the first keyword "haha" is shown in the form of subtitles, the sticker material "applause" can be recommended, so that in the processed video an "applause" sticker is displayed alongside the "haha" audio. This further renders the happy atmosphere, the added recommended material is more consistent with the video content, and the addition does not appear abrupt.
- in some embodiments, extracting video content features includes steps 701-702.
- in step 701, sound detection processing is performed on the target audio data of the original video to obtain corresponding frequency spectrum data.
- even when it carries no recognizable speech, the audio data may still reflect the content characteristics of the video. For example, if the audio data contains "applause", "explosion" and so on, recommended materials can be added based on this audio data, and the atmosphere of the video can be further enhanced along with the corresponding audio.
- in step 702, the frequency spectrum data is analyzed and processed to obtain a second keyword.
- the spectrum data is analyzed and processed to obtain the second keyword, where recommended materials corresponding to the spectrum data can be obtained based on the second keyword.
- the frequency spectrum data may be input into a deep learning model trained in advance based on a large amount of sample data, and the second keyword output by the deep learning model may be obtained.
- the acquired spectrum data may be matched with the preset spectrum data of each keyword, and the second keyword corresponding to the spectrum data is determined based on the matching degree. For example, if the matching degree between the obtained spectrum data and the spectrum data corresponding to the keyword "explosion" is greater than a preset threshold, then it is determined that the second keyword corresponding to the target audio data is "explosion".
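- the following sketch illustrates one way such spectrum matching could look, using FFT magnitudes and cosine similarity; the reference spectra and threshold are illustrative assumptions, not the patent's trained model:
```python
# A minimal sketch of matching spectrum data against preset keyword spectra by
# cosine similarity; the reference spectra and the threshold are illustrative
# assumptions (a trained audio-event model would be used in practice).
import numpy as np

def magnitude_spectrum(audio, n_fft=2048):
    # Magnitude spectrum of the first n_fft samples (sketch-level granularity).
    return np.abs(np.fft.rfft(audio, n=n_fft))

def second_keyword(audio, reference_spectra, threshold=0.8):
    spec = magnitude_spectrum(audio)
    best, best_sim = None, threshold
    for keyword, ref in reference_spectra.items():
        sim = np.dot(spec, ref) / (np.linalg.norm(spec) * np.linalg.norm(ref) + 1e-9)
        if sim > best_sim:            # keep the keyword above the preset threshold
            best, best_sim = keyword, sim
    return best                       # e.g. "explosion" on a sufficiently close match

refs = {"explosion": np.random.rand(1025), "applause": np.random.rand(1025)}
print(second_keyword(np.random.randn(44100), refs))
```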
- materials can also be recommended jointly based on the first keyword and the second keyword, where the second keyword can be identified based on audio event detection (AED) technology.
- for example, sound detection processing is performed on the target audio data of the original video; after the corresponding spectrum data is obtained, if the second keyword obtained from the spectrum data is "explosion", the matched recommended material is an "explosion" sticker, so that the corresponding "explosion" sticker is displayed on the corresponding video frames to further render the video content containing the explosion audio.
- the video processing method of the embodiments of the present disclosure can extract, as video content features, any feature that reflects the video content; the extracted video content features have a strong correlation with the video content, which ensures the relevance between materials recommended based on the video content features and the video content, and provides technical support for personalized video processing effects.
- after the video content features are extracted, the recommended material matching the video content features is determined; the determination of the recommended material directly affects the processing effect of the video.
- the determination of the recommended material will be described below with reference to specific examples.
- in some embodiments, acquiring at least one recommended material matching the video content features includes steps 1001-1002.
- in step 1001, video style features are determined according to the video image of the original video.
- videos with different content have different video styles; therefore, adding the same recommended material to videos of different styles also affects the matching degree with the video content.
- for example, according to the target audio data of original video S1, the first keyword obtained by semantic analysis is "haha"; according to the target audio data of original video S2, the first keyword obtained by semantic analysis is also "haha". However, the vocal object of "haha" in S1 is an anime character, while the vocal object of "haha" in S2 is a real person. Obviously, applying the same recommended material to both styles would affect the processing effect of the video.
- the video style feature is determined according to the video image of the original video.
- the video style feature includes the image feature of the video content, the theme style feature of the video content, the feature of the shooting object contained in the video, etc., which is not limited here.
- the convolutional network model is trained in advance based on a large amount of sample data, video images are input into the corresponding convolutional network model, and the video style features output by the convolutional network model are obtained.
- in some embodiments, determining the video style features according to the video image of the original video includes steps 1201-1202.
- in step 1201, image recognition processing is performed on the video image of the original video, and at least one shooting object is determined according to the recognition result.
- the shooting object may be a subject contained in a video image, including but not limited to: people, animals, furniture, tableware, and the like.
- in step 1202, weighted calculation is performed on the at least one shooting object according to preset object weights, and the calculation result is matched with preset style classifications to determine the video style features corresponding to the original video.
- the object type of each shooting object can be identified, and the object weight of each shooting object can be obtained from a preset database, where the database is built by training on a large amount of sample data and stores each object type and its corresponding object weight; then weighted calculation is performed on the at least one shooting object according to the preset object weights, the calculation result is matched with the preset style classifications, and the video style feature of the successfully matched style classification is determined.
- multiple video frames extracted from the original video may be used as the video images of the original video, so as to further improve style recognition efficiency.
- for example, multiple video frames can be extracted from the original video at a preset time interval (such as 1 second), or video segments of a preset length can be extracted at intervals, and the multiple video frames contained in the video segments are used as the video images of the original video.
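- a minimal sketch of sampling frames at a preset interval with OpenCV is given below; "input.mp4" is a placeholder path:
```python
# A minimal sketch of sampling video frames at a preset interval (1 s) to serve
# as the video images for style recognition; "input.mp4" is a placeholder path.
import cv2

def sample_frames(path, interval_s=1.0):
    cap = cv2.VideoCapture(path)
    frames, t = [], 0.0
    while True:
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)   # seek to the next sample point
        ok, frame = cap.read()
        if not ok:                                 # past the end of the video
            break
        frames.append(frame)
        t += interval_s
    cap.release()
    return frames

video_images = sample_frames("input.mp4")
```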
- the video image can be input into a pre-trained image intelligent recognition model, and at least one shooting object is determined according to the recognition result.
- for example, the shooting objects in a video image include a human face, objects, the environment, etc.; the classification features t1, t2 and t3 corresponding to each shooting object are identified, with corresponding object weights z1, z2 and z3 respectively; the value of t1·z1 + t2·z2 + t3·z3 is calculated as the calculation result, and the calculation result is matched with the preset style classifications to determine the video style features corresponding to the original video.
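- the weighted style match could be sketched as follows; the feature values, weights and style score ranges are invented for illustration:
```python
# A minimal sketch of the weighted style match: classification features t1..t3
# and object weights z1..z3 come from recognition and a preset database; the
# numbers and style prototypes here are invented for illustration.
def style_score(features, weights):
    # t1*z1 + t2*z2 + t3*z3
    return sum(t * z for t, z in zip(features, weights))

def classify_style(score, style_ranges):
    for style, (lo, hi) in style_ranges.items():
        if lo <= score < hi:        # match the score against a preset classification
            return style
    return "default"

score = style_score([0.9, 0.4, 0.7], [0.5, 0.3, 0.2])
print(classify_style(score, {"girly anime": (0.0, 0.6), "realistic": (0.6, 1.0)}))
```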
- in step 1002, at least one recommended material matching the video style features and video content features is obtained.
- at least one recommended material matching both the video style features and the video content features is acquired, so that the recommended material matches the video content on both dimensions, further improving the video processing effect.
- for example, a material library matching the video style features can be obtained first, and at least one recommended material matching the video content features is then obtained from that material library, thereby ensuring that the obtained recommended materials not only match the content of the video but are also consistent with its style.
- for example, if the video style feature is "girly anime", a material library composed of various girly-style materials matching "girly anime" is obtained first, and the video materials in the library are then matched based on the video content features, ensuring that the obtained recommended materials are all girly in style.
- in some embodiments, obtaining at least one recommended material matching the video content features includes steps 1501-1504.
- in step 1501, the playing time of the video frame corresponding to the video content feature is determined in the original video, where the video content feature is generated according to the video content of the video frame.
- since the video content feature is generated according to the video content of a video frame, the playing time of the video frame corresponding to the video content feature is determined in the original video, so that, according to the playing time, the corresponding material is recommended and added only for the video frames containing the corresponding video content feature.
- in step 1502, the video content feature is marked with a time identifier according to the playing time of the video frame.
- the video content feature is marked with a time identifier according to the playing time of the video frame, so as to facilitate matching recommended materials in the time dimension.
- in step 1503, for the same time identifier, if it is determined that there are multiple corresponding video content features, the multiple video content features are combined into a video feature set, and at least one recommended material matching the video feature set is obtained.
- that is, the multiple video content features are combined into a video feature set, and at least one recommended material matching the video feature set is obtained.
- specifically, multiple video content features can be combined to generate video content feature combinations (video feature sets); a preset correspondence is queried to determine whether there is an enhanced material corresponding to each video content feature combination. If no enhanced material is matched, the video content feature combination is split into individual content features to match recommended materials; if an enhanced material is matched, the enhanced material is used as the corresponding recommended material (see the sketch following step 1504 below).
- the recommended material for a video feature set is not necessarily a simple combination of the recommended materials corresponding to the individual video content features; when the multiple video content features are correlated, another recommended material with a stronger sense of atmosphere may be generated to further strengthen the video atmosphere.
- for example, if the first keyword corresponding to video content feature 1 is "haha" and the second keyword corresponding to video content feature 2 is "applause", the recommended material jointly determined by the two keywords may be a transition effects material, rather than the above-mentioned sticker materials corresponding to "haha" and "applause" separately.
- in step 1504, for the same time identifier, if it is determined that there is one corresponding video content feature, at least one recommended material matching that video content feature is obtained.
- that is, if there is a single video content feature, at least one recommended material is matched for it separately.
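- a minimal sketch of steps 1503-1504 (grouping features by time identifier and resolving them to materials) follows; the correspondence tables are invented examples of the preset correspondence:
```python
# A minimal sketch of grouping video content features by time identifier and
# resolving them to recommended materials; the correspondence tables below are
# invented examples of the "preset correspondence" the method queries.
from collections import defaultdict

ENHANCED = {frozenset(["laughter", "applause"]): "transition_effect"}
SINGLE = {"laughter": "haha_sticker", "applause": "applause_sticker"}

def recommend(features_with_time):
    """features_with_time: list of (time_id, feature) pairs."""
    by_time = defaultdict(set)
    for time_id, feature in features_with_time:
        by_time[time_id].add(feature)
    plan = {}
    for time_id, feats in by_time.items():
        enhanced = ENHANCED.get(frozenset(feats))
        if enhanced:                  # correlated features -> one stronger material
            plan[time_id] = [enhanced]
        else:                         # fall back to matching each feature alone
            plan[time_id] = [SINGLE[f] for f in feats if f in SINGLE]
    return plan

print(recommend([(3, "laughter"), (3, "applause"), (7, "laughter")]))
# -> {3: ['transition_effect'], 7: ['haha_sticker']}
```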
- the material addition time is consistent with the display time of the video frames of the corresponding video content feature.
- the original video is clipped according to the material addition time of the recommended material to generate the target video. Therefore, the corresponding recommended material is added only when the video frames containing the corresponding video content features are played, avoiding inconsistency between the added material and the video content.
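- as an illustration, assuming moviepy 1.x, a recommended sticker could be composited onto the original video only during its material addition time; the file names and times below are placeholders:
```python
# A minimal sketch, assuming moviepy 1.x, of adding a recommended sticker only
# while the matching video frames play; file names and times are placeholders.
from moviepy.editor import VideoFileClip, ImageClip, CompositeVideoClip

base = VideoFileClip("original.mp4")
sticker = (ImageClip("applause_sticker.png")
           .set_start(3.0)           # material addition time from the time identifier
           .set_duration(2.0)        # visible only over the matching frames
           .set_position(("center", "top")))
target = CompositeVideoClip([base, sticker])
target.write_videofile("target.mp4")
```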
- some materials, such as sound effect materials and transition effects materials, do not have size information, while others, such as sticker materials and text materials, do. In order to prevent materials with size information from blocking important display content (for example, a face in the video frame) when they are added, it is necessary to determine the area in which these materials are added.
- when the material type of the recommended material meets the preset target type, that is, when the recommended material has the attribute of carrying size information, the target video frame corresponding to the material addition time of the recommended material is obtained from the original video, and image recognition is performed on the target video frame to obtain the subject area of the shooting object, where the subject area can be any position information reflecting the location of the shooting object, for example, a center coordinate point or a location range.
- for example, the shooting object is the vocal object corresponding to the "haha" audio.
- according to the subject area of the shooting object, the material area in which the recommended material is added on the target video frame is determined.
- for example, the material type label of the recommended material can be determined, and a preset correspondence can be queried according to the material type label to determine the regional characteristics of the material area (such as the background area of the image, etc.); the region on the target video frame that matches these regional characteristics is determined as the material area.
- alternatively, the object type label of the shooting object can be determined, and a preset correspondence can be queried according to the object type label to determine the regional characteristics of the material area (for example, if the shooting object is of a face type, the corresponding regional characteristic is the area above the head, etc.); the region on the target video frame that matches these regional characteristics is determined as the material area.
- furthermore, the original video is clipped according to the material addition time and the material area to generate the target video, and the corresponding material is added to the material area of the video frames corresponding to the material addition time.
- the material area may be the coordinates of the center point of the material added in the corresponding video frame, or may be the coordinate range of the material added in the corresponding video frame.
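- a minimal sketch of choosing a material area above a detected face so the sticker does not block the subject; the bounding boxes are illustrative:
```python
# A minimal sketch of picking a material area above a detected face so the
# sticker does not block the shooting object; the bounding boxes are illustrative.
def material_area(face_box, sticker_w, sticker_h, frame_w, frame_h):
    """face_box = (x, y, w, h); returns the sticker's top-left (x, y)."""
    fx, fy, fw, fh = face_box
    x = fx + fw // 2 - sticker_w // 2        # centered over the head
    y = fy - sticker_h - 10                  # 10 px margin above the face
    x = max(0, min(x, frame_w - sticker_w))  # clamp inside the frame
    y = max(0, min(y, frame_h - sticker_h))
    return x, y

print(material_area((400, 200, 120, 150), 100, 60, 1280, 720))  # -> (410, 130)
```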
- since the server that determines the subject area and the like may not be the same server as the one performing style feature recognition, the style feature recognition may be performed on a local server to improve recognition efficiency, while the material addition time and material area of the recommended material may be identified on a remote server.
- for example, the material addition times of the recommended materials matching the video content features are set to t1 and t2 respectively; the original video is clipped according to the material addition times of the recommended materials, and the video segment clip1 corresponding to F1 and the video segment clip2 corresponding to F2 are obtained; the corresponding server then optimizes the added materials, performs image recognition on the target video frames to obtain the subject area of the shooting object, and determines, according to the subject area, the material area in which the recommended material is added on the target video frame.
- in the video processing method of the embodiments of the present disclosure, after the video content features are determined, at least one recommended material matching the multi-dimensional video content features is determined, and the correspondence between the material and the video frames is ensured in terms of both position and time, which further ensures that the processing effect of the video satisfies the personalized characteristics of the video content.
- FIG. 17 is a schematic structural diagram of a video processing device provided by an embodiment of the present disclosure.
- the device can be implemented by software and/or hardware, and generally can be integrated into an electronic device. As shown in FIG. 17 , the device includes: an extraction module 1710 , an acquisition module 1720 , and a processing module 1730 .
- the extraction module 1710 is used for extracting video content features based on the analysis of the original video.
- the acquiring module 1720 is configured to acquire at least one recommended material matching the feature of the video content.
- the processing module 1730 is configured to perform video processing on the original video according to the recommended material to generate a target video, wherein the target video is a video generated by adding the recommended material to the original video.
- the video processing device provided by the embodiment of the present disclosure can execute the video processing method provided by any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method, which will not be repeated here.
- the present disclosure further proposes a computer program product, including computer programs/instructions, when the computer program/instructions are executed by a processor, the video processing method in any of the above embodiments is implemented.
- FIG. 18 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- FIG. 18 shows a schematic structural diagram of an electronic device 1800 suitable for implementing an embodiment of the present disclosure.
- the electronic device 1800 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (such as car navigation terminals), and stationary terminals such as digital TVs and desktop computers.
- the electronic device shown in FIG. 18 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
- the electronic device 1800 may include a processor (such as a central processing unit, a graphics processing unit, etc.) 1801, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1802 or a program loaded from a memory 1808 into a random access memory (RAM) 1803.
- in the RAM 1803, various programs and data necessary for the operation of the electronic device 1800 are also stored.
- the processor 1801, the ROM 1802, and the RAM 1803 are connected to each other through a bus 1804.
- an input/output (I/O) interface 1805 is also connected to the bus 1804.
- the following devices can be connected to the I/O interface 1805: input devices 1806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 1807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a memory 1808 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1809.
- the communication means 1809 may allow the electronic device 1800 to perform wireless or wired communication with other devices to exchange data. While FIG. 18 shows electronic device 1800 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
- the computer program may be downloaded and installed from a network via communication means 1809, or from memory 1808, or from ROM 1802.
- when the computer program is executed by the processor 1801, the above-mentioned functions defined in the video processing method of the embodiments of the present disclosure are executed.
- the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
- a computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
- the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
- examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
- the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: extracts the video content features of the original video, acquires at least one recommended material matching the video content features, and then adds the recommended material to the original video to obtain the target video.
- in this way, video materials adapted to the video content are added to the video, which improves the matching degree between the video content and the video materials and realizes personalized effect processing of the video.
- Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
- each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing specified logical functions.
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
- the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The name of a unit does not, under certain circumstances, constitute a limitation on the unit itself.
- exemplary types of hardware logic components that may be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
- machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
- the present disclosure provides a video processing method, including:
- the extraction of video content features based on the analysis of the original video includes:
- Semantic analysis is performed on the text content to obtain the first keyword.
- the extraction of video content features based on the analysis of the original video includes:
- the method for obtaining the target audio data includes:
- the merging of all the audio tracks based on the time axis to generate the total audio data includes:
- all audio tracks other than the target audio track are combined based on the time axis to generate total audio data.
- the obtaining at least one recommended material matching the characteristics of the video content includes:
- At least one recommended material matching the video style feature and the video content feature is acquired.
- the determining the video style feature according to the video image of the original video includes:
- Weighting calculation is performed on the at least one object to be photographed according to a preset object weight, and the calculation result is matched with a preset style classification to determine a video style feature corresponding to the original video.
- the acquiring at least one recommended material matching the characteristics of the video content includes:
- At least one recommended material matching the one video content feature is acquired.
- performing video processing on the original video according to the recommended material to generate a target video includes:
- according to the time identifier of the video content feature, the material addition time of the recommended material matching the video content feature is set;
- the target video is generated by clipping the original video according to the material addition time of the recommended material.
- the clipping of the original video according to the material addition time of the recommended material to generate the target video includes:
- according to the subject area of the shooting object, the material area in which the recommended material is added on the target video frame is determined;
- the original video is clipped to generate a target video.
- the detecting the audio identifier of each audio track includes:
- An audio identity is determined for each audio track based on the matching results.
- combining the plurality of video content features into a video feature set and obtaining at least one recommended material matching the video feature set includes:
- according to the video feature set, a preset correspondence is queried to determine whether there is an enhanced material corresponding to the video feature set;
- if no enhanced material is matched, the video feature set is split into individual content features to match recommended materials;
- if an enhanced material is matched, the enhanced material is used as the corresponding recommended material.
- the present disclosure provides a video processing device, including:
- the extraction module is used to extract video content features based on the analysis of the original video
- An acquisition module configured to acquire at least one recommended material matching the characteristics of the video content
- a processing module configured to perform video processing on the original video according to the recommended material to generate a target video, wherein the target video is a video generated by adding the recommended material to the original video.
- the extraction module is used for:
- Semantic analysis is performed on the text content to obtain the first keyword.
- the extraction module is used for:
- the extraction module is configured to: obtain all audio tracks displayed in the video editing application for the video file of the original video;
- the extraction module is used for:
- all audio tracks other than the target audio track are combined based on the time axis to generate total audio data.
- the acquisition module is specifically configured to:
- At least one recommended material matching the video style feature and the video content feature is acquired.
- the acquisition module is specifically configured to: perform image recognition processing on the video image of the original video, and determine at least one shooting object according to the recognition result;
- the acquisition module is used for:
- At least one recommended material matching the one video content feature is acquired.
- the acquisition module is used for:
- according to the time identifier of the video content feature, the material addition time of the recommended material matching the video content feature is set;
- the target video is generated by clipping the original video according to the material addition time of the recommended material.
- the acquisition module is used for:
- according to the subject area of the shooting object, the material area in which the recommended material is added on the target video frame is determined;
- the original video is clipped to generate a target video.
- the extraction module is used for:
- An audio identity is determined for each audio track based on the matching results.
- The acquisition module is further configured to: query a preset correspondence according to the video feature set to determine whether there is enhanced material corresponding to the video feature set; split, in a case where no enhanced material is matched, the video feature set into individual content features to match recommended material; and use, in a case where enhanced material is matched, the enhanced material as the corresponding recommended material.
- The present disclosure further provides an electronic device, including: a processor, and a memory for storing instructions executable by the processor;
- the processor is configured to read the executable instructions from the memory and execute the instructions to implement any video processing method provided in the present disclosure.
- The present disclosure further provides a computer-readable storage medium storing a computer program, where the computer program is used to execute any one of the video processing methods provided in the present disclosure.
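A minimal class-skeleton sketch of the extraction/acquisition/processing split described above, with assumed method names and signatures; the disclosure defines the modules only functionally, so everything concrete here is an illustrative choice:

```python
from dataclasses import dataclass


@dataclass
class RecommendedMaterial:
    name: str          # sticker / sound-effect asset id (assumed shape)
    add_time_s: float  # material addition time on the video timeline


class ExtractionModule:
    def extract(self, original_video_path: str) -> list[str]:
        """Extract video content features (keywords) from the original video."""
        raise NotImplementedError


class AcquisitionModule:
    def acquire(self, features: list[str]) -> list[RecommendedMaterial]:
        """Acquire recommended material matching the content features."""
        raise NotImplementedError


class ProcessingModule:
    def process(self, original_video_path: str,
                materials: list[RecommendedMaterial]) -> str:
        """Clip the original video with the materials added and return
        the path of the generated target video."""
        raise NotImplementedError
```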
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
Description
Claims (16)
- 1. A video processing method, comprising: extracting video content features based on analysis of an original video; acquiring at least one recommended material matching the video content features; and performing video processing on the original video according to the recommended material to generate a target video, wherein the target video is a video generated by adding the recommended material to the original video.
- 2. The video processing method according to claim 1, wherein extracting video content features based on analysis of the original video comprises: performing speech recognition processing on target audio data of the original video to acquire corresponding text content; and performing semantic parsing processing on the text content to acquire a first keyword.
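A minimal sketch of claim 2's speech-to-keyword flow, for orientation only: `transcribe` is a placeholder for whatever speech recognizer supplies the text content, and the keyword step is a simple frequency heuristic standing in for the unspecified semantic parser. All names here are illustrative assumptions, not the claimed implementation.

```python
import re
from collections import Counter

# Placeholder for a real speech recognizer; the claim only requires
# that speech recognition turns the target audio data into text.
def transcribe(audio_path: str) -> str:
    raise NotImplementedError("plug in an ASR engine here")

STOPWORDS = {"the", "a", "an", "and", "is", "are", "at", "wow", "look"}

def first_keyword(text: str) -> str | None:
    """Toy semantic parsing: the most frequent non-stopword token."""
    tokens = [t for t in re.findall(r"[a-z']+", text.lower())
              if t not in STOPWORDS]
    counts = Counter(tokens)
    return counts.most_common(1)[0][0] if counts else None

# "fireworks" becomes the first keyword, which later steps can match
# to recommended material such as a fireworks sticker.
print(first_keyword("wow look at the fireworks, the fireworks are huge"))
```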
- 3. The video processing method according to claim 1 or 2, wherein extracting video content features based on analysis of the original video comprises: performing sound detection processing on the target audio data of the original video to acquire corresponding spectrum data; and analyzing the spectrum data to acquire a second keyword.
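Claim 3's sound-detection branch can be illustrated with a plain FFT: compute per-frame spectra and flag a broadband energy spike as an event keyword. The frame length, threshold, and the "explosion" label are assumptions for illustration; the claim only requires that analyzing spectrum data yields a second keyword.

```python
import numpy as np

def second_keyword(samples: np.ndarray,
                   frame_len: int = 2048,
                   threshold_db: float = 20.0) -> str | None:
    """Label a loud broadband burst in the spectrum as an 'explosion'
    keyword; frame length and threshold are illustrative choices."""
    n_frames = len(samples) // frame_len
    energies = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        spectrum = np.abs(np.fft.rfft(frame))   # per-frame spectrum data
        energies.append(float(np.sum(spectrum ** 2)))
    if len(energies) < 2:
        return None
    energies = np.asarray(energies)
    peak_over_typical_db = 10 * np.log10(
        energies.max() / (np.median(energies) + 1e-12))
    return "explosion" if peak_over_typical_db > threshold_db else None

sr = 16000
audio = 0.01 * np.random.randn(2 * sr)         # two seconds of near-silence
audio[sr:sr + 2048] += np.random.randn(2048)   # one simulated bang
print(second_keyword(audio))  # -> "explosion"
```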
- 4. The video processing method according to claim 2 or 3, wherein acquiring the target audio data comprises: acquiring all audio tracks of the video file of the original video displayed in a video clipping application; merging all the audio tracks based on a time axis to generate total audio data; and comparing a first duration of the total audio data with a second duration of the original video, and in a case where the first duration is greater than the second duration, trimming the total audio data to acquire the target audio data, wherein the duration of the target audio data is consistent with the second duration.
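Claim 4's merge-and-trim step amounts to mixing each track at its timeline offset and cutting the mix back to the video's duration. A sketch over raw sample arrays, assuming mono tracks at a shared sample rate (simplifications the claim does not mandate):

```python
import numpy as np

def merge_tracks(tracks: list[tuple[float, np.ndarray]], sr: int) -> np.ndarray:
    """Mix (start_seconds, samples) tracks on a shared time axis."""
    end = max(int(start * sr) + len(samples) for start, samples in tracks)
    total = np.zeros(end, dtype=np.float32)
    for start, samples in tracks:
        offset = int(start * sr)
        total[offset:offset + len(samples)] += samples
    return total

def target_audio(tracks, video_duration_s: float, sr: int) -> np.ndarray:
    total = merge_tracks(tracks, sr)         # first duration: len(total) / sr
    video_len = int(video_duration_s * sr)   # second duration
    # If the audio outlasts the video, trim it so the target audio
    # duration is consistent with the video duration.
    return total[:video_len] if len(total) > video_len else total

sr = 16000
tracks = [(0.0, np.ones(sr * 3, dtype=np.float32)),   # 3 s voice-over track
          (2.0, np.ones(sr * 3, dtype=np.float32))]   # 3 s effect from t=2 s
print(len(target_audio(tracks, video_duration_s=4.0, sr=sr)) / sr)  # 4.0
```

Excluding a background-music track before the merge (claim 5) would then just filter `tracks` by the identifier detection sketched after claim 11 below.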
- 5. The video processing method according to claim 4, wherein merging all the audio tracks based on the time axis to generate the total audio data comprises: detecting an audio identifier of each audio track; and in a case where a target audio track whose identifier indicates background music is detected, merging, based on the time axis, all audio tracks other than the target audio track to generate the total audio data.
- 6. The video processing method according to any one of claims 1-5, wherein acquiring at least one recommended material matching the video content features comprises: determining a video style feature according to a video image of the original video; and acquiring at least one recommended material matching the video style feature and the video content features.
- 7. The video processing method according to claim 6, wherein determining the video style feature according to the video image of the original video comprises: performing image recognition processing on the video image of the original video, and determining at least one shooting object according to the recognition result; and performing a weighted calculation on the at least one shooting object according to preset object weights, and matching the calculation result against preset style classifications to determine the video style feature corresponding to the original video.
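Claim 7's style determination reduces to a weighted vote: each recognized shooting object contributes preset weights toward style classes, and the best-scoring class becomes the video style feature. The weight table and class names below are invented for illustration; the patent leaves the actual weights and classifications unspecified.

```python
# Preset object weights per style class; all values are illustrative.
OBJECT_WEIGHTS = {
    "cake":     {"party": 0.9, "travel": 0.0},
    "balloon":  {"party": 0.7, "travel": 0.1},
    "mountain": {"party": 0.0, "travel": 0.9},
    "backpack": {"party": 0.1, "travel": 0.6},
}

def video_style(shooting_objects: list[str]) -> str:
    """Weighted calculation over recognized objects, matched against
    preset style classifications (here simply the argmax class)."""
    scores: dict[str, float] = {}
    for obj in shooting_objects:
        for style, weight in OBJECT_WEIGHTS.get(obj, {}).items():
            scores[style] = scores.get(style, 0.0) + weight
    return max(scores, key=scores.get) if scores else "default"

# Objects produced by image recognition on the video image:
print(video_style(["cake", "balloon", "backpack"]))  # -> "party"
```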
- 8. The video processing method according to any one of claims 1-7, wherein acquiring at least one recommended material matching the video content features comprises: determining, in the original video, the playback time of the video frame corresponding to a video content feature, wherein the video content feature is generated according to the video content of the video frame; marking the video content feature with a time mark according to the playback time of the video frame; for a same time mark, in a case where multiple corresponding video content features exist, combining the multiple video content features into a video feature set and acquiring at least one recommended material matching the video feature set; and for a same time mark, in a case where a single corresponding video content feature exists, acquiring at least one recommended material matching that video content feature.
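Claim 8 groups content features by the playback time of the frames they came from: a time mark with several features yields a feature set, while a lone feature is matched on its own. A sketch of the grouping, assuming a (playback_time, feature) representation and a one-second bucket, both of which are illustrative choices:

```python
from collections import defaultdict

def group_by_time_mark(features: list[tuple[float, str]],
                       bucket_s: float = 1.0) -> dict[int, list[str]]:
    """Bucket (playback_time, feature) pairs into time marks."""
    marks: dict[int, list[str]] = defaultdict(list)
    for t, feat in features:
        marks[int(t // bucket_s)].append(feat)
    return dict(marks)

features = [(3.2, "fireworks"), (3.7, "cheering"), (9.1, "cake")]
for mark, feats in sorted(group_by_time_mark(features).items()):
    if len(feats) > 1:
        print(mark, "feature set:", feats)      # matched as a combined set
    else:
        print(mark, "single feature:", feats[0])
```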
- 9. The video processing method according to claim 8, wherein performing video processing on the original video according to the recommended material to generate the target video comprises: setting, according to the time mark of the video content feature, the material addition time of the recommended material matching the video content feature; and clipping the original video according to the material addition time of the recommended material to generate the target video.
- 10. The video processing method according to claim 9, wherein clipping the original video according to the material addition time of the recommended material to generate the target video comprises: in a case where the material type of the recommended material satisfies a preset target type, acquiring, from the original video, the target video frame corresponding to the material addition time of the recommended material; performing image recognition on the target video frame to acquire the main body area of the shooting object; determining, according to the main body area of the shooting object, the material area in which the recommended material is added on the target video frame; and clipping the original video according to the material addition time and the material area of the recommended material to generate the target video.
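Claims 9 and 10 time the recommended material to the feature's time mark and, for material types that occupy screen space, keep it clear of the subject. The sketch below picks the first corner region that does not overlap the recognized main body box; the corner candidates, frame size, and (x, y, w, h) box format are assumptions for illustration.

```python
def overlaps(a, b):
    """Axis-aligned overlap test for (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def material_area(frame_w, frame_h, subject_box, mat_w, mat_h):
    """Choose where the recommended material goes on the target video
    frame: the first corner that avoids the subject's main body area."""
    corners = [(0, 0), (frame_w - mat_w, 0),
               (0, frame_h - mat_h), (frame_w - mat_w, frame_h - mat_h)]
    for x, y in corners:
        if not overlaps((x, y, mat_w, mat_h), subject_box):
            return (x, y, mat_w, mat_h)
    return corners[0] + (mat_w, mat_h)  # fallback: top-left anyway

# Subject detected mid-left of a 1920x1080 frame; a 400x200 sticker
# added at the material addition time lands in a clear corner.
print(material_area(1920, 1080, subject_box=(100, 300, 600, 600),
                    mat_w=400, mat_h=200))  # -> (0, 0, 400, 200)
```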
- 11. The video processing method according to any one of claims 5-10, wherein detecting the audio identifier of each audio track comprises: recognizing the sound features of the audio corresponding to each audio track; matching the sound features of the audio corresponding to each audio track against the preset sound features corresponding to each audio identifier; and determining the audio identifier of each audio track based on the matching result.
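Claim 11 identifies each track by matching its sound features against preset reference features per identifier. The sketch uses a spectral centroid as the sound feature and nearest-reference matching, a deliberately crude stand-in for whatever feature and matcher an implementation would actually use; the reference values are made up.

```python
import numpy as np

def sound_feature(samples: np.ndarray, sr: int) -> float:
    """Spectral centroid in Hz as a one-number sound feature."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)
    return float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))

# Preset sound features per audio identifier; assumed reference values.
REFERENCE = {"background_music": 800.0, "voice": 300.0, "effects": 3000.0}

def audio_identifier(samples: np.ndarray, sr: int) -> str:
    feat = sound_feature(samples, sr)
    return min(REFERENCE, key=lambda k: abs(REFERENCE[k] - feat))

# Tracks identified as "background_music" are the ones excluded from
# the merge in claim 5; everything else is mixed into the total audio.
sr = 16000
t = np.arange(sr) / sr
print(audio_identifier(np.sin(2 * np.pi * 310 * t), sr))  # -> "voice"
```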
- 12. The video processing method according to any one of claims 8-10, wherein combining the multiple video content features into a video feature set and acquiring at least one recommended material matching the video feature set comprises: querying a preset correspondence according to the video feature set to determine whether there is enhanced material corresponding to the video feature set; in a case where no enhanced material is matched, splitting the video feature set into individual content features to match recommended material; and in a case where enhanced material is matched, using the enhanced material as the corresponding recommended material.
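Claim 12 is a lookup with fallback: a preset correspondence maps whole feature sets to enhanced material, and when the set misses, each feature is matched individually. A dictionary sketch in which every table entry and material name is invented:

```python
# Preset correspondence from feature sets to enhanced material.
ENHANCED = {
    frozenset({"fireworks", "cheering"}): "celebration_mega_effect",
}
PER_FEATURE = {
    "fireworks": "sparkle_sticker",
    "cheering": "applause_sfx",
    "cake": "birthday_sticker",
}

def recommended_material(feature_set: set[str]) -> list[str]:
    enhanced = ENHANCED.get(frozenset(feature_set))
    if enhanced is not None:
        return [enhanced]                 # enhanced material wins outright
    # No enhanced match: split the set and match each feature separately.
    return [PER_FEATURE[f] for f in sorted(feature_set) if f in PER_FEATURE]

print(recommended_material({"fireworks", "cheering"}))  # enhanced hit
print(recommended_material({"fireworks", "cake"}))      # per-feature fallback
```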
- 13. A video processing device, comprising: an extraction module configured to extract video content features based on analysis of an original video; an acquisition module configured to acquire at least one recommended material matching the video content features; and a processing module configured to perform video processing on the original video according to the recommended material to generate a target video, wherein the target video is a video generated by adding the recommended material to the original video.
- 14. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the video processing method according to any one of claims 1-12.
- 15. A computer-readable storage medium, wherein the storage medium stores a computer program, and the computer program is used to execute the video processing method according to any one of claims 1-12.
- 16. A computer program, comprising instructions which, when executed by a processor, implement the video processing method according to any one of claims 1-12.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/574,263 US20240244290A1 (en) | 2022-02-25 | 2023-02-21 | Video processing method and apparatus, device and storage medium |
EP23759140.9A EP4485948A1 (en) | 2022-02-25 | 2023-02-21 | Video processing method and apparatus, device and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210178794.5A CN116708917A (zh) | 2022-02-25 | 2022-02-25 | 视频处理方法、装置、设备及介质 |
CN202210178794.5 | 2022-02-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023160515A1 (zh) | 2023-08-31 |
Family
ID=87764746
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/077309 WO2023160515A1 (zh) | 2022-02-25 | 2023-02-21 | 视频处理方法、装置、设备及介质 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240244290A1 (zh) |
EP (1) | EP4485948A1 (zh) |
CN (1) | CN116708917A (zh) |
WO (1) | WO2023160515A1 (zh) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150169747A1 (en) * | 2013-12-12 | 2015-06-18 | Google Inc. | Systems and methods for automatically suggesting media accompaniments based on identified media content |
US9270964B1 (en) * | 2013-06-24 | 2016-02-23 | Google Inc. | Extracting audio components of a portion of video to facilitate editing audio of the video |
CN110381371A (zh) * | 2019-07-30 | 2019-10-25 | 维沃移动通信有限公司 | 一种视频剪辑方法及电子设备 |
CN111541936A (zh) * | 2020-04-02 | 2020-08-14 | 腾讯科技(深圳)有限公司 | 视频及图像处理方法、装置、电子设备、存储介质 |
CN111556335A (zh) * | 2020-04-15 | 2020-08-18 | 早安科技(广州)有限公司 | 一种视频贴纸处理方法及装置 |
CN113518256A (zh) * | 2021-07-23 | 2021-10-19 | 腾讯科技(深圳)有限公司 | 视频处理方法、装置、电子设备及计算机可读存储介质 |
- 2022-02-25: CN application CN202210178794.5A filed; publication CN116708917A pending
- 2023-02-21: PCT application PCT/CN2023/077309 filed (published as WO2023160515A1)
- 2023-02-21: EP application EP23759140.9A filed; publication EP4485948A1 pending
- 2023-02-21: US application US18/574,263 filed; publication US20240244290A1 pending
Also Published As
Publication number | Publication date |
---|---|
US20240244290A1 (en) | 2024-07-18 |
EP4485948A1 (en) | 2025-01-01 |
CN116708917A (zh) | 2023-09-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109688463B (zh) | | Clipped video generation method and apparatus, terminal device, and storage medium |
US11482242B2 (en) | | Audio recognition method, device and server |
CN110503961B (zh) | | Audio recognition method and apparatus, storage medium, and electronic device |
WO2019148586A1 (zh) | | Speaker identification method and apparatus for multi-speaker speech |
CN109637520A (zh) | | Sensitive content identification method, apparatus, terminal, and medium based on speech analysis |
CN107680584B (zh) | | Method and apparatus for segmenting audio |
US10277834B2 (en) | | Suggestion of visual effects based on detected sound patterns |
CN113596579B (zh) | | Video generation method and apparatus, medium, and electronic device |
CN112153460B (zh) | | Video soundtrack method and apparatus, electronic device, and storage medium |
CN111798821B (zh) | | Voice conversion method and apparatus, readable storage medium, and electronic device |
CN112929746A (zh) | | Video generation method and apparatus, storage medium, and electronic device |
CN108877779B (zh) | | Method and apparatus for detecting a speech tail point |
CN105224581A (zh) | | Method and apparatus for presenting pictures while playing music |
CN112182255A (zh) | | Method and apparatus for storing media files and for retrieving media files |
US12164562B1 (en) | | Background audio identification for query disambiguation |
CN111488813A (zh) | | Video emotion labeling method and apparatus, electronic device, and storage medium |
CN112328830A (zh) | | Deep-learning-based information positioning method and related device |
CN111859970B (zh) | | Method, apparatus, device, and medium for processing information |
US11410706B2 (en) | | Content pushing method for display device, pushing device and display device |
US11706505B1 (en) | | Processing method, terminal device, and medium |
WO2023160515A1 (zh) | | Video processing method and apparatus, device, and medium |
CN111259181B (zh) | | Method and device for displaying information and providing information |
CN110400559B (zh) | | Audio synthesis method, apparatus, and device |
CN109495786B (zh) | | Pre-configuration method and apparatus for video processing parameter information, and electronic device |
CN112951274A (zh) | | Speech similarity determination method and device, and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23759140; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 18574263; Country of ref document: US |
| WWE | Wipo information: entry into national phase | Ref document number: 2024550229; Country of ref document: JP |
| WWE | Wipo information: entry into national phase | Ref document number: 2023759140; Country of ref document: EP |
| NENP | Non-entry into the national phase | Ref country code: DE |
| ENP | Entry into the national phase | Ref document number: 2023759140; Country of ref document: EP; Effective date: 20240925 |