WO2023279743A1 - Method and device for generating an audio switching template - Google Patents

Publication number
WO2023279743A1
Authority
WO
WIPO (PCT)
Prior art keywords
switching
audio
switching point
target
point
Prior art date
Application number
PCT/CN2022/079533
Other languages
English (en)
French (fr)
Inventor
张冉
王可尧
翟传磊
Original Assignee
北京达佳互联信息技术有限公司
Application filed by 北京达佳互联信息技术有限公司
Publication of WO2023279743A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N21/439: Processing of audio elementary streams
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845: Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a method and device for generating an audio switching template.
  • A stuck video (also known as a beat-synced video) is a video whose picture switches at time points that follow the rhythm of the music.
  • The present disclosure provides a method and device for generating an audio switching template.
  • The technical solutions of the embodiments of the present disclosure are as follows:
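  • A method for generating an audio switching template is provided, which can be applied to an electronic device.
  • The method includes:
  • extracting a picture switching point from the video data of a video to be processed;
  • extracting audio switching points of at least one switching type from the audio data of the video to be processed, where the switching type of an audio switching point includes at least one of a rhythm switching point, a melody switching point or a lyrics switching point;
  • determining a target switching type according to the picture switching point and the audio switching points of the at least one switching type;
  • acquiring, from the audio switching points belonging to the target switching type, a target audio switching point corresponding to the picture switching point; and
  • generating an audio switching template corresponding to the audio data according to the target audio switching point, where the audio switching template is used to generate a video corresponding to the audio switching template.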
  • An apparatus for generating an audio switching template is provided, which can be applied to an electronic device.
  • The apparatus may include an acquisition unit, a processing unit and a generation unit.
  • The acquisition unit is configured to extract a picture switching point from the video data of the video to be processed.
  • The acquisition unit is further configured to extract audio switching points of at least one switching type from the audio data of the video to be processed; the switching type of an audio switching point includes at least one of a rhythm switching point, a melody switching point or a lyrics switching point.
  • The processing unit is configured to determine a target switching type according to the picture switching point and the audio switching points of the at least one switching type.
  • The acquisition unit is further configured to acquire a target audio switching point from the audio switching points belonging to the target switching type, where the target audio switching point corresponds to the picture switching point.
  • The generation unit is configured to generate an audio switching template corresponding to the audio data according to the target audio switching point; the audio switching template is used to generate a video corresponding to the audio switching template.
  • An electronic device is provided, which may include a processor and a memory for storing processor-executable instructions, where the processor is configured to execute the instructions to implement the following steps:
  • the switching type of the audio switching point includes at least one of a rhythm switching point, a melody switching point or a lyrics switching point;
  • an audio switching template corresponding to the audio data is generated according to the target audio switching point; the audio switching template is used to generate a video corresponding to the audio switching template.
  • A computer-readable storage medium is provided. Instructions are stored on the computer-readable storage medium and, when executed by a processor of an electronic device, enable the electronic device to perform the following steps:
  • the switching type of the audio switching point includes at least one of a rhythm switching point, a melody switching point or a lyrics switching point;
  • an audio switching template corresponding to the audio data is generated according to the target audio switching point; the audio switching template is used to generate a video corresponding to the audio switching template.
  • A computer program product is provided. It includes computer instructions which, when run on an electronic device, cause the electronic device to perform the following steps:
  • the switching type of the audio switching point includes at least one of a rhythm switching point, a melody switching point or a lyrics switching point;
  • an audio switching template corresponding to the audio data is generated according to the target audio switching point; the audio switching template is used to generate a video corresponding to the audio switching template.
  • The target switching type can be determined according to the picture switching point and the audio switching points of at least one switching type. A target audio switching point corresponding to the picture switching point is then determined from the audio switching points belonging to the target switching type, and an audio switching template is generated according to the determined target audio switching point.
  • The audio switching template can thus be generated without manual participation, which improves the efficiency of generating it.
  • In addition, because each picture switching point corresponds to a target audio switching point, the picture switching points align accurately with audio switching points in the video to be processed. This improves the accuracy of the picture switching points and, in turn, the accuracy of the target audio switching points in the audio switching template.
  • Fig. 1 shows a schematic flowchart of a method for generating an audio switching template provided by an embodiment of the present disclosure.
  • Fig. 2 shows a schematic flowchart of another method for generating an audio switching template provided by an embodiment of the present disclosure.
  • Fig. 3 shows a schematic flowchart of another method for generating an audio switching template provided by an embodiment of the present disclosure.
  • Fig. 4 shows a schematic flowchart of another method for generating an audio switching template provided by an embodiment of the present disclosure.
  • Fig. 5 shows a schematic structural diagram of another device for generating an audio switching template provided by an embodiment of the present disclosure.
  • Fig. 6 shows a schematic structural diagram of a terminal provided by an embodiment of the present disclosure.
  • Fig. 7 shows a schematic structural diagram of a server provided by an embodiment of the present disclosure.
  • the data involved in the present disclosure may be data authorized by the user or fully authorized by all parties.
  • In the prior art, a stuck video is mainly generated by manually creating an audio switching template in video editing software and then generating the stuck video from that template and the received pictures or video clips.
  • Manually creating audio switching templates is extremely inefficient.
  • Moreover, because the positions of the audio switching points in this approach are determined entirely by hand, their accuracy is poor.
  • An embodiment of the present disclosure provides a method for generating an audio switching template. After a picture switching point and audio switching points of at least one switching type are extracted from the video to be processed, the target switching type can be determined according to the picture switching point and the audio switching points of the at least one switching type. A target audio switching point corresponding to the picture switching point is then determined from the audio switching points belonging to the target switching type, and an audio switching template is generated according to the determined target audio switching point. In this solution, the audio switching template can be generated without manual participation, which improves the efficiency of generating it.
  • In addition, because each picture switching point corresponds to a target audio switching point, the picture switching points align accurately with audio switching points in the video to be processed. This improves the accuracy of the picture switching points and, in turn, the accuracy of the target audio switching points in the audio switching template.
  • The method for generating an audio switching template provided by the present disclosure is applied to an electronic device.
  • The electronic device may be a server, a terminal, or another electronic device used to generate an audio switching template, which is not limited in the present disclosure.
  • The server may be a single server or a server cluster composed of multiple servers.
  • The server cluster may be a distributed cluster. The present disclosure does not limit the specific implementation of the server.
  • The terminal may be a mobile phone, tablet computer, desktop computer, laptop, handheld computer, notebook computer, ultra-mobile personal computer (UMPC), netbook, cellular phone, personal digital assistant (PDA), augmented reality (AR) or virtual reality (VR) device, or another device on which a content community application is installed and used. The present disclosure does not limit the specific form of the electronic device.
  • The terminal can interact with the user through one or more of a keyboard, touch panel, touch screen, remote control, voice interaction or handwriting device.
  • When applied to an electronic device, the method for generating an audio switching template may include S101 to S105.
  • S101: The electronic device extracts a picture switching point from the video data of a video to be processed.
  • The video to be processed is a stuck video, that is, a video whose picture switches at time points that follow the rhythm of the music.
  • The picture switching point is the moment at which the picture switches in the video to be processed.
  • To extract the picture switching point, the electronic device can use a picture switching detection technique; or it can divide the video to be processed into multiple video frames and take the moments at which the video is divided as picture switching points; or it can extract the picture switching point from the video data in other ways, which is not limited in the present disclosure.
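The disclosure does not name a specific picture switching detection technique. As a minimal sketch of one common possibility (an assumption, not the disclosed method), consecutive frames can be compared and a switching point reported wherever the mean absolute pixel difference exceeds a threshold. The frame representation, the function name `picture_switching_points` and the `threshold` value are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical frame representation: one flattened grayscale frame (0-255 values).
Frame = Sequence[float]


def picture_switching_points(frames: List[Frame], fps: float,
                             threshold: float = 30.0) -> List[float]:
    """Return the times (in seconds) at which consecutive frames differ strongly."""
    points = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Mean absolute pixel difference between this frame and the previous one.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            points.append(i / fps)  # frame index converted to playback time
    return points


# Toy input: 2 frames per second, 4-pixel frames; the picture changes at frames 2 and 6.
dark, bright = [0, 0, 0, 0], [200, 200, 200, 200]
frames = [dark, dark, bright, bright, bright, bright, dark, dark]
print(picture_switching_points(frames, fps=2.0))  # [1.0, 3.0]
```

A production detector would normally work on decoded video frames and use a more robust measure (for example histogram differences), but the output has the same shape: a list of playback moments.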
  • When generating the audio switching template, the electronic device can select a video with high popularity from a large number of video clips as the video to be processed.
  • The popularity of a video may be represented by at least one of its number of plays, likes or reposts.
  • The electronic device may obtain a large number of stuck videos through keyword search; or it may classify multiple videos with a video classification algorithm, determine the category that contains the most stuck videos, and obtain the videos belonging to that category; or it may identify bloggers who publish stuck videos and obtain the stuck videos they have published. This is not limited in the present disclosure.
  • S102: The electronic device extracts audio switching points of at least one switching type from the audio data of the video to be processed.
  • The switching type of an audio switching point includes at least one of a rhythm (beat) switching point, a melody (onset) switching point or a lyrics switching point.
  • The beat switching point is the moment at which the beat changes in the video to be processed.
  • The melody switching point is the moment at which the melody changes in the video to be processed.
  • The lyrics switching point is the moment at which the lyrics change in the video to be processed.
  • An audio switching point is also called a snap point or a node.
  • The audio data of a stuck video generally contains several kinds of nodes, such as beat switching points, melody switching points and lyrics switching points, so the electronic device can extract audio switching points of at least one switching type from the audio data of the video to be processed.
  • For example, when the audio data of the video to be processed includes melody switching points but no human voice, the electronic device only needs to extract the melody switching points and the beat switching points, and does not need to extract lyrics switching points.
  • As another example, when the audio data of the video to be processed includes lyrics switching points, the electronic device only needs to extract the lyrics switching points and the beat switching points, and does not need to extract melody switching points.
  • To extract the beat switching points, the electronic device can process the audio data of the video to be processed with a beat tracking algorithm.
  • To extract the melody switching points, the electronic device can process the audio data of the video to be processed with an onset recognition algorithm.
  • To extract the lyrics switching points, the electronic device can obtain the lyrics content in the audio data of the video to be processed and determine the lyrics switching points of the audio data from that content.
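The disclosure does not say in what form the lyrics content is obtained. The sketch below assumes, purely for illustration, that the lyrics are available as LRC-style text in which each line starts with a `[mm:ss.xx]` timestamp; under that assumption the start time of each lyric line can serve as a lyrics switching point. The function name `lyrics_switching_points` is hypothetical.

```python
import re
from typing import List

# Matches LRC-style time tags such as [00:02.00]; metadata tags like [ar:Name] do not match.
_TIME_TAG = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\]")


def lyrics_switching_points(lrc_text: str) -> List[float]:
    """Return the sorted start times (in seconds) of all timestamped lyric lines."""
    points = set()
    for line in lrc_text.splitlines():
        for minutes, seconds in _TIME_TAG.findall(line):
            points.add(int(minutes) * 60 + float(seconds))
    return sorted(points)


lrc = "[ar:Some Artist]\n[00:02.00]first line\n[00:04.00]second line\n[00:09.00]third line"
print(lyrics_switching_points(lrc))  # [2.0, 4.0, 9.0]
```

Beat and onset extraction would follow the same pattern (audio in, list of times out) using a beat tracking or onset detection implementation of the reader's choice.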
  • The electronic device can execute S101 before S102, S102 before S101, or S101 and S102 at the same time; the present disclosure does not limit the order.
  • S103: The electronic device determines a target switching type according to the picture switching point and the audio switching points of at least one switching type.
  • After extracting the picture switching point from the video data and the audio switching points of at least one switching type from the audio data of the video to be processed, the electronic device can determine the target switching type from them. The target switching type is the switching type corresponding to the video to be processed.
  • The electronic device can determine the target switching type according to the degree of coincidence between the picture switching points and the audio switching points of each switching type; or according to the picture content at the picture switching points and the lyrics at the lyrics switching points; or in other ways, which is not limited in the present disclosure.
  • S104: The electronic device acquires at least one target audio switching point from the audio switching points belonging to the target switching type, where the target audio switching point corresponds to the picture switching point.
  • After determining the target switching type, the electronic device can obtain the target audio switching point corresponding to each picture switching point from the audio switching points belonging to the target switching type.
  • For example, after determining from the picture switching points and the audio switching points of at least one switching type that the target switching type is the beat switching point, the electronic device obtains the target beat switching points corresponding to the picture switching points from the video to be processed.
  • The number of target audio switching points is the same as the number of picture switching points. That is, picture switching points and target audio switching points correspond one to one, which avoids one target audio switching point corresponding to multiple picture switching points.
  • "A plurality" refers to two or more.
  • S105: The electronic device generates an audio switching template corresponding to the audio data according to the target audio switching point.
  • The audio switching template is used to generate a video corresponding to the audio switching template.
  • After acquiring the target audio switching point, the electronic device can generate the audio switching template corresponding to the audio data according to it.
  • For example, after acquiring the target beat switching points from the video to be processed, the electronic device generates an audio switching template corresponding to the audio data according to those points. Subsequently, after receiving pictures or video clips uploaded by the user, the electronic device can place them at the beat switching points of the generated audio switching template in a preset order, so as to generate a video corresponding to the audio switching template.
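The disclosure does not specify how an audio switching template is represented. One plausible sketch, under the assumption that a template is simply the audio duration plus the target switching points, is shown below: the points cut the audio into consecutive time slots, and user media are assigned to the slots in upload order. `AudioSwitchingTemplate`, `slots` and `fill_template` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioSwitchingTemplate:
    """Hypothetical template: audio duration plus target audio switching points (seconds)."""
    duration: float
    switching_points: List[float]

    def slots(self) -> List[Tuple[float, float]]:
        # Switching points strictly inside the audio become slot boundaries.
        inner = sorted(p for p in set(self.switching_points) if 0.0 < p < self.duration)
        bounds = [0.0] + inner + [self.duration]
        return list(zip(bounds[:-1], bounds[1:]))


def fill_template(template: AudioSwitchingTemplate,
                  media: List[str]) -> List[Tuple[float, float, str]]:
    """Assign media to slots in the given (preset) order; extra media or slots are ignored."""
    return [(start, end, item) for (start, end), item in zip(template.slots(), media)]


template = AudioSwitchingTemplate(duration=10.0, switching_points=[1.0, 3.0, 7.0])
timeline = fill_template(template, ["a.jpg", "b.mp4", "c.jpg", "d.jpg"])
for entry in timeline:
    print(entry)
```

With switching points at 1 s, 3 s and 7 s in a 10-second track, the picture changes exactly at those moments, which is what makes the resulting video follow the music.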
  • The target switching type can be determined according to the picture switching point and the audio switching points of at least one switching type. A target audio switching point corresponding to the picture switching point is then determined from the audio switching points belonging to the target switching type, and an audio switching template is generated according to the determined target audio switching point.
  • The audio switching template can thus be generated without manual participation, which improves the efficiency of generating it.
  • In addition, the picture switching points align accurately with audio switching points in the video to be processed, which improves the accuracy of the picture switching points and, in turn, the accuracy of the target audio switching points in the audio switching template.
  • Determining the target switching type according to the picture switching point and the audio switching points of at least one switching type includes S201 to S202.
  • S201: The electronic device determines, for each switching type, the coincidence degree between the picture switching points and the audio switching points of that type.
  • The coincidence degree represents the proportion of picture switching points that match an audio switching point among all picture switching points.
  • For example, the total duration of the video to be processed is 10 seconds.
  • The picture switching points extracted by the electronic device from the video data are the 1st, 3rd and 8th seconds.
  • The beat switching points extracted from the audio data are the 1st, 3rd and 7th seconds.
  • The melody switching points extracted from the audio data are the 1st, 5th and 9th seconds.
  • The lyrics switching points extracted from the audio data are the 2nd, 4th and 9th seconds.
  • The electronic device can then determine that the coincidence degree between the picture switching points and the beat switching points is 2/3 (the 1st and 3rd seconds coincide), that with the melody switching points is 1/3 (only the 1st second coincides), and that with the lyrics switching points is 0.
  • S202: The electronic device determines the switching type whose coincidence degree satisfies a first preset condition as the target switching type.
  • After determining the coincidence degree for each switching type, the electronic device has at least one coincidence degree. It selects the coincidence degree that satisfies the first preset condition and determines the corresponding switching type as the target switching type.
  • The first preset condition is the highest coincidence degree, that is, the highest coincidence degree between the picture switching points and the audio switching points of any one switching type.
  • For example, the electronic device determines that the coincidence degree between the picture switching points and the beat switching points is 2/3, that with the melody switching points is 1/3, and that with the lyrics switching points is 0. In this case, the electronic device determines the beat switching point as the target switching type.
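The 10-second example can be reproduced with a few lines of code. The following is a minimal sketch of S201 and S202 as described above, not a definitive implementation: the function names are hypothetical, the `tolerance` parameter is an added assumption (the text only speaks of points that coincide, i.e. a tolerance of 0), and ties are broken by dictionary order, which the disclosure does not address.

```python
from fractions import Fraction
from typing import Dict, List


def coincidence_degree(picture_points: List[float], audio_points: List[float],
                       tolerance: float = 0.0) -> Fraction:
    """Proportion of picture switching points that coincide with some audio switching point."""
    if not picture_points:
        return Fraction(0)
    matched = sum(
        1 for p in picture_points
        if any(abs(p - a) <= tolerance for a in audio_points)
    )
    return Fraction(matched, len(picture_points))


def target_switching_type(picture_points: List[float],
                          audio_points_by_type: Dict[str, List[float]],
                          tolerance: float = 0.0) -> str:
    """Return the switching type with the highest coincidence degree (first one on ties)."""
    return max(
        audio_points_by_type,
        key=lambda t: coincidence_degree(picture_points, audio_points_by_type[t], tolerance),
    )


picture = [1, 3, 8]
audio = {"beat": [1, 3, 7], "melody": [1, 5, 9], "lyrics": [2, 4, 9]}
for name, points in audio.items():
    print(name, coincidence_degree(picture, points))  # beat 2/3, melody 1/3, lyrics 0
print(target_switching_type(picture, audio))  # beat
```

Using `Fraction` keeps the degrees exact, so the printed values match the 2/3, 1/3 and 0 given in the example.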
  • As another example of determining the target switching type:
  • The electronic device extracts five picture switching points from the video data of the video to be processed.
  • It extracts five beat switching points, six melody switching points and seven lyrics switching points from the audio data of the video to be processed.
  • In this case, the electronic device determines the beat switching point as the target switching type.
  • When determining the target switching type, the electronic device first determines the coincidence degree between the picture switching points and the audio switching points of each switching type. It then determines the switching type with the highest coincidence degree as the target switching type. This ensures that target audio switching points corresponding to the picture switching points can be obtained from that type, so that the picture switching points can be corrected and determined more accurately.
  • Acquiring the target audio switching point from the audio switching points belonging to the target switching type includes S301 or S302.
  • S301: The electronic device determines, among the audio switching points belonging to the target switching type, an audio switching point that coincides with a picture switching point as the target audio switching point corresponding to that picture switching point.
  • When acquiring the target audio switching points, the electronic device can check in turn whether each picture switching point coincides with an audio switching point belonging to the target switching type. If it does, the electronic device determines the coinciding audio switching point as the target audio switching point corresponding to that picture switching point.
  • For example, the first picture switching point extracted by the electronic device is the 1st second of playback of the video to be processed, and the first beat switching point is also the 1st second of playback. In this case, the electronic device determines the first beat switching point as the target audio switching point corresponding to the first picture switching point.
  • S302: The electronic device determines, among the audio switching points belonging to the target switching type, an audio switching point that satisfies a second preset condition as the target audio switching point corresponding to the picture switching point.
  • When acquiring the target audio switching points, the electronic device can check in turn whether each picture switching point coincides with an audio switching point belonging to the target switching type. If it does not, the electronic device determines the audio switching point that satisfies the second preset condition as the target audio switching point corresponding to that picture switching point.
  • The second preset condition is having the shortest time difference from the picture switching point.
  • Determining the audio switching point with the shortest time difference from the picture switching point as its target audio switching point corrects the picture switching point and improves the accuracy with which it is determined.
  • For example, the first picture switching point extracted by the electronic device is the 1st second of playback of the video to be processed, the first beat switching point is the 2nd second of playback, and the second beat switching point is the 3rd second of playback.
  • Because the first beat switching point is closer in time to the first picture switching point, the electronic device determines the first beat switching point as the target audio switching point corresponding to the first picture switching point.
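S301 and S302 together amount to a nearest-point lookup. A minimal sketch, assuming switching points are plain numbers in seconds and using the hypothetical name `target_audio_points`:

```python
from typing import List


def target_audio_points(picture_points: List[float],
                        audio_points: List[float]) -> List[float]:
    """Map each picture switching point to a target audio switching point.

    S301: use the audio switching point that coincides with the picture switching point.
    S302: otherwise use the audio switching point with the shortest time difference.
    """
    if not audio_points:
        raise ValueError("no audio switching points of the target switching type")
    targets = []
    for p in picture_points:
        if p in audio_points:
            targets.append(p)  # S301: exact coincidence
        else:
            # S302: nearest in time; on equal distance the earlier-listed point wins.
            targets.append(min(audio_points, key=lambda a: abs(a - p)))
    return targets


print(target_audio_points([1], [2, 3]))           # [2]
print(target_audio_points([1, 3, 8], [1, 3, 7]))  # [1, 3, 7]
```

The first call reproduces the example above (picture switch at 1 s, beats at 2 s and 3 s). This sketch does not enforce the one-to-one correspondence mentioned earlier; two picture switching points could map to the same audio switching point, and the text does not say how such a conflict is resolved.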
  • The method for generating an audio switching template provided in the embodiments of the present disclosure further includes S401 to S402.
  • S401: The electronic device acquires an original video.
  • The original video is the video from which the video segments are taken.
  • S402: The electronic device divides the original video into at least one video segment, and determines one of the at least one video segment as the video to be processed.
  • After acquiring the original video, the electronic device can divide it into at least one video segment and determine one of them as the video to be processed. The electronic device can execute S101 to S105 for each video segment of the original video in turn, which ensures that each video segment has a corresponding audio switching template.
  • For example, the playback time of the original video is 30 seconds: the switching type of the 0-10 second video clip is the beat switching point, the switching type of the 10-20 second video clip is the lyrics switching point, and the switching type of the 20-30 second video clip is the melody switching point.
  • The electronic device generates a different audio switching template for each video clip, which enriches the user experience.
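The 30-second example can be sketched as follows. The disclosure does not state how the original video is divided, so fixed-length 10-second segments are an assumption here, as are the sample switching points and the names `segment_types`, `in_window` and `degree`. Each segment is scored independently with the coincidence degree and gets its own target switching type.

```python
from typing import Dict, List, Tuple


def in_window(points: List[float], start: float, end: float) -> List[float]:
    """Switching points that fall inside the half-open segment [start, end)."""
    return [p for p in points if start <= p < end]


def degree(picture: List[float], audio: List[float]) -> float:
    """Proportion of picture switching points that coincide with an audio switching point."""
    if not picture:
        return 0.0
    return sum(1 for p in picture if p in audio) / len(picture)


def segment_types(duration: float, segment_length: float, picture: List[float],
                  audio_by_type: Dict[str, List[float]]) -> List[Tuple[float, float, str]]:
    """Determine a target switching type for each fixed-length segment (assumed split)."""
    result = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        pic = in_window(picture, start, end)
        best = max(
            audio_by_type,
            key=lambda t: degree(pic, in_window(audio_by_type[t], start, end)),
        )
        result.append((start, end, best))
        start = end
    return result


# Illustrative data chosen so that each segment favours a different switching type.
picture = [2, 5, 12, 16, 23, 27]
audio = {"beat": [2, 5, 14, 25], "lyrics": [3, 12, 16, 21], "melody": [1, 11, 23, 27]}
for segment in segment_types(30.0, 10.0, picture, audio):
    print(segment)
```

With this data the three segments come out as beat, lyrics and melody, mirroring the example; S104 and S105 would then be run per segment to produce one template each.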
  • In summary, the electronic device can acquire the original video, divide it into at least one video segment, and determine one of the at least one video segment as the video to be processed.
  • By executing S101 to S105 for each video clip of the original video, the electronic device ensures that each video clip has a corresponding audio switching template. For complex videos, it can also generate audio switching templates corresponding to clips of different types, which broadens the application scenarios of audio switching templates and further enriches the user experience.
  • the terminal/server in the embodiments of the present disclosure may include one or more hardware structures and/or software modules for implementing the foregoing corresponding audio switching template generation method, and these executing hardware structures and/or software modules may constitute an electronic device.
  • Those skilled in the art should easily realize that, in combination with the algorithm steps of the examples described in the embodiments disclosed herein, the present disclosure can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software drives hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementation should not be considered beyond the scope of the present disclosure.
  • Embodiments of the present disclosure also provide correspondingly an audio switching template generation apparatus, which can be applied to electronic equipment.
  • Fig. 5 shows a schematic structural diagram of an apparatus for generating an audio switching template provided by an embodiment of the present disclosure.
  • the device for generating an audio switching template includes: an acquiring unit 501 , a processing unit 502 and a generating unit 503 .
  • the acquiring unit 501 is configured to extract a picture switching point from video data of the video to be processed.
  • For example, referring to FIG. 1, the acquiring unit 501 is configured to execute S101.
  • the acquiring unit 501 is further configured to extract audio switching points of at least one switching type from the audio data of the video to be processed; the switching type of the audio switching point includes at least one of a beat switching point, a melody switching point, or a lyrics switching point.
  • For example, referring to FIG. 1, the acquiring unit 501 is configured to execute S102.
  • the processing unit 502 is configured to determine a target switching type according to the picture switching point and the audio switching points of the at least one switching type. For example, referring to FIG. 1, the processing unit 502 is configured to execute S103.
  • the acquiring unit 501 is further configured to acquire a target audio switching point from the audio switching points belonging to the target switching type, where the target audio switching point corresponds to the picture switching point.
  • the acquiring unit 501 is configured to execute S104.
  • the generating unit 503 is configured to generate an audio switching template corresponding to the audio data according to the target audio switching point; the audio switching template is used to generate a video corresponding to the audio switching template.
  • the generating unit 503 is configured to execute S105.
  • the processing unit 502 is configured to:
  • the coincidence degree between the picture switching points and the audio switching points of each switching type is determined separately; the coincidence degree indicates the proportion of picture switching points that match an audio switching point among all picture switching points.
  • For example, referring to FIG. 2, the processing unit 502 is configured to execute S201.
  • the switching type whose coincidence degree satisfies the first preset condition is determined as the target switching type.
  • For example, referring to FIG. 2, the processing unit 502 is configured to execute S202.
  • the first preset condition is the highest coincidence degree.
  • the acquiring unit 501 is specifically configured to:
  • the audio switching point that coincides with the picture switching point is determined as the target audio switching point corresponding to the picture switching point.
  • the acquiring unit 501 is configured to execute S301.
  • the acquiring unit 501 is specifically configured to:
  • An audio switching point that satisfies the second preset condition among the audio switching points belonging to the target switching type is determined as a target audio switching point corresponding to the screen switching point.
  • the acquiring unit 501 is configured to execute S302.
  • the second preset condition is the audio switching point with the shortest time difference from the screen switching point.
  • the acquiring unit 501 is also configured to acquire the original video.
  • the acquiring unit 501 is configured to execute S401.
  • the processing unit 502 is further configured to divide the original video into at least one video segment, and determine one of the at least one video segment as the video to be processed. For example, referring to FIG. 4 , the processing unit 502 is configured to execute S402.
  • the embodiments of the present disclosure may divide the electronic device into functional modules according to the above method example.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules.
  • the division of modules in the embodiment of the present disclosure is schematic, and is only a logical function division, and there may be another division manner in actual implementation.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • an electronic device includes: a processor; a memory for storing program code executable by the processor; wherein the processor is configured to execute the program code to implement the following steps:
  • the switching type of the audio switching point includes: at least one of a beat switching point, a melody switching point or a lyrics switching point;
  • an audio switching template corresponding to the audio data is generated; the audio switching template is used to generate a video corresponding to the audio switching template.
  • the processor is configured to execute the program code to implement the following steps:
  • the coincidence degree is used to indicate the number of picture switching points matching the audio switching point, and the proportion of the picture switching point;
  • the switching type whose coincidence degree satisfies the first preset condition is determined as the target switching type.
  • the first preset condition is the highest coincidence degree.
  • the processor is configured to execute the program code to implement the following steps:
  • the audio switching point that coincides with the picture switching point is determined as the target audio switching point corresponding to the picture switching point.
  • the processor is configured to execute the program code to implement the following steps:
  • An audio switching point that satisfies the second preset condition among the audio switching points belonging to the target switching type is determined as a target audio switching point corresponding to the screen switching point.
  • the second preset condition is the audio switching point with the shortest time difference from the picture switching point.
  • the processor is configured to execute the program code to implement the following steps:
  • the original video is divided into at least one video segment, and one video segment in the at least one video segment is determined as the video to be processed.
  • an embodiment of the present disclosure further provides a terminal.
  • the terminal may be a user terminal such as a mobile phone or a computer.
  • Fig. 6 shows a schematic structural diagram of a terminal provided by an embodiment of the present disclosure.
  • the terminal may be an audio switching template generation device and may include at least one processor 61 , a communication bus 62 , a memory 63 and at least one communication interface 64 .
  • The processor 61 may be a central processing unit (CPU), a micro-processing unit, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs in the disclosed scheme.
  • As an example, with reference to FIG. 5, the functions implemented by the acquiring unit 501, the processing unit 502 and the generating unit 503 in the electronic device are the same as those implemented by the processor 61 in FIG. 6.
  • Communication bus 62 may include a path for communicating information between the components described above.
  • The communication interface 64 uses any transceiver-like device to communicate with other devices or communication networks, such as a server, Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • The memory 63 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the memory may exist independently and be connected to the processing unit through a bus. Memory can also be integrated with the processing unit.
  • the memory 63 is used to store the application program code for executing the solution of the present disclosure, and the execution is controlled by the processor 61 .
  • the processor 61 is used to execute the application program code stored in the memory 63, so as to realize the functions in the method of the present disclosure.
  • the processor 61 may include one or more CPUs, for example, CPU0 and CPU1 in FIG. 6 .
  • the terminal may include multiple processors, for example, processor 61 and processor 65 in FIG. 6 .
  • Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor.
  • a processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (eg, computer program instructions).
  • the terminal may further include an input device 66 and an output device 67 .
  • Input device 66 communicates with output device 67 and can accept user input in a variety of ways.
  • the input device 66 may be a mouse, a keyboard, a touch screen device, or a sensory device, among others.
  • Output device 67 is in communication with processor 61 and can display information in a variety of ways.
  • For example, the output device 67 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, and the like.
  • Those skilled in the art will understand that the structure shown in FIG. 6 does not constitute a limitation on the terminal, which may include more or fewer components than shown in the figure, combine certain components, or adopt a different arrangement of components.
  • FIG. 7 shows a schematic structural diagram of a server provided by an embodiment of the present disclosure.
  • the server may be an audio switching template generation device.
  • the server may have relatively large differences due to different configurations or performances, and may include one or more processors 71 and one or more memory 72 . Wherein, at least one instruction is stored in the memory 72, and at least one instruction is loaded and executed by the processor 71 to implement the audio switching template generation method provided by the above method embodiments.
  • the server may also have components such as wired or wireless network interfaces, keyboards, and input and output interfaces for input and output, and the server may also include other components for implementing device functions, which will not be described in detail here.
  • The present disclosure also provides a computer-readable storage medium storing instructions; when the instructions in the computer-readable storage medium are executed by a processor of a computer device, the computer device is enabled to execute the audio switching template generation method provided by the above embodiments.
  • the computer-readable storage medium may be the memory 63 including instructions, and the above-mentioned instructions can be executed by the processor 61 of the terminal to complete the above-mentioned method.
  • the computer-readable storage medium may be the memory 72 including instructions, and the above-mentioned instructions can be executed by the processor 71 of the server to complete the above-mentioned method.
  • the computer-readable storage medium may be a non-transitory computer-readable storage medium, for example, the non-transitory computer-readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device, etc.
  • The present disclosure also provides a computer program product including computer instructions; when the computer instructions are run on an electronic device, the electronic device is caused to execute the audio switching template generation method shown in any one of FIGS. 1-4.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

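A method and device for generating an audio switching template, relating to the field of computer technology. The method comprises: extracting picture switching points from video data of a video to be processed (S101); extracting audio switching points of at least one switching type from audio data of the video to be processed (S102); determining a target switching type according to the picture switching points and the audio switching points of the at least one switching type (S103); acquiring target audio switching points from the audio switching points belonging to the target switching type (S104); and generating, according to the target audio switching points, an audio switching template corresponding to the audio data (S105).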

Description

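Audio switching template generation method and device
The present disclosure is based on, and claims priority to, Chinese patent application No. 202110764340.1, filed on July 6, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an audio switching template generation method and device.
Background
With the rapid development of the mobile Internet, beat-synced videos (卡点视频) have become increasingly popular. A beat-synced video is a video whose picture switches at the time points at which the rhythm of the music switches.
Summary
The present disclosure provides an audio switching template generation method and device. The technical solutions of the embodiments of the present disclosure are as follows:
According to one aspect of the embodiments of the present disclosure, an audio switching template generation method is provided, which can be applied to an electronic device. The method includes:
extracting a picture switching point from video data of a video to be processed;
extracting audio switching points of at least one switching type from audio data of the video to be processed, where the switching type of the audio switching point includes at least one of a beat switching point, a melody switching point, or a lyrics switching point;
determining a target switching type according to the picture switching point and the audio switching points of the at least one switching type;
acquiring a target audio switching point from the audio switching points belonging to the target switching type, the target audio switching point corresponding to the picture switching point;
generating, according to the target audio switching point, an audio switching template corresponding to the audio data, where the audio switching template is used to generate a video corresponding to the audio switching template.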
根据本公开实施例的另一方面,提供一种音频切换模板生成装置,可以应用于电子设备。该装置可以包括:获取单元、处理单元和生成单元;
所述获取单元,用于从待处理视频的视频数据中提取画面切换点;
所述获取单元,还用于从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
所述处理单元,用于根据所述画面切换点和所述至少一个切换类型的音频切换点,确定 目标切换类型;
所述获取单元,还用于从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
所述生成单元,用于根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
根据本公开实施例的另一方面,提供一种电子设备,可以包括:处理器和用于存储处理器可执行指令的存储器;其中,处理器被配置为执行所述指令,以实现如下步骤:
从待处理视频的视频数据中提取画面切换点;
从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型;
从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
根据本公开实施例的另一方面,提供一种计算机可读存储介质,计算机可读存储介质上存储有指令,所述计算机可读存储介质中的指令由电子设备的处理器执行,使得所述电子设备能够执行如下步骤:
从待处理视频的视频数据中提取画面切换点;
从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型;
从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
根据本公开实施例的另一方面,提供一种计算机程序产品,该计算机程序产品包括计算 机指令,计算机指令在电子设备上运行,使得电子设备执行如下步骤:
从待处理视频的视频数据中提取画面切换点;
从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型;
从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
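In the embodiments of the present disclosure, after picture switching points and audio switching points of at least one switching type are extracted from the video to be processed, a target switching type can be determined according to the picture switching points and the audio switching points of the at least one switching type. Target audio switching points corresponding to the picture switching points are determined from the audio switching points belonging to the target switching type, and an audio switching template is generated from the determined target audio switching points. This scheme generates the audio switching template without manual involvement, which improves generation efficiency. Moreover, where a picture switching point corresponds to a target audio switching point, the picture switching point corresponds accurately to an audio switching point in the video to be processed, which improves the accuracy of the picture switching points and, in turn, of the target audio switching points in the audio switching template.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an audio switching template generation method provided by an embodiment of the present disclosure.
FIG. 2 is a schematic flowchart of another audio switching template generation method provided by an embodiment of the present disclosure.
FIG. 3 is a schematic flowchart of another audio switching template generation method provided by an embodiment of the present disclosure.
FIG. 4 is a schematic flowchart of another audio switching template generation method provided by an embodiment of the present disclosure.
FIG. 5 is a schematic structural diagram of an audio switching template generation apparatus provided by an embodiment of the present disclosure.
FIG. 6 is a schematic structural diagram of a terminal provided by an embodiment of the present disclosure.
FIG. 7 is a schematic structural diagram of a server provided by an embodiment of the present disclosure.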
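Detailed Description
The data involved in the present disclosure may be data authorized by the user or fully authorized by all parties.
In the prior art, beat-synced videos are mainly produced by manually creating an audio switching template in video editing software and then generating the beat-synced video from the template and the received pictures or video clips. Manually creating an audio switching template is, however, very inefficient. In addition, because the positions of the audio switching points are determined entirely by hand, their accuracy is poor.
An embodiment of the present disclosure provides an audio switching template generation method. After picture switching points and audio switching points of at least one switching type are extracted from a video to be processed, a target switching type can be determined according to the picture switching points and the audio switching points of the at least one switching type. Target audio switching points corresponding to the picture switching points are determined from the audio switching points belonging to the target switching type, and an audio switching template is generated from the determined target audio switching points. This scheme generates the audio switching template without manual involvement, which improves generation efficiency. Moreover, where a picture switching point corresponds to a target audio switching point, the picture switching point corresponds accurately to an audio switching point in the video to be processed, which improves the accuracy of the picture switching points and, in turn, of the target audio switching points in the audio switching template.
The audio switching template generation method provided by the embodiments of the present disclosure is described below by way of example:
The audio switching template generation method provided by the present disclosure is applied to an electronic device.
In some embodiments, the electronic device is a server, a terminal, or another electronic device used for generating audio switching templates, which is not limited in the present disclosure.
The server is a single server or a server cluster composed of multiple servers. In some embodiments, the server cluster is a distributed cluster. The present disclosure does not limit the specific implementation of the server either.
The terminal is a device on which a content community application is installed and used, such as a mobile phone, tablet computer, desktop computer, laptop computer, handheld computer, notebook computer, ultra-mobile personal computer (UMPC), netbook, cellular phone, personal digital assistant (PDA), or augmented reality (AR) or virtual reality (VR) device. The present disclosure places no special limitation on the specific form of the electronic device. The terminal can interact with the user through one or more means such as a keyboard, touchpad, touch screen, remote control, voice interaction, or handwriting device.
The audio switching template generation method provided by the embodiments of the present disclosure is described in detail below with reference to the accompanying drawings.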
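As shown in FIG. 1, where the audio switching template generation method is applied to an electronic device, the method may include S101 to S105.
S101. The electronic device extracts picture switching points from video data of a video to be processed.
After acquiring the video to be processed, the electronic device may extract picture switching points from its video data. The video to be processed is a beat-synced video, that is, a video whose picture switches at the time points at which the rhythm of the music switches. A picture switching point is a moment at which the picture switches in the video to be processed.
In some embodiments, when extracting picture switching points from the video data of the video to be processed, the electronic device can use a picture switching detection technique; alternatively, the electronic device can divide the video to be processed into multiple video frames and take the moments at which the video frames are divided as the picture switching points; alternatively, the electronic device can extract picture switching points in other ways, which is not limited in the present disclosure.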
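The disclosure does not name a specific picture switching detection technique. As a minimal sketch only, one common approach is to flag a switch wherever two consecutive frames differ sharply. The function name, the flat grayscale frame representation, and the `threshold` value below are illustrative assumptions, not part of the disclosure.

```python
from typing import List, Sequence


def detect_picture_switch_points(
    frames: Sequence[Sequence[float]],
    fps: float,
    threshold: float = 30.0,  # assumed cutoff on mean absolute pixel difference
) -> List[float]:
    """Return timestamps (seconds) at which consecutive frames differ sharply.

    Each frame is assumed to be a flat sequence of grayscale intensities;
    this is a toy stand-in for a real shot-boundary detector.
    """
    switch_points = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        # Mean absolute difference between corresponding pixels.
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            switch_points.append(i / fps)
    return switch_points


# Three "shots" of two frames each, sampled at 2 frames per second.
frames = [[10] * 4] * 2 + [[200] * 4] * 2 + [[50] * 4] * 2
print(detect_picture_switch_points(frames, fps=2.0))
```

A production system would more likely decode frames with a video library and compare histograms or learned features; the thresholded difference is used here only because it is easy to follow.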
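In some embodiments, when generating an audio switching template, the electronic device can select, from a large number of beat-synced videos, videos with high popularity as videos to be processed. The popularity of a video may be characterized by at least one of its play count, like count, or forward count.
In some embodiments, the large number of beat-synced videos is obtained by the electronic device searching with keywords using a search capability; or by the electronic device classifying multiple videos with a video classification algorithm, determining the category that contains the most beat-synced videos, and acquiring the videos belonging to that category; or by the electronic device identifying creators who publish beat-synced videos and acquiring the beat-synced videos they publish, which is not limited in the present disclosure.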
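S102. The electronic device extracts audio switching points of at least one switching type from audio data of the video to be processed.
At least one audio switching point can be extracted for each switching type. The switching types of audio switching points include at least one of: beat switching points, melody (onset) switching points, or lyrics switching points. A beat switching point is a moment at which the beat switches in the video to be processed. A melody switching point is a moment at which the melody switches in the video to be processed. A lyrics switching point is a moment at which the lyrics switch in the video to be processed. In some embodiments, an audio switching point is also called a sync point (卡点) or a node.
The audio data of a beat-synced video generally contains various kinds of nodes, such as beat switching points, melody switching points, or lyrics switching points. After acquiring the video to be processed, the electronic device can extract audio switching points of at least one switching type from its audio data.
It should be noted that if the audio data of the video to be processed includes melody switching points, the audio data does not include vocals. In this case, the electronic device only needs to extract melody switching points and beat switching points, and need not extract lyrics switching points. If the audio data of the video to be processed includes lyrics switching points, the audio data includes vocals. In this case, the electronic device only needs to extract lyrics switching points and beat switching points, and need not extract melody switching points.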
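The rule above, that vocal tracks yield lyrics and beat switching points while non-vocal tracks yield melody and beat switching points, can be sketched as a small selection function. The `has_vocals` flag and the string labels are illustrative assumptions; how vocals are detected is not specified in the disclosure.

```python
from typing import List


def switching_types_to_extract(has_vocals: bool) -> List[str]:
    """Choose which switching types to extract for one audio track.

    Beat switching points are extracted in both cases; lyrics and melody
    switching points are treated as mutually exclusive, as described above.
    """
    if has_vocals:
        return ["beat", "lyrics"]
    return ["beat", "melody"]


print(switching_types_to_extract(has_vocals=True))
print(switching_types_to_extract(has_vocals=False))
```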
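In some embodiments, when extracting beat switching points from the audio data of the video to be processed, the electronic device can process the audio data with a beat tracking algorithm to obtain the beat switching points of the audio data.
In some embodiments, when extracting melody switching points from the audio data of the video to be processed, the electronic device can process the audio data with an onset detection algorithm to obtain the melody switching points of the audio data.
In some embodiments, when extracting lyrics switching points from the audio data of the video to be processed, the electronic device can acquire the lyrics content in the audio data and determine the lyrics switching points of the audio data according to the lyrics content.
It should be noted that the present disclosure does not limit the order of S101 and S102: the electronic device can execute S101 first and then S102, execute S102 first and then S101, or execute S101 and S102 simultaneously.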
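S103. The electronic device determines a target switching type according to the picture switching points and the audio switching points of the at least one switching type.
After extracting the picture switching points from the video data and the audio switching points of at least one switching type from the audio data of the video to be processed, the electronic device can determine the target switching type according to them. The target switching type is the switching type corresponding to the video to be processed.
In some embodiments, the electronic device can determine the target switching type according to the coincidence degree between the picture switching points and the audio switching points of each switching type; alternatively, the electronic device can determine the target switching type according to the picture content at the picture switching points and the lyrics at the lyrics switching points; alternatively, the electronic device can determine the target switching type in other ways, which is not limited in the present disclosure.
S104. The electronic device acquires at least one target audio switching point from the audio switching points belonging to the target switching type, the target audio switching point corresponding to the picture switching point.
In some embodiments, after determining the target switching type according to the picture switching points and the audio switching points of the at least one switching type, the electronic device can acquire, from the audio switching points belonging to the target switching type, the target audio switching points corresponding to the picture switching points.
In some embodiments, after determining, according to the picture switching points and the audio switching points of the at least one switching type, that the target switching type is the beat switching point, the electronic device acquires from the video to be processed the target beat switching points corresponding to the picture switching points.
It should be noted that when there are multiple picture switching points, the number of target audio switching points equals the number of picture switching points. That is, picture switching points and target audio switching points are in one-to-one correspondence, which avoids one target audio switching point corresponding to multiple picture switching points. Here, "multiple" means two or more.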
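S105. The electronic device generates, according to the target audio switching points, an audio switching template corresponding to the audio data.
The audio switching template is used to generate a video corresponding to the audio switching template.
In some embodiments, after acquiring the target audio switching points from the audio switching points belonging to the target switching type, the electronic device can generate the audio switching template corresponding to the audio data according to the target audio switching points.
Continuing the example above, after acquiring the target beat switching points from the video to be processed, the electronic device generates the audio switching template corresponding to the audio data from them. Subsequently, after receiving pictures or video clips uploaded by a user, the electronic device may add them, in a preset order, at the beat switching points in the generated audio switching template, so as to generate a video corresponding to the audio switching template.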
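The disclosure does not define a concrete data format for the audio switching template. The following is a minimal sketch under the assumption that a template is simply the audio identifier, the target switching type, the ordered target switching points, and the audio duration; `AudioSwitchingTemplate`, `build_template`, and `fill_template` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AudioSwitchingTemplate:
    audio_id: str                # hypothetical identifier of the audio track
    switching_type: str          # e.g. "beat", "melody", "lyrics"
    switch_points: List[float]   # target audio switching points, seconds, ascending
    duration: float              # total audio duration in seconds


def build_template(audio_id: str, switching_type: str,
                   target_points: List[float], duration: float) -> AudioSwitchingTemplate:
    """S105 sketch: store de-duplicated, sorted target switching points."""
    return AudioSwitchingTemplate(audio_id, switching_type,
                                  sorted(set(target_points)), duration)


def fill_template(template: AudioSwitchingTemplate,
                  clips: List[str]) -> List[Tuple[float, float, str]]:
    """Place user clips, in their given order, into the slots between switch points."""
    inner = [p for p in template.switch_points if 0.0 < p < template.duration]
    boundaries = [0.0] + inner + [template.duration]
    slots = zip(boundaries, boundaries[1:])
    # zip stops at the shorter input, so extra clips or extra slots are ignored.
    return [(start, end, clip) for (start, end), clip in zip(slots, clips)]


template = build_template("song-001", "beat", [3.0, 1.0, 7.0], duration=10.0)
timeline = fill_template(template, ["a.jpg", "b.jpg", "c.mp4", "d.jpg"])
print(timeline)
```

Here the "preset order" is taken to be the upload order of the clips, which is one possible reading; a real implementation might sort or score clips before placement.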
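As can be seen from S101-S105, after picture switching points and audio switching points of at least one switching type are extracted from the video to be processed, a target switching type can be determined according to them. Target audio switching points corresponding to the picture switching points are determined from the audio switching points belonging to the target switching type, and an audio switching template is generated from the determined target audio switching points. The audio switching template is generated without manual involvement, which improves generation efficiency. Moreover, where a picture switching point corresponds to a target audio switching point, the picture switching point corresponds accurately to an audio switching point in the video to be processed, which improves the accuracy of the picture switching points and, in turn, of the target audio switching points in the audio switching template.
In some embodiments, as shown in FIG. 2, determining the target switching type according to the picture switching points and the audio switching points of the at least one switching type in S103 includes S201 to S202.
S201. The electronic device determines, for each switching type, the coincidence degree between the picture switching points and the audio switching points of that type.
The coincidence degree indicates the proportion of picture switching points that match an audio switching point among all picture switching points.
In some embodiments, when determining the target switching type according to the picture switching points and the audio switching points of the at least one switching type, the electronic device can determine the target switching type from the at least one switching type according to the coincidence degree between the picture switching points and the audio switching points of each switching type.
For example, the total duration of the video to be processed is 10 seconds. The picture switching points extracted from the video data are at seconds 1, 3, and 8. The beat switching points extracted from the audio data are at seconds 1, 3, and 7. The melody switching points extracted from the audio data are at seconds 1, 5, and 9. The lyrics switching points extracted from the audio data are at seconds 2, 4, and 9. Of the three picture switching points, two coincide with beat switching points, one coincides with a melody switching point, and none coincides with a lyrics switching point. In this case, the electronic device can determine that the coincidence degree of the picture switching points is 2/3 with the beat switching points, 1/3 with the melody switching points, and 0 with the lyrics switching points.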
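The worked example above can be reproduced with a short sketch. The function name and the optional `tolerance` parameter (how far apart two timestamps may be and still count as coinciding) are assumptions added for illustration; the example in the text uses exact matches.

```python
from fractions import Fraction
from typing import Dict, Sequence


def coincidence_degree(picture_points: Sequence[float],
                       audio_points: Sequence[float],
                       tolerance: float = 0.0) -> Fraction:
    """Fraction of picture switching points that match some audio switching point."""
    if not picture_points:
        return Fraction(0)
    matched = sum(
        1 for p in picture_points
        if any(abs(p - a) <= tolerance for a in audio_points)
    )
    return Fraction(matched, len(picture_points))


picture = [1, 3, 8]
audio_by_type: Dict[str, Sequence[float]] = {
    "beat": [1, 3, 7],
    "melody": [1, 5, 9],
    "lyrics": [2, 4, 9],
}
degrees = {t: coincidence_degree(picture, pts) for t, pts in audio_by_type.items()}
for switching_type, degree in degrees.items():
    print(switching_type, degree)
```

With real timestamps, which rarely coincide exactly, a small positive `tolerance` (for instance one frame interval) would probably be needed; the disclosure does not state a value.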
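S202. The electronic device determines the switching type whose coincidence degree satisfies a first preset condition as the target switching type.
After determining the coincidence degree between the picture switching points and the audio switching points of each of the at least one switching type, at least one coincidence degree is obtained. From these, the electronic device acquires the switching type corresponding to the coincidence degree that satisfies the first preset condition and determines that switching type as the target switching type.
In some embodiments, the first preset condition is the highest coincidence degree, that is, the coincidence degree between the picture switching points and the audio switching points of a certain switching type is the highest. Determining as the target switching type the switching type whose audio switching points coincide most with the picture switching points ensures that the target audio switching points corresponding to the picture switching points can subsequently be acquired from that type, which improves the accuracy of determining the target audio switching points.
For example, the electronic device determines that the coincidence degree of the picture switching points is 2/3 with the beat switching points, 1/3 with the melody switching points, and 0 with the lyrics switching points. In this case, the electronic device determines the beat switching point as the target switching type.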
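When the first preset condition is "highest coincidence degree", S202 reduces to an argmax over the per-type degrees. A minimal sketch follows, with the function name chosen for illustration; tie-breaking is not addressed in the disclosure, so this version simply keeps the first type encountered.

```python
from typing import Dict


def select_target_type(degrees: Dict[str, float]) -> str:
    """Return the switching type with the highest coincidence degree.

    On ties, max() keeps the first key in the dict's insertion order;
    this tie-break is an assumption, not something the disclosure specifies.
    """
    if not degrees:
        raise ValueError("at least one switching type is required")
    return max(degrees, key=degrees.get)


example_degrees = {"beat": 2 / 3, "melody": 1 / 3, "lyrics": 0.0}
print(select_target_type(example_degrees))
```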
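In some embodiments, when the number of picture switching points equals the number of audio switching points of a certain switching type, that switching type is determined as the target switching type.
For example, the electronic device extracts 5 picture switching points from the video data of the video to be processed, and extracts 5 beat switching points, 6 melody switching points, and 7 lyrics switching points from the audio data. In this case, the electronic device determines the beat switching point as the target switching type.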
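The count-based alternative can be sketched in the same style. The function name is hypothetical, and returning `None` when no type has a matching count is an assumed fallback; the disclosure does not say what happens in that case.

```python
from typing import Dict, Optional


def select_target_type_by_count(picture_count: int,
                                audio_counts: Dict[str, int]) -> Optional[str]:
    """Return the first switching type whose point count equals the picture count."""
    for switching_type, count in audio_counts.items():
        if count == picture_count:
            return switching_type
    return None  # assumed fallback: caller may use the coincidence degree instead


print(select_target_type_by_count(5, {"beat": 5, "melody": 6, "lyrics": 7}))
```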
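As can be seen from S201-S202, when determining the target switching type, the electronic device can first determine the coincidence degree between the picture switching points and the audio switching points of each switching type. The electronic device then determines the switching type with the highest coincidence degree as the target switching type, which ensures that target audio switching points corresponding to the picture switching points can be acquired from that type, allows the picture switching points to be corrected, and improves the accuracy of determining the picture switching points.
In some embodiments, as shown in FIG. 3, acquiring the target audio switching point from the audio switching points belonging to the target switching type in S104 includes S301 or S302.
S301. The electronic device determines, among the audio switching points belonging to the target switching type, the audio switching point that coincides with the picture switching point as the target audio switching point corresponding to that picture switching point.
When acquiring the target audio switching points from the audio switching points belonging to the target switching type, the electronic device can check in turn whether each picture switching point coincides with an audio switching point belonging to the target switching type. If a picture switching point coincides with such an audio switching point, the electronic device determines the coinciding audio switching point as the target audio switching point corresponding to that picture switching point.
For example, the first picture switching point extracted by the electronic device is at second 1 of the video to be processed, and the first beat switching point is also at second 1. After the target switching type of the video to be processed is determined to be the beat switching point, because the first picture switching point and the first beat switching point coincide (both are at second 1 of the video to be processed), the first beat switching point is determined as the target audio switching point corresponding to the first picture switching point.
S302. The electronic device determines, among the audio switching points belonging to the target switching type, the audio switching point that satisfies a second preset condition as the target audio switching point corresponding to the picture switching point.
In some embodiments, when acquiring the target audio switching points from the audio switching points belonging to the target switching type, the electronic device can check in turn whether each picture switching point coincides with an audio switching point belonging to the target switching type. If a picture switching point does not coincide with any such audio switching point, the electronic device determines the audio switching point that satisfies the second preset condition as the target audio switching point corresponding to that picture switching point.
In some embodiments, the second preset condition is being the audio switching point with the shortest time difference from the picture switching point. For a picture switching point that does not coincide with any audio switching point of the target switching type, determining the audio switching point closest in time as its target audio switching point allows the picture switching point to be corrected and improves the accuracy of determining the picture switching points.
For example, the first picture switching point extracted by the electronic device is at second 1 of the video to be processed, the first beat switching point is at second 2, and the second beat switching point is at second 3. After the target switching type is determined to be the beat switching point, the first picture switching point and the first beat switching point do not coincide, which indicates that the picture switching point contains an error. Where the second preset condition is being the audio switching point with the shortest time difference from the picture switching point, the time difference between the first beat switching point and the first picture switching point (1 second) is smaller than that between the second beat switching point and the first picture switching point (2 seconds), so the electronic device determines the first beat switching point as the target audio switching point corresponding to the first picture switching point.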
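S301 and S302 can be folded into one nearest-neighbour lookup, because a coinciding audio switching point has a time difference of zero and is therefore always the closest. The sketch below assumes that reading; the function name is hypothetical, and it does not enforce the one-to-one correspondence mentioned for S104.

```python
from typing import Sequence


def target_audio_point(picture_point: float,
                       audio_points: Sequence[float]) -> float:
    """Return the audio switching point closest in time to the picture switching point.

    If an audio point coincides with the picture point (S301), its distance is 0
    and it is returned; otherwise the nearest point is returned (S302). On equal
    distances, min() keeps the earlier entry in audio_points.
    """
    if not audio_points:
        raise ValueError("the target switching type has no audio switching points")
    return min(audio_points, key=lambda a: abs(a - picture_point))


print(target_audio_point(1, [1, 3]))  # coinciding case from the S301 example
print(target_audio_point(1, [2, 3]))  # nearest case from the S302 example
```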
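As can be seen from S301-S302, after the target switching type is determined, if an audio switching point of the target switching type coincides with a picture switching point, the coinciding audio switching point is directly determined as the target audio switching point corresponding to that picture switching point. If no audio switching point of the target switching type coincides with the picture switching point, the audio switching point that satisfies the second preset condition is determined as the corresponding target audio switching point, which corrects the picture switching point and improves the accuracy of determining the picture switching points.
In some embodiments, the audio switching template generation method provided by the embodiments of the present disclosure further includes S401 to S402.
S401. The electronic device acquires an original video.
When acquiring the video to be processed, if the video to be processed is a video clip, the electronic device can acquire the original video corresponding to that clip.
S402. The electronic device divides the original video into at least one video segment and determines one of the at least one video segment as the video to be processed.
After acquiring the original video, the electronic device can divide it into at least one video segment and determine one of the at least one video segment as the video to be processed. For each video segment in the original video, the electronic device can execute S101-S105 in turn, which ensures that each video segment has a corresponding audio switching template.
For example, the playback duration of the original video is 30 seconds; the switching type of the 0-10 second video clip is the beat switching point, that of the 10-20 second video clip is the lyrics switching point, and that of the 20-30 second video clip is the melody switching point. In this case, the electronic device generates a different audio switching template for each video clip, which enriches the user experience.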
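The disclosure does not state how the original video is divided. As a sketch under the assumption of fixed-length segments, the 30-second example can be split as follows, with switching points re-expressed relative to each segment so that S101-S105 can run per segment. Both function names and the fixed `segment_length` are illustrative.

```python
from typing import List, Sequence, Tuple


def split_into_segments(duration: float,
                        segment_length: float) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive (start, end) segments of fixed length."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        segments.append((start, end))
        start = end
    return segments


def points_in_segment(points: Sequence[float],
                      segment: Tuple[float, float]) -> List[float]:
    """Keep the points inside a segment and shift them to segment-local time."""
    start, end = segment
    return [p - start for p in points if start <= p < end]


segments = split_into_segments(30.0, 10.0)
print(segments)
print(points_in_segment([1.0, 12.0, 15.0, 27.0], segments[1]))
```

A content-aware split (for example at scene or song-section boundaries) would fit the example equally well; fixed-length splitting is used here only for simplicity.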
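As can be seen from S401-S402, when the video to be processed is a video clip, the electronic device can acquire the original video corresponding to that clip. After acquiring the original video, the electronic device can divide it into at least one video segment and determine one of the at least one video segment as the video to be processed. Executing S101-S105 separately for each video segment in the original video ensures that each video segment has a corresponding audio switching template. Audio switching templates corresponding to different types of segments can also be generated well for complex beat-synced videos, which broadens the application scenarios of audio switching templates and further enriches the user experience.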
本公开实施例中的终端/服务器可以包含有用于实现前述对应音频切换模板生成方法的一个或多个硬件结构和/或软件模块,这些执行硬件结构和/或软件模块可以构成一个电子设备。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的算法步骤,本公开能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本公开的范围。
本公开实施例还对应提供一种音频切换模板生成装置,可以应用于电子设备。图5示出了本公开实施例提供的音频切换模板生成装置的结构示意图。如图5所示,该音频切换模板生成装置包括:获取单元501、处理单元502和生成单元503。
获取单元501,用于从待处理视频的视频数据中提取画面切换点。例如,结合图1,获取单元501用于执行S101。
获取单元501,还用于从待处理视频的音频数据中提取至少一个切换类型的音频切换点;该音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项。例如,结合图1,获取单元501用于执行S102。
处理单元502,用于根据该画面切换点和该至少一个切换类型的音频切换点,确定目标切换类型。例如,结合图1,处理单元502用于执行S103。
获取单元501,还用于从属于目标切换类型的音频切换点中,获取目标音频切换点,该目标音频切换点与该画面切换点对应。例如,结合图1,获取单元501用于执行S104。
生成单元503,用于根据目标音频切换点,生成与音频数据对应的音频切换模板;该音频切换模板用于生成与该音频切换模板对应的视频。例如,结合图1,生成单元503用于执行S105。
在一些实施例中,处理单元502,用于:
分别确定画面切换点与每个切换类型的音频切换点之间的重合度,该重合度用于表示与音频切换点匹配的画面切换点的数量,在画面切换点中所占的比例。例如,结合图2,处理单元502用于执行S201。
将重合度满足第一预设条件的切换类型,确定为该目标切换类型。例如,结合图2,处理单元502用于执行S203。
在一些实施例中,第一预设条件为重合度最高。
在一些实施例中,获取单元501,具体用于:
将属于该目标切换类型的音频切换点中与该画面切换点重合的音频切换点,确定为该画面切换点对应的目标音频切换点。例如,结合图3,获取单元501用于执行S301。
在一些实施例中,获取单元501,具体用于:
将属于该目标切换类型的音频切换点中满足第二预设条件的音频切换点,确定为该画面切换点对应的目标音频切换点。例如,结合图3,获取单元501用于执行S302。
在一些实施例中,第二预设条件为与画面切换点的时间差最短的音频切换点。
在一些实施例中,获取单元501,还用于获取原始视频。例如,结合图4,获取单元501用于执行S401。
处理单元502,还用于将原始视频划分为至少一个视频片段,并将至少一个视频片段中的一个视频片段确定为待处理视频。例如,结合图4,处理单元502用于执行S402。
如上所述,本公开实施例可以根据上述方法示例对电子设备进行功能模块的划分。其中,上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。另外,还需要说明的是,本公开实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。
关于上述实施例中的音频切换模板生成装置,其中各个模块执行操作的具体方式、以及具备的有益效果,均已经在前述方法实施例中进行了详细描述,此处不再赘述。
在示例性实施例中,提供了一种电子设备,该电子设备包括:处理器;用于存储该处理器可执行程序代码的存储器;其中,该处理器被配置为执行该指令,以实现如下步骤:
从待处理视频的视频数据中提取画面切换点;
从该待处理视频的音频数据中提取至少一个切换类型的音频切换点;该音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
根据该画面切换点和该至少一个切换类型的音频切换点,确定目标切换类型;
从属于该目标切换类型的音频切换点中,获取目标音频切换点,该目标音频切换点与该画面切换点对应;
根据该目标音频切换点,生成与该音频数据对应的音频切换模板;该音频切换模板用于生成与该音频切换模板对应的视频。
在一些实施例中,该处理器被配置为执行该程序代码,以实现如下步骤:
分别确定该画面切换点与每个切换类型的音频切换点之间的重合度,该重合度用于表示与音频切换点匹配的画面切换点的数量,在该画面切换点中所占的比例;
将重合度满足第一预设条件的切换类型,确定为该目标切换类型。
在一些实施例中,该第一预设条件为重合度最高。
在一些实施例中,该处理器被配置为执行该程序代码,以实现如下步骤:
将属于该目标切换类型的音频切换点中与该画面切换点重合的音频切换点,确定为该画面切换点对应的目标音频切换点。
在一些实施例中,该处理器被配置为执行该程序代码,以实现如下步骤:
将属于该目标切换类型的音频切换点中满足第二预设条件的音频切换点,确定为该画面切换点对应的目标音频切换点。
在一些实施例中,该第二预设条件为与该画面切换点的时间差最短的音频切换点。
在一些实施例中,该处理器被配置为执行该程序代码,以实现如下步骤:
获取原始视频;
将该原始视频划分为至少一个视频片段,并将该至少一个视频片段中的一个视频片段确定为该待处理视频。
在一些实施例中,本公开实施例还提供一种终端,电子设备被提供为终端时,终端可以是手机、电脑等用户终端。图6示出了本公开实施例提供的终端的结构示意图。该终端可以是音频切换模板生成装置可以包括至少一个处理器61,通信总线62,存储器63以及至少一个通信接口64。
处理器61可以是一个处理器(central processing units,CPU),微处理单元,ASIC(Application Specific Integrated Circuit,专用集成电路),或一个或多个用于控制本公开方案程序执行的集成电路。作为一个示例,结合图5,电子设备中的获取单元501、处理单元502和生成单元503实现的功能与图6中的处理器61实现的功能相同。
通信总线62可包括一通路,在上述组件之间传送信息。
通信接口64,使用任何收发器一类的装置,用于与其他设备或通信网络通信,如服务器、以太网,无线接入网(radio access network,RAN),无线局域网(wireless local area  networks,WLAN)等。作为一个示例,
存储器63可以是只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)或者可存储信息和指令的其他类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其他磁存储设备、或者能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。存储器可以是独立存在,通过总线与处理单元相连接。存储器也可以和处理单元集成在一起。
其中,存储器63用于存储执行本公开方案的应用程序代码,并由处理器61来控制执行。处理器61用于执行存储器63中存储的应用程序代码,从而实现本公开方法中的功能。
在具体实现中,作为一种实施例,处理器61可以包括一个或多个CPU,例如图6中的CPU0和CPU1。
在具体实现中,作为一种实施例,终端可以包括多个处理器,例如图6中的处理器61和处理器65。这些处理器中的每一个可以是一个单核(single-CPU)处理器,也可以是一个多核(multi-CPU)处理器。这里的处理器可以指一个或多个设备、电路、和/或用于处理数据(例如计算机程序指令)的处理核。
在具体实现中,作为一种实施例,终端还可以包括输入设备66和输出设备67。输入设备66和输出设备67通信,可以以多种方式接受用户的输入。例如,输入设备66可以是鼠标、键盘、触摸屏设备或传感设备等。输出设备67和处理器61通信,可以以多种方式来显示信息。例如,输出设备61可以是液晶显示器(liquid crystal display,LCD),发光二级管(light emitting diode,LED)显示设备等。
本领域技术人员可以理解,图6中示出的结构并不构成对终端的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
本公开实施例还提供一种服务器。图7示出了本公开实施例提供的服务器的结构示意图。该服务器可以是音频切换模板生成装置。该服务器可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器71和一个或一个以上的存储器72。其中,存储器72中存储有至少一条指令,至少一条指令由处理器71加载并执行以实现上述各个方法实施例提供的音频切换模板生成方法。当然,该服务器还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该服务器还可以包括其他用于实现设备功能的部件,在此 不做赘述。
本公开还提供了一种包括指令的计算机可读存储介质,所述计算机可读存储介质上存储有指令,当所述计算机可读存储介质中的指令由计算机设备的处理器执行时,使得计算机能够执行上述所示实施例提供的音频切换模板生成方法。例如,计算机可读存储介质可以为包括指令的存储器63,上述指令可由终端的处理器61执行以完成上述方法。又例如,计算机可读存储介质可以为包括指令的存储器72,上述指令可由服务器的处理器71执行以完成上述方法。可选地,计算机可读存储介质可以是非临时性计算机可读存储介质,例如,所述非临时性计算机可读存储介质可以是ROM、RAM、CD-ROM、磁带、软盘和光数据存储设备等。
本公开还提供了一种计算机程序产品,该计算机程序产品包括计算机指令,当所述计算机指令在电子设备上运行时,使得所述电子设备执行上述图1-图4任一附图所示的音频切换模板生成方法。
本公开所有实施例均可以单独被执行,也可以与其他实施例相结合被执行,均视为本公开要求的保护范围。

Claims (23)

  1. 一种音频切换模板生成方法,包括:
    从待处理视频的视频数据中提取画面切换点;
    从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
    根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型;
    从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
    根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
  2. 根据权利要求1所述的音频切换模板生成方法,其中,所述根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型,包括:
    分别确定所述画面切换点与每个切换类型的音频切换点之间的重合度,所述重合度用于表示与音频切换点匹配的画面切换点的数量,在所述画面切换点中所占的比例;
    将重合度满足第一预设条件的切换类型,确定为所述目标切换类型。
  3. 根据权利要求2所述的音频切换模板生成方法,其中,所述第一预设条件为重合度最高。
  4. 根据权利要求1所述的音频切换模板生成方法,其中,所述从属于所述目标切换类型的音频切换点中,获取目标音频切换点,包括:
    将属于所述目标切换类型的音频切换点中与所述画面切换点重合的音频切换点,确定为所述画面切换点对应的目标音频切换点。
  5. 根据权利要求1所述的音频切换模板生成方法,其中,所述从属于所述目标切换类型的音频切换点中,获取目标音频切换点,包括:
    将属于所述目标切换类型的音频切换点中满足第二预设条件的音频切换点,确定为所述画面切换点对应的目标音频切换点。
  6. 根据权利要求5所述的音频切换模板生成方法,其中,所述第二预设条件为与所述画面切换点的时间差最短的音频切换点。
  7. 根据权利要求1-6任一项所述的音频切换模板生成方法,其中,还包括:
    获取原始视频;
    将所述原始视频划分为至少一个视频片段,并将所述至少一个视频片段中的一个视频片段确定为所述待处理视频。
  8. 一种音频切换模板生成装置,包括:获取单元、处理单元和生成单元;
    所述获取单元,用于从待处理视频的视频数据中提取画面切换点;
    所述获取单元,还用于从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
    所述处理单元,用于根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型;
    所述获取单元,还用于从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
    所述生成单元,用于根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
  9. 根据权利要求8所述的音频切换模板生成装置,其中,所述处理单元,用于:
    分别确定所述画面切换点与每个切换类型的音频切换点之间的重合度,所述重合度用于表示与音频切换点重合的画面切换点的数量,在所述画面切换点中所占的比例;
    将重合度满足第一预设条件的切换类型,确定为所述目标切换类型。
  10. 根据权利要求9所述的音频切换模板生成装置,其中,所述第一预设条件为重合度最高。
  11. 根据权利要求8所述的音频切换模板生成装置,其中,所述获取单元,用于:
    将属于所述目标切换类型的音频切换点中与所述画面切换点重合的音频切换点,确定为所述画面切换点对应的目标音频切换点。
  12. 根据权利要求8所述的音频切换模板生成装置,其中,所述获取单元,用于:
    将属于所述目标切换类型的音频切换点中满足第二预设条件的音频切换点,确定为所述画面切换点对应的目标音频切换点。
  13. 根据权利要求12所述的音频切换模板生成装置,其中,所述第二预设条件为与所述画面切换点的时间差最短的音频切换点。
  14. 根据权利要求8-13任一项所述的音频切换模板生成装置,其中,
    所述获取单元,还用于获取原始视频;
    所述处理单元,还用于将所述原始视频划分为至少一个视频片段,并将所述至少一个视频片段中的一个视频片段确定为所述待处理视频。
  15. 一种电子设备,所述电子设备包括:
    处理器;
    用于存储所述处理器可执行指令的存储器;
    其中,所述处理器被配置为执行所述指令,以实现如下步骤:
    从待处理视频的视频数据中提取画面切换点;
    从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
    根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型;
    从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
    根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
  16. 根据权利要求15所述的电子设备,其中,所述处理器被配置为执行所述程序代码,以实现如下步骤:
    分别确定所述画面切换点与每个切换类型的音频切换点之间的重合度,所述重合度用于表示与音频切换点匹配的画面切换点的数量,在所述画面切换点中所占的比例;
    将重合度满足第一预设条件的切换类型,确定为所述目标切换类型。
  17. 根据权利要求16所述的电子设备,其中,所述第一预设条件为重合度最高。
  18. 根据权利要求15所述的电子设备,其中,所述处理器被配置为执行所述程序代码,以实现如下步骤:
    将属于所述目标切换类型的音频切换点中与所述画面切换点重合的音频切换点,确定为所述画面切换点对应的目标音频切换点。
  19. 根据权利要求15所述的电子设备,其中,所述处理器被配置为执行所述程序代码,以实现如下步骤:
    将属于所述目标切换类型的音频切换点中满足第二预设条件的音频切换点,确定为所述画面切换点对应的目标音频切换点。
  20. 根据权利要求19所述的电子设备,其中,所述第二预设条件为与所述画面切换点的时间差最短的音频切换点。
  21. 根据权利要求15-20任一项所述的电子设备,其中,所述处理器被配置为执行所述程序代码,以实现如下步骤:
    获取原始视频;
    将所述原始视频划分为至少一个视频片段,并将所述至少一个视频片段中的一个视频片段确定为所述待处理视频。
  22. 一种非易失性计算机可读存储介质,所述计算机可读存储介质上存储有指令,所述计算机可读存储介质中的指令由电子设备的处理器执行,使得所述电子设备能够执行如下步骤:
    从待处理视频的视频数据中提取画面切换点;
    从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
    根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型;
    从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
    根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
  23. 一种计算机程序产品,包括指令,所述指令在电子设备上运行,使得所述电子设备执行如下步骤:
    从待处理视频的视频数据中提取画面切换点;
    从所述待处理视频的音频数据中提取至少一个切换类型的音频切换点;所述音频切换点的切换类型包括:节拍切换点、旋律切换点或者歌词切换点中的至少一项;
    根据所述画面切换点和所述至少一个切换类型的音频切换点,确定目标切换类型;
    从属于所述目标切换类型的音频切换点中,获取目标音频切换点,所述目标音频切换点与所述画面切换点对应;
    根据所述目标音频切换点,生成与所述音频数据对应的音频切换模板;所述音频切换模板用于生成与所述音频切换模板对应的视频。
PCT/CN2022/079533 2021-07-06 2022-03-07 一种音频切换模板生成方法及设备 WO2023279743A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110764340.1A CN113613061B (zh) 2021-07-06 2021-07-06 一种卡点模板生成方法、装置、设备及存储介质
CN202110764340.1 2021-07-06

Publications (1)

Publication Number Publication Date
WO2023279743A1 true WO2023279743A1 (zh) 2023-01-12

Family

ID=78337362

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/079533 WO2023279743A1 (zh) 2021-07-06 2022-03-07 一种音频切换模板生成方法及设备

Country Status (2)

Country Link
CN (1) CN113613061B (zh)
WO (1) WO2023279743A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116132708A (zh) * 2023-01-28 2023-05-16 北京达佳互联信息技术有限公司 卡点信息获取方法、装置、电子设备和存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113613061B (zh) * 2021-07-06 2023-03-21 北京达佳互联信息技术有限公司 一种卡点模板生成方法、装置、设备及存储介质
CN116166151A (zh) * 2023-02-21 2023-05-26 北京字跳网络技术有限公司 一种信息发布方法、识别方法、电子设备及介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
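CN108419035A (zh) * 2018-02-28 2018-08-17 北京小米移动软件有限公司 Method and apparatus for synthesizing a picture video
CN110265057A (zh) * 2019-07-10 2019-09-20 腾讯科技(深圳)有限公司 Method and apparatus for generating multimedia, electronic device, and storage medium
CN110677711A (zh) * 2019-10-17 2020-01-10 北京字节跳动网络技术有限公司 Video soundtrack method and apparatus, electronic device, and computer-readable medium
CN110933487A (zh) * 2019-12-18 2020-03-27 北京百度网讯科技有限公司 Beat-sync video generation method, apparatus, device, and storage medium
CN111064992A (zh) * 2019-12-10 2020-04-24 懂频智能科技(上海)有限公司 Method for automatically switching video content according to music beats
US20210084388A1 (en) * 2019-09-18 2021-03-18 Adam Kunsberg Beat based editing
CN113613061A (zh) * 2021-07-06 2021-11-05 北京达佳互联信息技术有限公司 Beat-sync template generation method, apparatus, device, and storage medium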

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
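CN111198958A (zh) * 2018-11-19 2020-05-26 Tcl集团股份有限公司 Method, apparatus, and terminal for matching background music
CN109670074B (zh) * 2018-12-12 2020-05-15 北京字节跳动网络技术有限公司 Rhythm point identification method and apparatus, electronic device, and storage medium
CN112235631B (zh) * 2019-07-15 2022-05-03 北京字节跳动网络技术有限公司 Video processing method and apparatus, electronic device, and storage medium
CN110336960B (zh) * 2019-07-17 2021-12-10 广州酷狗计算机科技有限公司 Video synthesis method, apparatus, terminal, and storage medium
CN110688496A (zh) * 2019-09-26 2020-01-14 联想(北京)有限公司 Method and apparatus for processing multimedia files
CN110855904B (zh) * 2019-11-26 2021-10-01 Oppo广东移动通信有限公司 Video processing method, electronic apparatus, and storage medium
CN111526427B (zh) * 2020-04-30 2022-05-17 维沃移动通信有限公司 Video generation method and apparatus, and electronic device
CN111741233B (zh) * 2020-07-16 2021-06-15 腾讯科技(深圳)有限公司 Video soundtrack method and apparatus, storage medium, and electronic device
CN112866584B (zh) * 2020-12-31 2023-01-20 北京达佳互联信息技术有限公司 Video synthesis method, apparatus, terminal, and storage medium
CN113050857B (zh) * 2021-03-26 2023-02-24 北京字节跳动网络技术有限公司 Music sharing method and apparatus, electronic device, and storage medium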

Also Published As

Publication number Publication date
CN113613061A (zh) 2021-11-05
CN113613061B (zh) 2023-03-21

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22836503; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN EP: Public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07.05.2024))