WO2021088830A1 - Method, apparatus, electronic device and medium for displaying music points (用于展示音乐点的方法、装置、电子设备和介质) - Google Patents

Method, apparatus, electronic device and medium for displaying music points

Info

Publication number
WO2021088830A1
WO2021088830A1 (application PCT/CN2020/126261)
Authority
WO
WIPO (PCT)
Prior art keywords
music
audio
point
points
video
Prior art date
Application number
PCT/CN2020/126261
Other languages
English (en)
French (fr)
Inventor
王妍
刘舒
Original Assignee
北京字节跳动网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Priority to KR1020227015287A priority Critical patent/KR20220091500A/ko
Priority to EP20884477.9A priority patent/EP4044611A4/en
Priority to JP2022525690A priority patent/JP7508552B2/ja
Publication of WO2021088830A1 publication Critical patent/WO2021088830A1/zh
Priority to US17/735,962 priority patent/US11587593B2/en

Classifications

    • H04N21/8549: Creating video summaries, e.g. movie trailer
    • G06F9/451: Execution arrangements for user interfaces
    • G10H1/0008: Associated control or indicating means (details of electrophonic musical instruments)
    • G11B27/031: Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/10: Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/28: Indexing etc. by using information signals recorded by the same method as the main recording
    • G11B27/34: Indicating arrangements
    • H04N21/41407: Specialised client platforms embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics
    • H04N21/44016: Processing of video elementary streams involving splicing one content stream with another, e.g. for substituting a video clip
    • H04N21/47205: End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H04N21/47217: End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
    • H04N21/8113: Monomedia components involving special audio data comprising music, e.g. song in MP3 format
    • H04N21/8455: Structuring of content involving pointers to the content, e.g. pointers to the I-frames of the video stream
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments
    • G06F3/04842: Selection of displayed objects or displayed text elements
    • G10H2210/051: Musical analysis for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
    • G10H2210/076: Musical analysis for extraction of timing, tempo; Beat detection
    • G10H2220/101: Graphical user interface for electrophonic musical instruments, for graphical creation, edition or control of musical data or parameters

Definitions

  • the present disclosure relates to the field of computer technology, and more specifically, to a method, apparatus, electronic device, and medium for displaying music points.
  • the purpose of the present disclosure is to provide a method, apparatus, electronic device, and medium for displaying music points to solve the technical problems mentioned in the background art section above.
  • the present disclosure discloses a method for displaying music points.
  • the method includes: acquiring an audio material; analyzing initial music points in the audio material, where the initial music points include beat points and/or note onset points in the audio material; and, on the video editing operation interface, displaying identifiers of the target music points on the editing timeline according to the position of the audio material on the editing timeline and the positions of the target music points in the audio material;
  • the above-mentioned target music points are part or all of the above-mentioned initial music points.
  • the above method further includes: acquiring image materials for video editing, where the image materials include at least one of the following: picture materials and video materials; determining aligned music points from the target music points, where the aligned music points are used to divide the audio material into multiple audio segments; in response to detecting a first user operation on the operation interface, using the image materials to generate a video segment for each audio segment in the audio material to obtain multiple video segments, where each video segment has the same duration as its corresponding audio segment; and displaying the video segments on the editing timeline according to the position of the audio material on the editing timeline and the position of the audio segment corresponding to each video segment in the audio material.
  • determining the aligned music points from the target music points includes: in response to the number of image materials being less than the number of the multiple audio segments, determining, according to the number of image materials, the same number of audio segments from the multiple audio segments, and determining the number of aligned music points according to the number of those audio segments.
  • determining the number of aligned music points according to the number of image materials includes: determining the number of audio segments corresponding to the number of image materials; determining a first target number of aligned music points according to the number of audio segments; and selecting the first target number of music points from the target music points as the aligned music points, according to the priority of the music points from high to low and/or the time at which the music points appear in the audio material from earliest to latest, where the priority of a strong-beat point is higher than that of a secondary strong-beat point, and the priority of a secondary strong-beat point is higher than that of an accent point.
  • determining the aligned music points from the target music points includes: in response to the number of image materials being greater than the number of the multiple audio segments, determining a second target number of additional music points together with the target music points as the aligned music points, where the second target number is determined according to the difference between the number of the multiple audio segments and the number of image materials.
  • using the image materials to generate a video segment for each audio segment in the audio material to obtain multiple video segments includes: determining the audio segment corresponding to an image material according to the duration of the image material, where the duration of the image material corresponding to an audio segment is not less than the duration of that audio segment; and using the image material to generate a video segment for the audio segment.
  • using the image materials to generate a video segment for each audio segment in the audio material to obtain multiple video segments further includes: in response to the duration of a video material being less than the duration of the audio segment corresponding to that video material, adjusting the playback speed of the video material to obtain a video segment whose duration equals that of the audio segment.
  • the above method further includes: in response to detecting a second user operation on a first video segment in the operation interface, displaying an adjustment interface for the image material corresponding to the first video segment; in response to detecting a manual interception operation on the image material in the adjustment interface of the image material, determining the interception interval selected in the image material by the manual interception operation; and intercepting material from the image material according to the interception interval as a second video segment.
  • analyzing the initial music points in the audio material includes: in response to detecting a third user operation on a first control on the operation interface of a music display interface, determining the strong-beat points in the audio material as the initial music points, where the music display interface is displayed in response to detecting a selection operation for the audio material on the operation interface; in response to detecting a fourth user operation on a second control on the operation interface of the music display interface, determining the beat points in the audio material as the initial music points; and in response to detecting a fifth user operation on a third control on the operation interface of the music display interface, determining the accent points in the audio material as the initial music points.
  • the above method further includes: in response to detecting a third user operation on the operation interface, determining a target music point of the audio from the initial music point, wherein the third user operation includes at least one of the following Item: music point adding operation, music point deleting operation.
  • displaying the identifier of the target music point on the editing timeline includes: displaying the audio waveform of the audio material on the editing timeline, and displaying the identifier of the target music point at the corresponding position of the audio waveform.
  • some embodiments of the present disclosure provide an apparatus for displaying music points.
  • the apparatus includes: an acquisition unit configured to acquire audio materials; an analysis unit configured to analyze the initial music points in the above audio materials
  • the initial music point includes the beat point and/or the starting point of the note in the audio material
  • the display unit is configured to display, on the video editing operation interface, the identifier of the target music point on the editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material;
  • the above-mentioned target music points are part or all of the above-mentioned initial music points.
  • the above-mentioned apparatus further includes: a first obtaining unit configured to obtain image materials for video editing, where the image materials include at least one of the following: picture materials and video materials; a determining unit configured to determine aligned music points from the target music points, where the aligned music points are used to divide the audio material into multiple audio segments; a generating unit configured to, in response to detecting a first user operation on the operation interface, use the image materials to generate a video segment for each audio segment in the audio material to obtain multiple video segments, where each video segment has the same duration as its corresponding audio segment; and a first display unit configured to display the video segments on the editing timeline according to the position of the audio material on the editing timeline and the position of the audio segment corresponding to each video segment in the audio material.
  • the determining unit in the above-mentioned apparatus further includes: a first determining subunit configured to, in response to the number of image materials being less than the number of the multiple audio segments, determine the same number of audio segments from the multiple audio segments according to the number of image materials; and a second determining unit configured to determine the number of aligned music points according to the number of audio segments.
  • the second determining unit in the determining unit in the above-mentioned apparatus is further configured to: determine the number of audio segments corresponding to the number of image materials; determine a first target number of aligned music points according to the number of audio segments; and select the first target number of music points from the target music points as the aligned music points, according to the priority of the music points from high to low and/or the time at which the music points appear in the audio material from earliest to latest, where the priority of a strong-beat point is higher than that of a secondary strong-beat point, and the priority of a secondary strong-beat point is higher than that of an accent point.
  • the determining unit in the above-mentioned apparatus is further configured to: in response to the number of image materials being greater than the number of the multiple audio segments, determine a first number of music points together with the target music points as the aligned music points, where the first number is determined according to the difference between the number of the multiple audio segments and the number of image materials.
  • the generating unit in the above-mentioned apparatus is further configured to: determine the audio segment corresponding to an image material according to the duration of the image material, where the duration of the image material corresponding to an audio segment is not less than the duration of that audio segment; and use the image material to generate a video segment for the audio segment.
  • the generating unit in the above-mentioned apparatus is further configured to: in response to the duration of a video material being less than the duration of the audio segment corresponding to that video material, adjust the playback speed of the video material to obtain a video segment whose duration equals that of the audio segment.
  • the above-mentioned apparatus is further configured to: in response to detecting a second user operation on a first video segment in the operation interface, display an adjustment interface for the image material corresponding to the first video segment; in response to detecting a manual interception operation on the image material in the adjustment interface of the image material, determine the interception interval selected in the image material by the manual interception operation; and intercept material from the image material according to the interception interval as a second video segment.
  • the analysis unit in the above-mentioned apparatus is further configured to: in response to detecting a third user operation on the first control on the operation interface, determine the strong-beat points in the audio material as the initial music points; in response to detecting a fourth user operation on the second control on the operation interface, determine the beat points in the audio material as the initial music points; and in response to detecting a fifth user operation on the third control on the operation interface, determine the accent points in the audio material as the initial music points.
  • the above-mentioned apparatus is further configured to: in response to detecting a third user operation on the above-mentioned operation interface, determine the target music point of the audio from the above-mentioned initial music point, wherein the above-mentioned third user operation includes the following At least one item: music point adding operation, music point deleting operation.
  • the display unit in the above-mentioned device is further configured to: display the audio waveform of the above-mentioned audio material on the above-mentioned editing time axis, and display the identification of the above-mentioned target music point at a corresponding position of the above-mentioned audio waveform.
  • some embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of the first aspect.
  • some embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, where the program is executed by a processor to implement the method as in any one of the first aspect.
  • some embodiments of the present disclosure provide a computer program, including program code; when the computer program is executed by a computer, the program code performs the method of any one of the first aspect.
  • One of the above-mentioned embodiments of the present disclosure analyzes the beat and melody of the audio material to determine the music points and displays them on the editing timeline, which saves the user from manually marking the music points on the audio material.
  • the user can then perform video editing operations according to the marked music points, for example, selecting the switching points of video segments according to the displayed music points. This makes the user operation more convenient while also ensuring the flexibility of the tool.
  • Fig. 1 is a schematic diagram of an application scenario of a method for displaying music points according to some embodiments of the present disclosure
  • FIG. 2A is a flowchart of some embodiments of a method for displaying music points according to the present disclosure
  • FIG. 2B is a schematic diagram of some application scenarios displayed by a control according to some embodiments of the present disclosure
  • FIG. 2C is a schematic diagram of some application scenarios of music point display according to some embodiments of the present disclosure.
  • FIGS. 3A-3B are schematic diagrams of another application scenario of the method for displaying music points according to some embodiments of the present disclosure.
  • FIG. 4A is a flowchart of other embodiments of the method for displaying music points according to the present disclosure.
  • FIGS. 4B-4C are schematic diagrams of some application scenarios for adjusting video clips according to some embodiments of the present disclosure.
  • Fig. 5 is a schematic structural diagram of some embodiments of an apparatus for displaying music points according to the present disclosure
  • Fig. 6 is a schematic structural diagram of an electronic device suitable for implementing some embodiments of the present disclosure.
  • Fig. 1 is a schematic diagram of an application scenario of a method for displaying music points according to some embodiments of the present disclosure. As shown in the application scenario of Fig. 1, first, a terminal device 101 (shown as a mobile phone in Fig. 1) obtains the audio material 1011.
  • the audio material 1011 may be an application default audio or an audio selected by the user.
  • the terminal device 101 analyzes the audio material 1011 to obtain the initial music points 1012-1014 in the audio material 1011.
  • the terminal device 101 generates a corresponding music point identifier 10121 for the music point 1012.
  • the terminal device 101 generates a corresponding music point identifier 10131 for the music point 1013.
  • the terminal device 101 generates a corresponding music point identifier 10141 for the music point 1014.
  • the terminal device 101 displays the music point identifier 10121, the music point identifier 10131, and the music point identifier 10141 on the editing timeline 102 of the audio material 1011 on the video editing operation interface 103.
  • the method for displaying music points may be executed by the terminal device 101, or may also be executed by other devices, or may also be executed by various software programs.
  • the terminal device 101 may be, for example, various electronic devices with a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and so on.
  • the execution subject may also be embodied as a server, software, and so on.
  • when the execution subject is software, it can be installed in the electronic devices listed above. It can be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. There is no specific limitation here.
  • the number of mobile phones in FIG. 1 is only illustrative. According to implementation needs, there can be any number of mobile phones.
  • FIG. 2A shows a process 200 of some embodiments of a method for displaying music points according to the present disclosure.
  • the method for displaying music points includes the following steps:
  • Step 201 Obtain audio materials.
  • the execution body of the method for displaying music points may obtain audio materials through a wired connection or a wireless connection.
  • the above audio material may be music stored locally by the user, or music on the Internet.
  • Step 202 Analyze the initial music points in the aforementioned audio material.
  • the above-mentioned executive body may determine the initial music point of the audio material.
  • the initial music point includes the beat point and/or the starting point of the note in the audio material.
  • the initial music point is the position in the audio material that meets the set musicality change.
  • the position where the musicality changes may include the position where the tempo changes and the position where the melody changes.
  • the initial music points can be determined in the following way: the execution body analyzes the audio material to determine the beat points and the note onset points, where a beat point is a position where the tempo changes and a note onset point is a position where the melody changes. On the one hand, a deep-learning-based beat analysis algorithm can be used to analyze the audio material to obtain the beat points in the audio material and the timestamps at which they occur; on the other hand, a short-time spectrum analysis can be performed on the audio material to obtain the note onset points in the audio material, for example by means of an onset detector. Then the beat points and note onset points obtained by the two methods are unified, merged, and deduplicated to obtain the initial music points.
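For illustration, a minimal sketch of this two-pronged analysis is given below. It assumes the librosa library for beat tracking and onset detection and an illustrative 0.1-second deduplication tolerance; the patent does not name a concrete library or merge threshold.

```python
# Sketch: detect beat points and note onsets, then merge and deduplicate them
# into a single list of initial music points (timestamps in seconds).
import librosa
import numpy as np

def initial_music_points(audio_path, dedup_tol=0.1):
    y, sr = librosa.load(audio_path)
    # Beat tracking: positions of the estimated beat grid.
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    # Onset detection via short-time spectral analysis (onset strength envelope).
    onset_frames = librosa.onset.onset_detect(y=y, sr=sr)
    onset_times = librosa.frames_to_time(onset_frames, sr=sr)
    # Unify both result sets, then drop points closer than dedup_tol seconds.
    candidates = np.sort(np.concatenate([beat_times, onset_times]))
    merged = []
    for t in candidates:
        if not merged or t - merged[-1] > dedup_tol:
            merged.append(float(t))
    return merged
```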
  • in response to a third user operation, the strong-beat points in the audio material are determined as the initial music points; the first control 212 is usually used to trigger determination of the strong beats in the audio material, where the music display interface is displayed in response to detecting a selection operation for the audio material on the operation interface.
  • the above-mentioned strong beat refers to a beat with greater musical stress; in music, beats are divided into strong beats and weak beats.
  • for example, in 4/4 time a quarter note counts as one beat and each measure has 4 beats, with the beat-strength pattern: the first beat is a strong beat, the second beat is a weak beat, the third beat is a secondary strong beat, and the fourth beat is a weak beat.
  • the aforementioned third user operation may refer to a user's click operation on the aforementioned first control 212.
  • the beat points in the audio material are determined as the initial music points; the second control 213 is usually used to trigger the determination of the beats in the audio material.
  • the aforementioned fourth user operation may refer to a user's click operation on the aforementioned second control 213.
  • the accent point in the audio material is determined as the initial music point.
  • the third control 214 is usually used to trigger the determination of the accent in the audio material.
  • the above-mentioned accent may refer to a sound of greater intensity in the music, where an accent point may be a note onset position at which the melody becomes stronger, for example a beat marked with an accent mark in the musical score. The accent marks include at least one of ">" and "^"; when ">" and "^" appear at the same time, "^" indicates the stronger accent.
  • the above-mentioned fifth user operation may refer to a user's click operation on the above-mentioned third control 214.
  • Step 203 On the operation interface of the video clip, according to the position of the audio material on the editing time axis and the position of the target music point in the audio material, the identification of the target music point is displayed on the editing time axis.
  • on the video editing operation interface, the execution body may display the identifiers of the target music points on the editing timeline according to the position of the audio material on the editing timeline and the positions of the target music points in the audio material.
  • the analyzed initial music points can all be displayed as target music points.
  • the target music point may be the above-mentioned music point a, music point b, and music point c.
  • a target music point may be selected from the above-mentioned initial music point for display.
  • the above-mentioned third user operation includes at least one of the following: a music point adding operation and a music point deleting operation.
  • for example, suppose there are 3 initial music points, namely music point a, music point b, and music point c; if no adding or deleting operation is performed, the target music points can be the aforementioned music point a, music point b, and music point c.
  • when the user adds a music point d, the target music points can be music point a, music point b, music point c, and music point d; for another example, when the user deletes music point b, the target music points can be music point a and music point c.
  • the above-mentioned identification may be a preset icon, for example, a triangle, a circle, a star, and so on.
  • the audio waveform of the audio material may be displayed on the editing timeline 225, and the identifiers 222-224 of the target music points may be displayed at the corresponding positions of the audio waveform.
  • the above-mentioned audio waveform usually refers to an image that displays the audio in the form of a waveform diagram. The identifier of each target music point is displayed on the image according to the position of the corresponding music point on the image.
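As a rough illustration of such a display (the plotting library, marker shape, and function name are assumptions, not part of the disclosure):

```python
# Sketch: draw the audio waveform and mark target music points on it.
# The triangle marker mirrors the "preset icon" example given above.
import numpy as np
import matplotlib.pyplot as plt

def plot_waveform_with_points(samples, sample_rate, music_points_sec):
    t = np.arange(len(samples)) / sample_rate
    plt.figure(figsize=(10, 2))
    plt.plot(t, samples, linewidth=0.5)
    # One marker per target music point, placed at its timestamp.
    plt.plot(music_points_sec, [0] * len(music_points_sec), "^", color="red")
    plt.xlabel("time (s)")
    plt.yticks([])
    plt.show()
```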
  • FIGS. 3A-3B are schematic diagrams of another application scenario of the method for displaying music points according to some embodiments of the present disclosure.
  • the user can select multiple image materials on the upload page 3017 of the terminal device 301, for example the picture 3011, the video 3012, the picture 3013, and the picture 3014 shown on page 3017.
  • for example, the user clicks the positions indicated by reference numeral 3015 and reference numeral 3018, and selects the picture 3011 and the video 3012.
  • after the aligned music point 307 is determined, the aligned music point 307 divides the audio material into music segment A and music segment B. According to the duration of music segment A and the duration of music segment B, the video material 304 and the video material 305 are processed respectively, and the video segments 3041 and 3051 corresponding to music segment A and music segment B can be obtained. Then, according to the positions of music segment A and music segment B in the audio material, the video segments 3041 and 3051 can be displayed on the editing timeline 312 of the video editing operation interface 313.
  • the method for displaying music points may be executed by the terminal device 301, or may also be executed by other devices, or may also be executed by various software programs.
  • the terminal device 301 may be, for example, various electronic devices with a display screen, including but not limited to smart phones, tablet computers, e-book readers, laptop computers, desktop computers, and so on.
  • the execution subject may also be embodied as a server, software, and so on.
  • when the execution subject is software, it can be installed in the electronic devices listed above. It can be implemented, for example, as multiple pieces of software or software modules for providing distributed services, or as a single piece of software or software module. There is no specific limitation here.
  • the number of mobile phones in FIGS. 3A-3B is merely illustrative. According to implementation needs, there can be any number of mobile phones.
  • the method for displaying music points includes the following steps:
  • Step 401 Obtain audio materials.
  • Step 402 Analyze the initial music points in the aforementioned audio material.
  • Step 403 On the operation interface of the video clip, according to the position of the audio material on the editing time axis and the position of the target music point in the audio material, the identification of the target music point is displayed on the editing time axis.
  • for steps 401-403 and the technical effects they bring, reference may be made to steps 201-203 in the embodiments corresponding to FIG. 2A, which will not be repeated here.
  • Step 404 Obtain image materials for video editing.
  • the above-mentioned execution subject may first obtain image materials for video editing, and then obtain audio materials. At the same time, the above-mentioned execution subject may also obtain audio materials first, and then obtain image materials for video editing.
  • the above-mentioned execution subject may obtain image materials for video editing through a wired connection or a wireless connection.
  • the above-mentioned image material includes at least one of the following: picture material and video material.
  • the aforementioned picture material may be a picture stored locally by the user, or a picture downloaded by the user from the Internet.
  • the above-mentioned video materials can be videos uploaded by users, videos stored locally by users, or videos downloaded by users from the Internet.
  • Step 405 Determine the aligned music point from the above-mentioned target music points.
  • the above-mentioned execution subject may first obtain the target music point in the audio material determined in step 203. Then, the above-mentioned execution subject can select a target number of aligned music points from each target music point that has been obtained.
  • the above-mentioned aligned music points may be all target music points or part of the target music points.
  • the target quantity may be determined according to the quantity of the acquired image material, or it may be determined according to the number of strong shots in the audio material, or it may be a quantity set by the user.
  • the execution body divides the audio material based on the determined aligned music points to obtain multiple audio segments. As an example, when 4 aligned music points are determined, the audio material can be divided into 5 audio segments.
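A minimal sketch of this division step, assuming the aligned music points are given as timestamps in seconds (the function and variable names are illustrative):

```python
# Sketch: split an audio material of total_duration seconds into segments at
# the aligned music points. N aligned points produce N + 1 segments.
def split_at_aligned_points(total_duration, aligned_points):
    boundaries = [0.0] + sorted(aligned_points) + [total_duration]
    return [(start, end) for start, end in zip(boundaries, boundaries[1:])]

# Example: 4 aligned points -> 5 audio segments.
print(split_at_aligned_points(10.0, [2.0, 4.0, 6.5, 8.0]))
# [(0.0, 2.0), (2.0, 4.0), (4.0, 6.5), (6.5, 8.0), (8.0, 10.0)]
```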
  • the same number of audio segments as the image materials can be determined from the multiple audio segments according to the number of image materials; then, according to the number of audio segments, the number of aligned music points is determined.
  • the execution subject may determine the number of audio clips corresponding to the number of image materials based on the number of image materials. For example, if the number of image materials is 5, the number of corresponding audio clips is also 5.
  • the first target number of aligned music points is then determined. For example, if the number of audio segments is 5, the first target number of aligned music points should be 4. Finally, according to the priority of the music points from high to low and/or the time at which the music points appear in the audio material from earliest to latest, the first target number of music points are selected from the target music points as the aligned music points.
  • the priority of the aforementioned music points may be preset. For example, the priority of a strong-beat point may be higher than that of a secondary strong-beat point, and the priority of a secondary strong-beat point higher than that of an accent point.
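A sketch of this selection rule under the stated priority order, assuming each target music point carries a timestamp and a type label (the label names and tuple layout are illustrative, not from the patent):

```python
# Sketch: pick the first_target_number aligned music points, preferring higher
# priority types first and, within a type, earlier timestamps.
PRIORITY = {"strong_beat": 0, "secondary_strong_beat": 1, "accent": 2}

def select_aligned_points(target_points, first_target_number):
    # target_points: list of (timestamp_sec, point_type)
    ranked = sorted(target_points, key=lambda p: (PRIORITY[p[1]], p[0]))
    chosen = ranked[:first_target_number]
    # Return them in playback order for use as segment boundaries.
    return sorted(t for t, _ in chosen)

points = [(1.0, "accent"), (2.0, "strong_beat"), (3.5, "secondary_strong_beat"),
          (5.0, "strong_beat"), (6.0, "accent")]
print(select_aligned_points(points, 3))  # [2.0, 3.5, 5.0]
```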
  • a second target number of additional music points together with the target music points may be determined as the aligned music points. The execution body may first calculate the difference between the number of the multiple audio segments and the number of image materials, and then determine the value of the second target number according to the difference. For example, when 5 image materials are obtained but only 3 audio segments exist, the 5 image materials need 4 aligned music points, while only 2 aligned music points can be determined from the 3 audio segments; in this case, the value of the second target number is 2.
  • the 2 additional aligned music points here can be added manually by the user, or the execution body can select beats other than the existing aligned music points in the audio material, for example by inserting intermediate beats between the existing beat points.
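A sketch of how the shortfall might be computed and filled by inserting intermediate beats, which is one of the options mentioned above; the function names and the widest-gap strategy are assumptions:

```python
# Sketch: image materials need (num_materials - 1) aligned points; if the
# existing audio segments only yield (num_segments - 1), the difference is the
# second target number of extra points, inserted midway between existing beats.
def second_target_number(num_materials, num_segments):
    needed = num_materials - 1
    available = num_segments - 1
    return max(needed - available, 0)

def insert_intermediate_beats(aligned_points, extra_count):
    points = sorted(aligned_points)
    # Insert a midpoint into the widest gap until enough points exist.
    for _ in range(extra_count):
        gaps = list(zip(points, points[1:]))
        start, end = max(gaps, key=lambda g: g[1] - g[0])
        points.append((start + end) / 2)
        points.sort()
    return points

print(second_target_number(5, 3))                # 2
print(insert_intermediate_beats([2.0, 6.0], 2))  # [2.0, 3.0, 4.0, 6.0]
```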
  • alternatively, when the number of image materials is greater than the number of the multiple audio segments, the same number of image materials as the multiple audio segments can be selected from the image materials. For example, the selection can be made according to the order in which the image materials were acquired. Then the number of aligned music points is determined according to the number of the multiple audio segments. As an example, when 5 image materials and 4 audio segments are obtained, 4 image materials can be selected, and 3 aligned music points can be determined according to the 4 audio segments.
  • Step 406 In response to detecting a first user operation on the aforementioned operation interface, use the aforementioned image material to generate a video segment for each audio segment in the aforementioned audio material to obtain multiple video segments.
  • when the execution body detects a first user operation on the operation interface, it uses the image materials to generate a video segment for each audio segment in the audio material to obtain multiple video segments.
  • the above-mentioned first user operation is usually used to trigger the above-mentioned multiple audio clips to be aligned with the above-mentioned multiple video clips.
  • the above-mentioned video material may be aligned with the above-mentioned audio clip according to the selection order of the above-mentioned video material. For example, if there are three image materials: image material 1, image material 2, and image material 3. It can be that image material 1 is aligned with the first audio segment that appears in the above audio material. It can also be aligned with the audio clip according to the duration of the video material. For example, the longest video material is aligned with the longest audio clip.
  • the above-mentioned execution subject may generate a video segment with the same duration as the audio segment for the audio segment based on the image material.
  • for example, if the durations of the 3 audio segments are 1 second, 2 seconds, and 3 seconds respectively, the durations of the corresponding video segments can also be 1 second, 2 seconds, and 3 seconds respectively.
  • the execution body may also generate multiple video segments based on one image material. For example, suppose the execution body obtains a 10-second image material and an 8-second audio material; it divides the audio material into 3 audio segments of 2 seconds, 3 seconds, and 3 seconds according to the aligned music points, and can then cut out 3 different video segments of 2 seconds, 3 seconds, and 3 seconds from the image material.
  • the execution subject may also determine the audio segment corresponding to the image material according to the duration of the image material, wherein the length of the image material corresponding to the music segment is not less than the length of the music segment.
  • the playback speed of the original video material can be slowed down to lengthen its duration, and the speed-changed video material can then be used as the video segment, so that the duration of the video segment equals the duration of the audio segment. It is understandable that, for a picture material among the image materials, the picture material can first be turned into a video material of fixed length, such as 3 seconds, and that video material is then used to generate a video segment for the audio segment.
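A sketch of this duration-matching rule; it only computes an edit decision rather than calling a specific media library, and the 3-second default for pictures mirrors the example in the text (the field names are illustrative):

```python
# Sketch: decide how to turn one image material into a video segment whose
# duration equals the target audio segment.
def plan_video_segment(material_duration, segment_duration, is_picture=False,
                       picture_clip_length=3.0):
    if is_picture:
        # A still picture is first expanded into a fixed-length clip.
        material_duration = picture_clip_length
    if material_duration >= segment_duration:
        # Long enough: cut a sub-clip of exactly the segment duration.
        return {"action": "trim", "start": 0.0, "end": segment_duration,
                "speed": 1.0}
    # Too short: slow playback so the stretched clip fills the segment.
    return {"action": "retime", "start": 0.0, "end": material_duration,
            "speed": material_duration / segment_duration}

print(plan_video_segment(10.0, 2.0))        # trim to 2 s
print(plan_video_segment(1.5, 3.0))         # slow down to 0.5x speed
print(plan_video_segment(0.0, 3.0, True))   # picture -> 3 s clip, then trim
```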
  • the adjustment interface 415 of the image material 419 corresponding to the first video segment 413 is displayed.
  • the above-mentioned first video segment 413 is usually a video segment in which a user operation is detected.
  • the above-mentioned second user operation 414 may be a user's click operation on the above-mentioned first video segment 413 to enter the adjustment interface 415 of the image material 419 corresponding to the above-mentioned first video segment 413.
  • the interception interval selected in the image material 419 by the manual interception operation 418 is the interval between reference numeral 416 and reference numeral 417; the length of the interception interval is determined according to the duration of the audio segment corresponding to the first video segment 413.
  • the aforementioned manual interception operation 418 may be that the user slides the image material 419 corresponding to the aforementioned first video segment 413 to make the video segment 420 in the aforementioned interception interval become what the user needs.
  • the above-mentioned execution subject may align the above-mentioned video segment 420 to the position of the above-mentioned first video segment 413.
  • the material is intercepted from the above-mentioned video material as the second video segment 420.
  • the video segment within the aforementioned interception interval is taken as the second video segment 420.
  • Step 407 According to the position of the audio material on the editing time axis and the position of the audio clip corresponding to the video clip in the audio material, the video clip is displayed on the editing time axis.
  • the execution subject may display the video clip on the editing time axis according to the position of the audio material on the editing time axis and the position of the audio clip corresponding to the video clip in the audio material.
  • for example, the audio material can be divided in order into 3 segments according to the music points: segment A from 0 seconds to 2 seconds, segment B from 2 seconds to 5 seconds, and segment C from 5 seconds to 10 seconds.
  • the corresponding video segments are segment a, segment b, and segment c; the video segments a, b, and c are then displayed in order on the video editing operation interface.
  • the video boundary of a video segment is determined to be the position of a music point.
  • the video boundary 4212 of the video segment 4210 is automatically snapped to the position of the music point 4214, and the video segment 4210 yields the corresponding video segment 4217, which in turn has a corresponding video boundary 4216.
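A sketch of such snapping behaviour, assuming a nearest-point rule with an illustrative 0.25-second snap window (the patent specifies neither):

```python
# Sketch: when the user drags a video boundary, snap it to the nearest music
# point if one lies within snap_window seconds of the dragged position.
def snap_boundary(dragged_time, music_points, snap_window=0.25):
    if not music_points:
        return dragged_time
    nearest = min(music_points, key=lambda p: abs(p - dragged_time))
    return nearest if abs(nearest - dragged_time) <= snap_window else dragged_time

print(snap_boundary(4.9, [2.0, 5.0, 7.5]))   # 5.0 (snapped)
print(snap_boundary(6.2, [2.0, 5.0, 7.5]))   # 6.2 (no point nearby)
```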
  • multiple video segments and multiple audio segments are obtained according to the acquired image materials and the determined target music points, and the multiple video segments are displayed on the operation interface. This allows users to intuitively see the playback order and duration of the video segments, thereby increasing the speed at which users can edit videos.
  • the present disclosure provides some embodiments of an apparatus for displaying music points. These apparatus embodiments correspond to the method embodiments shown in FIG. 2A.
  • the device can be specifically applied to various electronic devices.
  • the apparatus 500 for displaying music points in some embodiments includes: an acquisition unit 501, an analysis unit 502, and a display unit 503.
  • the acquiring unit 501 is configured to acquire audio materials
  • the analyzing unit 502 is configured to analyze the initial music points in the audio materials
  • the initial music points include the beat points and/or the starting points of the notes in the audio materials
  • the display unit 503 is configured to display, on the video editing operation interface, the identifier of the target music point on the editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material;
  • the above-mentioned target music points are part or all of the above-mentioned initial music points.
  • the apparatus 500 for displaying music points further includes: a first obtaining unit configured to obtain image materials for video editing, where the image materials include at least one of the following: picture materials and video materials
  • the determining unit is configured to determine aligned music points from the target music points, wherein the aligned music points are used to divide the audio material into a plurality of audio segments;
  • the generating unit is configured to, in response to detecting a first user operation on the operation interface, use the image materials to generate a video segment for each audio segment in the audio material to obtain multiple video segments, where each video segment has the same duration as its corresponding audio segment;
  • a display unit configured to display the video clip on the editing time axis according to the position of the audio material on the editing time axis and the position of the audio clip corresponding to the video clip in the audio material.
  • the determining unit in the apparatus 500 for displaying music points further includes: a first determining sub-unit configured to, in response to the number of image materials being less than the number of the multiple audio segments, determine the same number of audio segments from the multiple audio segments according to the number of image materials; and a second determining unit configured to determine the number of aligned music points according to the number of audio segments.
  • the second determining unit in the determining unit in the apparatus 500 for displaying music points is further configured to: determine the number of audio segments corresponding to the number of image materials; determine a first target number of aligned music points according to the number of audio segments; and select the first target number of music points from the target music points as the aligned music points, according to the priority of the music points from high to low and/or the time at which the music points appear in the audio material from earliest to latest, where the priority of a strong-beat point is higher than that of a secondary strong-beat point, and the priority of a secondary strong-beat point is higher than that of an accent point.
  • the determining unit in the apparatus 500 for displaying music points is further configured to: in response to the number of image materials being greater than the number of the multiple audio segments, determine a first number of music points together with the target music points as the aligned music points, where the first number is determined according to the difference between the number of the multiple audio segments and the number of image materials.
  • the generating unit in the apparatus 500 for displaying music points is further configured to: determine, according to the duration of the image material, the audio segment corresponding to the image material, where the duration of the image material corresponding to a music segment is not less than the duration of that music segment; and use the image material to generate a video clip for the audio segment.
  • the generating unit in the apparatus 500 for displaying music points is further configured to: in response to the duration of the image material being less than the duration of the music segment corresponding to the image material, adjust the playback speed of the image material to obtain a video clip whose duration equals that of the music segment.
  • the apparatus 500 for displaying music points is further configured to: in response to detecting a second user operation on a first video clip in the operation interface, display an adjustment interface of the image material corresponding to the first video clip; in response to detecting a manual clipping operation on the image material on the adjustment interface of the image material, determine the clipping interval selected by the manual clipping operation in the image material; and clip material from the image material according to the clipping interval as a second video clip.
  • the analysis unit 502 in the apparatus 500 for displaying music points is further configured to: in response to detecting a third user operation on a first control on the operation interface, determine the downbeat points in the audio material as the initial music points; in response to detecting a fourth user operation on a second control on the operation interface, determine the beat points in the audio material as the initial music points; and in response to detecting a fifth user operation on a third control on the operation interface, determine the accent points in the audio material as the initial music points.
  • the apparatus 500 for displaying music points is further configured to: in response to detecting a third user operation on the operation interface, determine the target music points of the audio from the initial music points, where the third user operation includes at least one of the following: a music-point adding operation, a music-point deleting operation.
  • the display unit 503 in the apparatus 500 for displaying music points is further configured to: display the audio waveform of the audio material on the editing timeline, and display the identifier of the target music point at the corresponding position of the audio waveform.
  • FIG. 6 shows a schematic structural diagram of an electronic device 600 (for example, the server in FIG. 1) suitable for implementing some embodiments of the present disclosure.
  • the terminal devices in some embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the terminal device shown in FIG. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the electronic device 600 may include a processing device (such as a central processing unit or a graphics processor) 601, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • the RAM 603 also stores various programs and data required for the operation of the electronic device 600.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following devices may be connected to the I/O interface 605: input devices 606 such as a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 such as a liquid crystal display (LCD), speaker, and vibrator; storage devices 608 such as a memory card; and a communication device 609.
  • the communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data.
  • although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or include all of the illustrated devices. More or fewer devices may alternatively be implemented or provided. Each block shown in FIG. 6 may represent one device, or may represent multiple devices as needed.
  • the process described above with reference to the flowchart may be implemented as a computer software program.
  • some embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of some embodiments of the present disclosure are performed.
  • the aforementioned computer-readable medium in some embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • more specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: obtain audio material; analyze initial music points in the audio material, the initial music points including beat points and/or note onset points in the audio material; and, on an operation interface for video editing, display an identifier of a target music point on the editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material, the target music point being some or all of the initial music points.
  • the computer program code used to perform the operations of some embodiments of the present disclosure can be written in one or more programming languages or a combination thereof.
  • the above-mentioned programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowcharts or block diagrams may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical function.
  • the functions marked in the blocks may also occur in an order different from the order marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units described in some embodiments of the present disclosure may be implemented in software or hardware.
  • the described units may also be provided in a processor, which may, for example, be described as: a processor including an acquisition unit, an analysis unit, and a display unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves.
  • the acquiring unit can also be described as a "unit for acquiring audio material".
  • exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGA), Application Specific Integrated Circuits (ASIC), Application Specific Standard Products (ASSP), Systems on Chip (SOC), Complex Programmable Logic Devices (CPLD), and so on.
  • a method for displaying music points is provided, including: obtaining audio material; analyzing initial music points in the audio material, the initial music points including beat points and/or note onset points in the audio material (a minimal beat/onset-analysis sketch is given after this list); and, on an operation interface for video editing, displaying an identifier of a target music point on the editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material, the target music point being some or all of the initial music points.
  • the method further includes: acquiring image material for video editing, where the image material includes at least one of the following: picture material, video material; determining aligned music points from the target music points, where the aligned music points are used to divide the audio material into multiple audio segments; in response to detecting a first user operation on the operation interface, using the image material to generate one video clip for each music segment in the audio material to obtain multiple video clips, where each video clip has the same duration as its corresponding audio segment; and displaying the video clips on the editing timeline according to the position of the audio material on the editing timeline and the positions of the audio segments corresponding to the video clips in the audio material.
  • determining aligned music points from the target music points includes: in response to the number of image materials being less than the number of the multiple audio segments, determining from the multiple audio segments, according to the number of image materials, the same number of audio segments; and determining the number of aligned music points according to the number of audio segments.
  • determining the number of aligned music points according to the number of image materials includes: selecting a target number of music points from the target music points, in descending order of music-point priority, as the aligned music points, where the priority of a downbeat point is higher than that of a secondary downbeat point, and the priority of a secondary downbeat point is higher than that of an accent point (an illustrative selection sketch is given after this list).
  • determining aligned music points from the target music points includes: in response to the number of image materials being greater than the number of the multiple audio segments, determining a first number of music points, in addition to the target music points, as the aligned music points, where the first number is determined according to the difference between the number of the multiple audio segments and the number of image materials.
  • using the image material to generate one video clip for each music segment in the audio material to obtain multiple video clips includes: determining, according to the duration of the image material, the audio segment corresponding to the image material, where the duration of the image material corresponding to a music segment is not less than the duration of that music segment; and using the image material to generate a video clip for the audio segment.
  • using the image material to generate one video clip for each music segment in the audio material to obtain multiple video clips includes: in response to the duration of the image material being less than the duration of the music segment corresponding to the image material, adjusting the playback speed of the image material to obtain a video clip whose duration equals that of the music segment (an illustrative duration-matching sketch is given after this list).
  • the method further includes: in response to detecting a second user operation on a first video clip in the operation interface, displaying an adjustment interface of the image material corresponding to the first video clip; in response to detecting a manual clipping operation on the image material on the adjustment interface of the image material, determining the clipping interval selected by the manual clipping operation in the image material; and clipping material from the image material according to the clipping interval as a second video clip.
  • analyzing the initial music points in the audio material includes: in response to detecting a third user operation on a first control on the operation interface, determining the downbeat points in the audio material as the initial music points; in response to detecting a fourth user operation on a second control on the operation interface, determining the beat points in the audio material as the initial music points; and in response to detecting a fifth user operation on a third control on the operation interface, determining the accent points in the audio material as the initial music points.
  • the method further includes: in response to detecting a third user operation on the operation interface, determining the target music points of the audio from the initial music points, where the third user operation includes at least one of the following: a music-point adding operation, a music-point deleting operation.
  • displaying the identifier of the target music point on the editing timeline includes: displaying the audio waveform of the audio material on the editing timeline, and displaying the identifier of the target music point at the corresponding position of the audio waveform.
  • the apparatus includes: an acquisition unit configured to acquire audio material; an analysis unit configured to analyze initial music points in the audio material, the initial music points including beat points and/or note onset points in the audio material; and a display unit configured to display, on an operation interface for video editing, an identifier of a target music point on the editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material, the target music point being some or all of the initial music points.
  • an electronic device is provided, including: one or more processors; and a storage device on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any of the foregoing embodiments.
  • a computer-readable medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the above embodiments.
  • a computer program is provided, including program code, where the program code, when the computer program is run by a computer, executes the method described in any of the above embodiments.
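
The beat/onset analysis referenced above is described in the disclosure in terms of a deep-learning beat analysis plus short-time spectral onset detection, but no particular library is specified. As an illustration only, a minimal sketch of the same idea using the open-source librosa package is given below; the function name detect_initial_music_points, the use of librosa itself, and the 50 ms de-duplication window are assumptions made for this example, not part of the disclosure.

    # Illustrative sketch: derive "initial music points" (beat points plus note
    # onset points) from an audio file. librosa and the merge tolerance are assumptions.
    import librosa

    def detect_initial_music_points(audio_path, merge_tolerance=0.05):
        # Load the audio material at its native sample rate.
        y, sr = librosa.load(audio_path, sr=None)

        # Beat points: timestamps (in seconds) where beats occur.
        _, beat_times = librosa.beat.beat_track(y=y, sr=sr, units="time")

        # Note onset points: timestamps where notes (melody changes) start.
        onset_times = librosa.onset.onset_detect(y=y, sr=sr, units="time")

        # Merge the two sets and drop near-duplicate points within the tolerance.
        merged = sorted(list(beat_times) + list(onset_times))
        initial_points = []
        for t in merged:
            if not initial_points or t - initial_points[-1] > merge_tolerance:
                initial_points.append(float(t))
        return initial_points

The returned timestamps could then be rendered as markers at the corresponding positions of the audio waveform on the editing timeline.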
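The priority-based selection of aligned music points described above (downbeats preferred over secondary downbeats, secondary downbeats over accents, earlier points over later ones, with the count derived from the number of image materials) reduces to a sort-and-slice. The sketch below is illustrative only; the MusicPoint structure, the numeric priority values, and the "N materials need N - 1 aligned points" rule are assumptions consistent with the examples in the disclosure rather than a definitive implementation.

    # Illustrative sketch of selecting aligned music points by priority and time.
    from collections import namedtuple

    # kind: "downbeat", "secondary_downbeat", or "accent"
    MusicPoint = namedtuple("MusicPoint", ["time", "kind"])

    _PRIORITY = {"downbeat": 0, "secondary_downbeat": 1, "accent": 2}  # lower = higher priority

    def select_aligned_points(target_points, num_image_materials):
        # N image materials correspond to N audio segments, i.e. N - 1 aligned music points.
        first_target_number = max(num_image_materials - 1, 0)

        # Order by priority first (downbeats earliest), then by time of appearance.
        ordered = sorted(target_points, key=lambda p: (_PRIORITY[p.kind], p.time))
        chosen = ordered[:first_target_number]

        # Return the chosen points in timeline order so they can split the audio material.
        return sorted(chosen, key=lambda p: p.time)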
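For the duration-matching behaviour described above, the disclosure states that a material shorter than its music segment is slowed down until its duration equals the segment's, while a longer material may be trimmed or sped up. The arithmetic sketch below is illustrative only; the function name and the choice to trim (rather than speed up) longer materials are assumptions.

    # Illustrative sketch: fit an image/video material to its music segment.
    def fit_material_to_segment(material_duration, segment_duration):
        if material_duration >= segment_duration:
            # Long enough: keep normal speed and trim to the segment length.
            return {"speed": 1.0, "clip_duration": segment_duration, "trim": True}
        # Too short: slow playback so the clip stretches to the segment duration,
        # e.g. a 2 s material over a 3 s segment plays at 2/3 speed.
        speed = material_duration / segment_duration
        return {"speed": speed, "clip_duration": segment_duration, "trim": False}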

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Security & Cryptography (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present disclosure disclose a method, apparatus, electronic device, and medium for displaying music points. A specific implementation of the method includes: obtaining audio material; analyzing initial music points in the audio material, the initial music points including beat points and/or note onset points in the audio material; and, on an operation interface for video editing, displaying an identifier of a target music point on an editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material, the target music point being some or all of the initial music points. This implementation reduces the time users spend annotating music points in audio material, while also preserving the flexibility of the tool.

Description

用于展示音乐点的方法、装置、电子设备和介质
本公开要求于2019年11月04日提交中国专利局、申请号为201911067475.1、申请名称为“用于展示音乐点的方法、装置、电子设备和介质”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及计算机技术领域,更为具体地,涉及一种用于展示音乐点的方法、装置、电子设备和介质。
背景技术
随着多媒体技术的发展,音乐处理应用层出不穷,用户对音乐处理应用的需求也与日俱增。目前,音乐处理应用大多需要用户自己标注音乐中的音乐点,但是大多数用户是无法听出音乐点的,即使用户可以听出音乐点,也需要花费很多时间。
发明内容
本公开的目的在于提供一种用于展示音乐点的方法、装置、电子设备和介质,来解决以上背景技术部分提到的技术问题。
第一方面,本公开公开了一种用于展示音乐点的方法,该方法包括:获取音频素材;分析上述音频素材中的初始音乐点;上述初始音乐点包括上述音频素材中的节拍点和/或音符起始点;在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识;上述目标音乐点为部分或全部的上述初始音乐点。
在一些实施例中,上述方法还包括:获取用于视频剪辑的影像素材,其中,上述影像素材包括以下至少一项:图片素材,视频素材;从上述目标音乐点中确定对齐音乐点,其中,上述对齐音乐点用于将上述音频素材划分成多个音频片段;响应于检测到针对上述操作界面的第一用户操作,利用上述影像素材,为上述音频素材中的每个音乐频段分别生成一个视频片段,得到多个视频片段,其中,相对应的音频片段和视频片段具有相同的时长;按照上述音频素材在剪辑时间轴上的位置以及上述视频片段对应的音频片段在上述音频素材中的位置,在上述剪辑时间轴上展示上述视频片段。
在一些实施例中,从上述目标音乐点中确定对齐音乐点,包括:响应于上述影像素材的影像素材数量少于上述多个音频片段的数量,根据上述影像素材数量,从上述多个音频片段中确定出上述影像素材数量个音频片段;根据上述音频片段的数量,确定上述对齐音乐点的数量。
在一些实施例中,根据上述影像素材数量,确定上述对齐音乐点的数量,包括:根据上述影像素材数量,确定上述影像素材数量对应的音频片段数量;根据上述音频片段数量,确定对齐音乐点的第一目标数量;根据音乐点的优先级从高到低的顺序和/或音乐点在上述 音频素材中出现的时间从先到后的顺序,从上述目标音乐点中选择出上述第一目标数量个音乐点作为对齐音乐点,其中,重拍点的优先级高于次重拍点的优先级,次重拍点的优先级高于重音点的优先级。
在一些实施例中,从上述目标音乐点中确定对齐音乐点,包括:响应于上述影像素材的影像素材数量多于上述多个音频片段的数量,将第二目标数量个音乐点和上述目标音乐点确定为对齐音乐点,其中,上述第二目标数量是根据上述多个音频片段的数量与上述影像素材数量的差值确定的。
在一些实施例中,利用上述影像素材,为上述音频素材中的每个音乐频段分别生成一个视频片段,得到多个视频片段,包括:根据上述影像素材的时长确定上述影像素材对应的音频片段,其中,音乐片段对应的影像素材的长度不小于音乐片段的长度;利用上述影像素材为上述音频片段生成一个视频片段。
在一些实施例中,利用上述影像素材,为上述音频素材中的每个音乐频段分别生成一个视频片段,得到多个视频片段,包括:响应于上述影像素材的时长小于上述影像素材对应的音乐片段的时长,调整上述影像素材的播放速度得到上述音乐片段的时长的视频片段。
在一些实施例中,上述方法还包括:响应于检测到针对上述操作界面中的第一视频片段的第二用户操作,显示上述第一视频片段对应的影像素材的调整界面;响应于检测到在上述影像素材的调整界面上针对上述影像素材的手动截取操作,确定上述手动截取操作在上述影像素材中选中的截取区间;按照上述截取区间,从上述影像素材中截取出素材作为第二视频片段。
在一些实施例中,分析上述音频素材中的初始音乐点,包括:响应于检测到针对音乐展示界面上述操作界面上第一控件的第三用户操作,确定上述音频素材中重拍点作为上述初始音乐点,其中,上述音乐展示界面是响应于检测到针对上述操作界面上的上述音频素材的选择操作而显示的;响应于检测到针对上述音乐展示界面操作界面上第二控件的第四用户操作,确定上述音频素材中节拍点作为上述初始音乐点;响应于检测到针对上述音乐展示界面操作界面上第三控件的第五用户操作,确定上述音频素材中重音点作为上述初始音乐点。
在一些实施例中,上述方法还包括:响应于检测到针对上述操作界面的第三用户操作,从上述初始音乐点中确定上述音频的目标音乐点,其中,上述第三用户操作包括以下至少一项:音乐点增加操作,音乐点删除操作。
在一些实施例中,在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识,包括:在上述剪辑时间轴上展示上述音频素材的音频波形,在上述音频波形的相应位置展示上述目标音乐点的标识。
第二方面,本公开的一些实施例提供了一种用于展示音乐点的装置,装置包括:获取单元,被配置成获取音频素材;分析单元,被配置成分析上述音频素材中的初始音乐点;上述初始音乐点包括上述音频素材中的节拍点和/或音符起始点;展示单元,被配置成在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识;上述目标音乐点为部分或全部的上述初始音乐点。
在一些实施例中,上述装置还包括:第一获取单元,被配置成获取用于视频剪辑的影像素材,其中,上述影像素材包括以下至少一项:图片素材,视频素材;确定单元,被配置成从上述目标音乐点中确定对齐音乐点,其中,上述对齐音乐点用于将上述音频素材划分成多个音频片段;生成单元,被配置成响应于检测到针对上述操作界面的第一用户操作,利用上述影像素材,为上述音频素材中的每个音乐频段分别生成一个视频片段,得到多个视频片段,其中,相对应的音频片段和视频片段具有相同的时长;第一展示单元,被配置成按照上述音频素材在剪辑时间轴上的位置以及上述视频片段对应的音频片段在上述音频素材中的位置,在上述剪辑时间轴上展示上述视频片段。
在一些实施例中,上述装置中的确定单元还包括:第一确定子单元,被配置成响应于上述影像素材的影像素材数量少于上述多个音频片段的数量,根据上述影像素材数量,从上述多个音频片段中确定出上述影像素材数量个音频片段;第二确定单元,被配置成根据上述音频片段的数量,确定上述对齐音乐点的数量。
在一些实施例中,上述装置中的确定单元中的第二确定单元被进一步配置成:根据上述影像素材数量,确定上述影像素材数量对应的音频片段数量;根据上述音频片段数量,确定对齐音乐点的第一目标数量;根据音乐点的优先级从高到低的顺序和/或音乐点在上述音频素材中出现的时间从先到后的顺序,从上述目标音乐点中选择出上述第一目标数量个音乐点作为对齐音乐点,其中,重拍点的优先级高于次重拍点的优先级,次重拍点的优先级高于重音点的优先级。
在一些实施例中,上述装置中的确定单元被进一步配置成:响应于上述影像素材的影像素材数量多于上述多个音频片段的数量,将第一数量个音乐点和上述目标音乐点确定为对齐音乐点,其中,上述第一数量是根据上述多个音频片段的数量与上述影像素材数量的差值确定的。
在一些实施例中,上述装置中的生成单元被进一步配置成:根据上述影像素材的时长确定上述影像素材对应的音频片段,其中,音乐片段对应的影像素材的长度不小于音乐片段的长度;利用上述影像素材为上述音频片段生成一个视频片段。
在一些实施例中,上述装置中的生成单元被进一步配置成:响应于上述影像素材的时长小于上述影像素材对应的音乐片段的时长,调整上述影像素材的播放速度得到上述音乐片段的时长的视频片段。
在一些实施例中,上述装置被进一步配置成:响应于检测到针对上述操作界面中的第一视频片段的第二用户操作,显示上述第一视频片段对应的影像素材的调整界面;响应于检测到在上述影像素材的调整界面上针对上述影像素材的手动截取操作,确定上述手动截取操作在上述影像素材中选中的截取区间;按照上述截取区间,从上述影像素材中截取出素材作为第二视频片段。
在一些实施例中,上述装置中的分析单元被进一步配置成:响应于检测到针对上述操作界面上第一控件的第三用户操作,确定上述音频素材中重拍点作为上述初始音乐点;响应于检测到针对上述操作界面上第二控件的第四用户操作,确定上述音频素材中节拍点作为上述初始音乐点;响应于检测到针对上述操作界面上第三控件的第五用户操作,确定上述音频素材中重音点作为上述初始音乐点。
在一些实施例中,上述装置被进一步配置成:响应于检测到针对上述操作界面的第三 用户操作,从上述初始音乐点中确定上述音频的目标音乐点,其中,上述第三用户操作包括以下至少一项:音乐点增加操作,音乐点删除操作。
在一些实施例中,上述装置中的展示单元被进一步配置成:在上述剪辑时间轴上展示上述音频素材的音频波形,在上述音频波形的相应位置展示上述目标音乐点的标识。
第三方面,本公开的一些实施例提供了一种电子设备,包括:一个或多个处理器;存储装置,其上存储有一个或多个程序,当一个或多个程序被一个或多个处理器执行,使得一个或多个处理器实现如第一方面中任一的方法。
第四方面,本公开的一些实施例提供了一种计算机可读介质,其上存储有计算机程序,其中,程序被处理器执行时实现如第一方面中任一的方法。
第五方面,本公开的一些实施例提供了一种计算机程序,包括程序代码,当计算机运行所述计算机程序时,所述程序代码执行如第一方面中任一的方法。
本公开的上述各个实施例中的一个实施例通过对音频素材进行节拍、旋律的分析,确定出音乐点并在剪辑时间轴上展示出来,这样就避免了用户自己对音频素材标注音乐点。用户就可以根据标注出来的音乐点进行视频剪辑操作,例如,根据展示出来的音乐点选择视频片段的切换点。因此,用户操作更加便捷,同时也保证了工具的灵活性。
附图说明
图1是根据本公开的一些实施例的用于展示音乐点的方法的一个应用场景的示意图;
图2A是根据本公开的用于展示音乐点的方法的一些实施例的流程图;
图2B是根据本公开的一些实施例的控件展示的一些应用场景的示意图;
图2C是根据本公开的一些实施例的音乐点展示的一些应用场景的示意图;
图3A-3B是根据本公开的一些实施例的用于展示音乐点的方法的另一个应用场景的示意图;
图4A是根据本公开的用于展示音乐点的方法的另一些实施例的流程图;
图4B-4C是根据本公开的一些实施例的对视频片段进行调整的一些应用场景的示意图;
图5是根据本公开的用于展示音乐点的装置的一些实施例的结构示意图;
图6是适于用来实现本公开的一些实施例的电子设备的结构示意图。
具体实施方式
下面将参照附图更详细地描述本公开的实施例。虽然附图中显示了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例。相反,提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护范围。
另外还需要说明的是,为了便于描述,附图中仅示出了与有关发明相关的部分。在不冲突的情况下,本公开中的实施例及实施例中的特征可以相互组合。
需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
需要注意,本公开中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域 技术人员应当理解,除非在上下文另有明确指出,否则应该理解为“一个或多个”。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
下面将参考附图并结合实施例来详细说明本公开。
图1是根据本公开的一些实施例的用于展示音乐点的方法的一个应用场景的示意图;在如图1的应用场景中所示,首先,终端设备101(图1中示出为手机)将得到音频素材1011。音频素材1011可以是应用默认音频,也可以是用户选择的音频。终端设备101对音频素材1011进行分析得到音频素材1011中的初始音乐点1012-1014。终端设备101给音乐点1012生成对应的音乐点标识10121。终端设备101给音乐点1013生成对应的音乐点标识10131。终端设备101给音乐点1014生成对应的音乐点标识10141。终端设备101将音乐点标识10121,音乐点标识10131和音乐点标识10141显示到视频剪辑的操作界面103上的音频素材1011的剪辑时间轴102上。
可以理解的是,用于展示音乐点的方法可以是由终端设备101来执行,或者也可以是由其它设备来执行,或者还可以是各种软件程序来执行。其中,终端设备101例如可以是具有显示屏的各种电子设备,包括但不限于智能手机、平板电脑、电子书阅读器、膝上型便携计算机和台式计算机等等。此外,执行主体也可以体现为服务器、软件等。当执行主体为软件时,可以安装在上述所列举的电子设备中。其可以实现成例如用来提供分布式服务的多个软件或软件模块,也可以实现成单个软件或软件模块。在此不做具体限定。
应该理解,图1中的手机的数目仅仅是示意性的。根据实现需要,可以具有任意数目的手机。
在视频剪辑过程中用户需要确定视频片段之间的切换点,为了使音乐能够与视频配合,通常可以使用节拍点、旋律点等音乐点作为视频片段的切换点。这就需要用户自己听音乐来找到音乐点。但是,很多用户自己是难以听出音乐点的,即使能听出音乐点,反复听音乐查找音乐点也是非常费时费力的。为了提高视频剪辑的速度,节省用户的时间,可以参考图2A。图2A示出了根据本公开的用于展示音乐点的方法的一些实施例的流程200。该用于展示音乐点的方法,包括以下步骤:
步骤201,获取音频素材。
在一些实施例中,用于展示音乐点的方法的执行主体(例如,图1所示的终端设备101)可以通过有线连接方式或者无线连接方式,获取音频素材。作为示例,上述音频素材可以是用户存储在本地的音乐,也可以是网络上的音乐。
步骤202,分析上述音频素材中的初始音乐点。
在一些实施例中,上述执行主体可以确定音频素材的初始音乐点。在这里,上述初始音乐点包括上述音频素材中的节拍点和/或音符起始点。
作为示例,当初始音乐点为音频素材中满足设定的音乐性发生变换的位置。上述音乐性发生变换的位置可以包括节拍发生变换的位置和旋律发生变换的位置。基于此,初始音乐点可以通过如下方式来确定:上述执行主体可以对上述音频素材进行分析,确定其中的节拍点和音符起始点,其中,节拍点为节拍发生变换的位置,音符起始点为旋律发生变换的位置。具体地,一方面可以采用基于深度学习的节拍分析算法对音频素材进行分析,得 到音频素材中的节拍点以及节拍点所在的时间戳,另一方面对音频素材进行短时频谱分析,得到音频素材中的音符起始点以及音符起始点所在的时间戳。在这里,音符起始点可以是通过起始点检测器(onset detector)得到。然后,统一通过两种方式得到的节拍点和音符起始点,对节拍点和音符起始点进行合并及去重,从而得到初始音乐点。
作为另一示例,如图2B所示,响应于检测到针对音乐展示界面211上第一控件212的第三用户操作,确定上述音频素材中重拍点作为上述初始音乐点;上述第一控件212通常是用于触发确定上述音频素材中重拍,其中,上述音乐展示界面是响应于检测到针对上述操作界面上的上述音频素材的选择操作而显示的。上述重拍通常指的是强拍,在音乐中节拍分为强拍和弱拍,强拍则通常是音乐力度强的节拍。作为示例,在四四拍中,节拍力度表现为,第一拍为强拍,第二拍为弱拍,第三拍为次强拍,第四拍为弱拍,四四拍以四分音符为一拍,每一小节有4拍的节拍。上述第三用户操作可以是指用户对上述第一控件212的点击操作。响应于检测到针对上述音乐展示界面211上第二控件213的第四用户操作,确定上述音频素材中节拍点作为上述初始音乐点;上述第二控件213通常是用于触发确定上述音频素材中节拍。上述第四用户操作可以是指用户对上述第二控件213的点击操作。响应于检测到针对上述音乐展示界面211上第三控件214的第五用户操作,确定上述音频素材中重音点作为上述初始音乐点。上述第三控件214通常是用于触发确定上述音频素材中重音。上述重音可以指乐曲中强度较大的音,其中,重音点可以是上述音符起始点中旋律变强的位置,例如,在乐谱中有重音记号标识的节拍,上述重音记号包括以下至少一项:“>”和“^”。其中,当“>”和“^”同时出现时“^”表示更强的重音。上述第五用户操作可以是指用户对上述第三控件214的点击操作。
步骤203,在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识。
在一些实施例中,上述执行主体可以在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识。其中,上述目标音乐点为部分或全部的上述初始音乐点。
作为示例,分析出的初始音乐点可以都作为目标音乐点进行展示。例如,假设初始音乐点有3个,分别是音乐点a,音乐点b和音乐点c,目标音乐点可以是上述音乐点a,音乐点b和音乐点c。
作为另一示例,响应于检测到针对上述操作界面的第三用户操作,可以从上述初始音乐点中选出目标音乐点进行展示。上述第三用户操作包括以下至少一项:音乐点增加操作,音乐点删除操作。例如,假设初始音乐点有3个,分别是音乐点a,音乐点b和音乐点c,当用户添加了音乐点d,那么目标音乐点可以是上述音乐点a,音乐点b,音乐点c和音乐点d。又如,当用户删除了音乐点b,那么目标音乐点可以是音乐点a和音乐点c。上述标识可以是预先设置的图标,例如,三角形,圆形,星形等。
在一些实施例的一些可选的实现方式中,如图2C所示,可以在上述剪辑时间轴225上展示上述音频素材的音频波形,在上述音频波形的相应位置展示上述目标音乐点的标识222-224。上述音频波形通常是指音频以波形图的形式展示出来的图像。将上述目标音乐点的标识按照对应的音乐点在上述图像上的位置,在上述图像上展示上述目标音乐点的标识。
由上述示例可以看出,如果用户手动确定音频素材中的音乐点,会花费大量的时间。 而通过分析音频素材中的初始音乐点,可以提高确定音乐点的效率。在剪辑时间轴上展示音乐点的标识,方便了用户对音乐点的选择。
参考图3A-3B,是根据本公开的一些实施例的用于展示音乐点的方法的另一个应用场景的示意图;如图3A的应用场景中所示,首先,用户可以在终端设备301的上传页面3017上选择多条影像素材。例如,上传页面3017中所示的图片3011,视频3012,图片3013,图片3014。用户单击附图标记3015和附图标记3018所示的位置,选中图片3011和视频3012。用户点击“下一步”按键3016,上述终端设备301基于选中的图片3011生成影像素材304,将视频3012作为影像素材305。根据得到的影像素材的数量302(图中示出为2),从目标音乐点10121,目标音乐点10131,目标音乐点10141中确定出对齐音乐的307,此时对齐音乐点307实际上可以使音频素材划分成音乐片段A和音乐片段B。根据音乐片段A的时长和音乐片段B的时长,分别对影像素材304、影像素材305进行处理,可以得到分别与音乐片段A和音乐片段B对应的视频片段3041和3051。然后,按照音乐片段A和音乐片段B在上述音频素材中的位置,可以将视频片段3041和3051在视频剪辑的操作界面313的剪辑时间轴312上进行展示。
可以理解的是,用于展示音乐点的方法可以是由终端设备301来执行,或者也可以是由其他设备来执行,或者还可以是各种软件程序来执行。其中,终端设备301例如可以是具有显示屏的各种电子设备,包括但不限于智能手机、平板电脑、电子书阅读器、膝上型便携计算机和台式计算机等等。此外,执行主体也可以体现为服务器、软件等。当执行主体为软件时,可以安装在上述所列举的电子设备中。其可以实现成例如用来提供分布式服务的多个软件或软件模块,也可以实现成单个软件或软件模块。在此不做具体限定。
应该理解,图3A-3B中的手机的数目仅仅是示意性的。根据实现需要,可以具有任意数目的手机。
继续参考图4A,示出了根据本公开的用于展示音乐点的方法的一些实施例的流程400。该用于展示音乐点的方法,包括以下步骤:
步骤401,获取音频素材。
步骤402,分析上述音频素材中的初始音乐点。
步骤403,在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识。
在一些实施例中,步骤401-403的具体实现及所带来的技术效果可以参考图2对应的那些实施例中的步骤201-203,在此不再赘述。
步骤404,获取用于视频剪辑的影像素材。
在一些实施例中,上述执行主体可以先获取用于视频剪辑的影像素材,再获取音频素材。同时上述执行主体也可以先获取音频素材,再获取用于视频剪辑的影像素材。
在一些实施例中,上述执行主体可以通过有线连接方式或者无线连接方式,获取用于视频剪辑的影像素材。其中,上述影像素材包括以下至少一项:图片素材,视频素材。作为示例,上述图片素材可以是用户存储在本地的图片,还可以是用户从网上下载的图片。上述视频素材可以是用户上传的视频,也可以是用户存储在本地的视频,还可以是用户从 网上下载的视频。
步骤405,从上述目标音乐点中确定对齐音乐点。
在一些实施例中,上述执行主体可以首先得到步骤203中确定的音频素材中的目标音乐点。然后,上述执行主体可以从已经得到的各个目标音乐点中选取出目标数量的对齐音乐点。上述对齐音乐点可以是全部的目标音乐点,也可以是部分目标音乐点。上述目标数量可以根据获取的上述影像素材的数量来确定,或者也可以是根据上述音频素材中具有的强拍数量来确定,或者还可以是用户设定的数量。上述执行主体基于确定的对齐音乐点。对上述音频素材进行划分,得到多个音频片段。作为示例,当确定4个对齐音乐点,上述音频素材可以被划分成5个音乐片段。
作为一种示例,当上述影像素材的影像素材数量少于上述多个音频片段的数量时,可以根据上述影像素材的数量,从上述多个音频片段中确定出与上述影像素材相同数量的音频片段;然后根据上述音频片段的数量,确定上述对齐音乐点的数量。作为示例,当获取到5个影像素材时,可以确定需要5个音频片段,从而可以确定需要4个对齐音乐点。在这里,首先,上述执行主体可以根据上述影像素材数量,确定上述影像素材数量对应的音频片段数量。例如,影像素材数量是5个,那么对应的音频片段数量也是5个。然后,根据上述音频片段数量,确定对齐音乐点的第一目标数量。例如,音频片段数量是5个,那么对齐音乐点的第一目标数量就应该是4个。最后,根据音乐点的优先级从高到低的顺序和/或音乐点在上述音频素材中出现的时间从先到后的顺序,从上述目标音乐点中选择出上述第一目标数量个音乐点作为对齐音乐点。上述音乐点的优先级可以是预先设定的。例如,可以是重拍点的优先级高于次重拍点的优先级,次重拍点的优先级高于重音点的优先级。
作为另一种示例,当上述影像素材的影像素材数量多于上述多个音频片段的数量时,可以将第二目标数量个音乐点和上述目标音乐点确定为对齐音乐点,执行主体可以先计算上述多个音频片段的数量与上述影像素材数量的差值,然后根据上述差值确定上述第二目标数量的值。例如,当得到5个影像素材,但是只有3个音频片段时,5个影像素材需要4个对齐音乐点,根据只有3个音频片段可以确定现在只有2个对齐音乐点,这时可以确定上述第二目标数量的值是2。也就是还需要再确定2个对齐音乐点。这里的2个对齐音乐点可以是用户手动添加的,也可以是执行主体在上述音频素材中选择除已有的2个对齐音乐点以外的音乐节拍,如在已有的节拍点之间插入中间拍。
作为又一种示例,当上述影像素材的影像素材数量多于上述多个音频片段的数量时,可以选择上述多个音频片段的数量个影像素材,从上述影像素材中确定出与上述多个音频片段相同数量的影像素材。例如,可以根据上述影像素材获取到的先后顺序进行选择。然后根据上述多个音频片段的数量,确定上述对齐音乐点的数量。作为示例,当获取到5个影像素材,4个音频片段时,可以选择4个影像素材,根据4个音频片段可以确定需要3个对齐音乐点。
步骤406,响应于检测到针对上述操作界面的第一用户操作,利用上述影像素材,为上述音频素材中的每个音频片段分别生成一个视频片段,得到多个视频片段。
在一些实施例中,当上述执行主体检测到针对上述操作界面的第一用户操作,利用上述影像素材,为上述音频素材中的每个音乐片段分别生成一个视频片段,得到多个视频片段。上述第一用户操作通常是用于触发上述多个音频片段和上述多个视频片段进行对齐。
作为示例,可以根据上述影像素材的选择顺序与上述音频片段进行对齐。例如,有三个影像素材分别是影像素材1,影像素材2,影像素材3,就可以是影像素材1与在上述音频素材中第一个出现的音频片段进行对齐。也可以根据上述影像素材的时长与上述音频片段进行对齐。例如,时长最长的影像素材与时长最长的音频片段进行对齐。
针对音频素材中的每一个音频片段,上述执行主体可以基于影像素材为该音频片段生成一个与该音频片段时长相同的视频片段。作为示例,假设音频素材被划分成3个音频片段,3个音频片段的时长分别是1秒、2秒和3秒时,那么与上述音乐片段相对应的视频片段的时长也可以分别是1秒、2秒和3秒。
作为一种示例,上述执行主体可以根据一个影像素材生成多个视频片段。例如,假设上述执行主体获取到一个10秒影像素材和一个8秒的音频素材,该执行主体根据对齐音乐点将该音频素材划分成3个音频片段,时长分别是2秒、3秒和3秒,则该执行主体可以从该影像素材中裁剪出3个不同的视频片段,时长分别是2秒、3秒和3秒。
作为另一种示例,上述执行主体也可以根据上述影像素材的时长确定上述影像素材对应的音频片段,其中,音乐片段对应的影像素材的长度不小于音乐片段的长度。利用上述影像素材为上述音频片段生成一个视频片段。例如,当使用一个影像素材为一个音频片段生成一个视频片段时,在该影像素材的时长大于该音频片段的时长时,可以在该原影像素材中截取与该音频片段的时长相等的视频片段,也可以将该原影像素材的播放速度加快来减短时长,再将变速后的影像素材作为视频片段,使视频片段的时长与音频片段的时长相等。
作为又一种示例,在该影像素材的时长小于该音频片段的时长时,则可以将该原影像素材的播放速度变慢来加长时长,再将变速后的影像素材作为视频片段,使视频片段的时长与音频片段的时长相等。可以理解的是,对于影像素材中的图片素材,可以将图片素材生成一个固定时长的视频素材,如3秒,然后再利用该视频素材为音乐片段生成视频片段。
作为还一种示例,如图4B所示,响应于检测到针对上述操作界面411中的第一视频片段413的第二用户操作414,显示上述第一视频片段413对应的影像素材419的调整界面415;上述第一视频片段413通常是检测到用户操作的视频片段。上述第二用户操作414可以是用户对上述第一视频片段413的点击操作,进而进入上述第一视频片段413对应的影像素材419的调整界面415。响应于检测到在上述影像素材419的调整界面415上针对上述影像素材419的手动截取操作418,确定上述手动截取操作418在上述影像素材419中选中的截取区间附图标记416与附图标记417之间的区间;上述截取区间是根据上述第一视频片段413对应的音频片段的时长确定的。上述手动截取操作418可以是用户滑动上述第一视频片段413对应的影像素材419,使上述截取区间中的视频片段420变成用户需要的。上述执行主体可以将上述视频片段420对齐到上述第一视频片段413的位置上。按照上述截取区间,从上述影像素材中截取出素材作为第二视频片段420。将上述截取区间范围内的视频片段作为第二视频片段420。
步骤407,按照上述音频素材在剪辑时间轴上的位置以及上述视频片段对应的音频片段在上述音频素材中的位置,在上述剪辑时间轴上展示上述视频片段。
在一些实施例中,上述执行主体可以根据上述音频素材在剪辑时间轴上的位置以及上述视频片段对应的音频片段在上述音频素材中的位置,在上述剪辑时间轴上展示上述视频 片段。作为示例,可以根据音乐点将上述音频素材按照顺序划分成3段,例如,A段可以是从0秒到2秒,B段可以是从2秒到5秒,C段可以是从5秒到10秒。对应的视频片段分别是a段,b段,c段。那么在上述视频剪辑的操作界面上按照顺序展示视频片段a,b,c。
作为一种示例,用户在剪辑轴上拖动视频片段的视频边界时,当视频边界拖动到的位置与某个音乐点之间的距离小于预设阈值时,视频片段的视频边界会被确定为该音乐点的位置。例如,如图4C所示,当用户拖动视频片段4210的视频边界4212到界面422所示的位置时,视频片段4210的视频边界4212会自动吸附到音乐点4214的位置,视频片段4210会得到对应的视频片段4217,同时视频片段4217的也会有对应的视频边界4216。
本公开的一些实施例公开的用于展示音乐点的方法,通过根据获取到影像素材和确定的目标音乐点,得到多个视频片段和多个音乐片段,将上述多个视频片段展示到操作界面上,可以让用户直观的看到视频片段的播放顺序和时长,进而提高了用户剪辑视频的速度。
进一步参考图5,作为对上述各图上述方法的实现,本公开提供了一种用于展示音乐点的装置的一些实施例,这些装置实施例与图2上述的那些方法实施例相对应,该装置具体可以应用于各种电子设备中。
如图5所示,一些实施例的用于展示音乐点的装置500包括:获取单元501、分析单元502和展示单元503。其中,获取单元501,被配置成获取音频素材;分析单元502,被配置成分析上述音频素材中的初始音乐点;上述初始音乐点包括上述音频素材中的节拍点和/或音符起始点;展示单元503,被配置成在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识;上述目标音乐点为部分或全部的上述初始音乐点。
在一些实施例中,用于展示音乐点的装置500还包括:第一获取单元,被配置成获取用于视频剪辑的影像素材,其中,上述影像素材包括以下至少一项:图片素材,视频素材;确定单元,被配置成从上述目标音乐点中确定对齐音乐点,其中,上述对齐音乐点用于将上述音频素材划分成多个音频片段;生成单元,被配置成响应于检测到针对上述操作界面的第一用户操作,利用上述影像素材,为上述音频素材中的每个音乐频段分别生成一个视频片段,得到多个视频片段,其中,相对应的音频片段和视频片段具有相同的时长;第一展示单元,被配置成按照上述音频素材在剪辑时间轴上的位置以及上述视频片段对应的音频片段在上述音频素材中的位置,在上述剪辑时间轴上展示上述视频片段。
在一些实施例中,用于展示音乐点的装置500中的确定单元还包括:第一确定子单元,被配置成响应于上述影像素材的影像素材数量少于上述多个音频片段的数量,根据上述影像素材数量,从上述多个音频片段中确定出上述影像素材数量个音频片段;第二确定单元,被配置成根据上述音频片段的数量,确定上述对齐音乐点的数量。
在一些实施例中,用于展示音乐点的装置500中的确定单元中的第二确定单元被进一步配置成:根据上述影像素材数量,确定上述影像素材数量对应的音频片段数量;根据上述音频片段数量,确定对齐音乐点的第一目标数量;根据音乐点的优先级从高到低的顺序和/或音乐点在上述音频素材中出现的时间从先到后的顺序,从上述目标音乐点中选择出上述第一目标数量个音乐点作为对齐音乐点,其中,重拍点的优先级高于次重拍点的优先级, 次重拍点的优先级高于重音点的优先级
在一些实施例中,用于展示音乐点的装置500中的确定单元被进一步配置成:响应于上述影像素材的影像素材数量多于上述多个音频片段的数量,将第一数量个音乐点和上述目标音乐点确定为对齐音乐点,其中,上述第一数量是根据上述多个音频片段的数量与上述影像素材数量的差值确定的。
在一些实施例中,用于展示音乐点的装置500中的生成单元被进一步配置成:根据上述影像素材的时长确定上述影像素材对应的音频片段,其中,音乐片段对应的影像素材的长度不小于音乐片段的长度;利用上述影像素材为上述音频片段生成一个视频片段。
在一些实施例中,用于展示音乐点的装置500中的生成单元被进一步配置成:响应于上述影像素材的时长小于上述影像素材对应的音乐片段的时长,调整上述影像素材的播放速度得到上述音乐片段的时长的视频片段。
在一些实施例中,用于展示音乐点的装置500被进一步配置成:响应于检测到针对上述操作界面中的第一视频片段的第二用户操作,显示上述第一视频片段对应的影像素材的调整界面;响应于检测到在上述影像素材的调整界面上针对上述影像素材的手动截取操作,确定上述手动截取操作在上述影像素材中选中的截取区间;按照上述截取区间,从上述影像素材中截取出素材作为第二视频片段。
在一些实施例中,用于展示音乐点的装置500中的分析单元502被进一步配置成:响应于检测到针对上述操作界面上第一控件的第三用户操作,确定上述音频素材中重拍点作为上述初始音乐点;响应于检测到针对上述操作界面上第二控件的第四用户操作,确定上述音频素材中节拍点作为上述初始音乐点;响应于检测到针对上述操作界面上第三控件的第五用户操作,确定上述音频素材中重音点作为上述初始音乐点。
在一些实施例中,用于展示音乐点的装置500被进一步配置成:响应于检测到针对上述操作界面的第三用户操作,从上述初始音乐点中确定上述音频的目标音乐点,其中,上述第三用户操作包括以下至少一项:音乐点增加操作,音乐点删除操作。
在一些实施例中,用于展示音乐点的装置500中的展示单元503被进一步配置成:在上述剪辑时间轴上展示上述音频素材的音频波形,在上述音频波形的相应位置展示上述目标音乐点的标识。
由上述示例可以看出,如果用户手动确定音频素材中的音乐点,会花费大量的时间。而通过分析音频素材中的初始音乐点,可以提高确定音乐点的效率。在剪辑时间轴上展示音乐点的标识,方便了用户对音乐点的选择。
下面参考图6,其示出了适于用来实现本公开的一些实施例的电子设备(例如图1中的服务器)600的结构示意图。本公开的一些实施例中的终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图6示出的终端设备仅仅是一个示例,不应对本公开的实施例的功能和使用范围带来任何限制。
如图6所示,电子设备600可以包括处理装置(例如中央处理器、图形处理器等)601,其可以根据存储在只读存储器(ROM)602中的程序或者从存储装置608加载到随机访问 存储器(RAM)603中的程序而执行各种适当的动作和处理。在RAM 603中,还存储有电子设备600操作所需的各种程序和数据。处理装置601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。
通常,以下装置可以连接至I/O接口605:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置606;包括例如液晶显示器(LCD)、扬声器、振动器等的输出装置607;包括例如存储卡等的存储装置608;以及通信装置609。通信装置609可以允许电子设备600与其他设备进行无线或有线通信以交换数据。虽然图6示出了具有各种装置的电子设备600,但是应理解的是,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。图6中示出的每个方框可以代表一个装置,也可以根据需要代表多个装置。
特别地,根据本公开的一些实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的一些实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的一些实施例中,该计算机程序可以通过通信装置609从网络上被下载和安装,或者从存储装置608被安装,或者从ROM 602被安装。在该计算机程序被处理装置601执行时,执行本公开的一些实施例的方法中限定的上述功能。
需要说明的是,本公开的一些实施例上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开的一些实施例中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开的一些实施例中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
在一些实施方式中,客户端、服务器可以利用诸如HTTP(HyperText Transfer Protocol,超文本传输协议)之类的任何当前已知或未来研发的网络协议进行通信,并且可以与任意形式或介质的数字数据通信(例如,通信网络)互连。通信网络的示例包括局域网(“LAN”),广域网(“WAN”),网际网(例如,互联网)以及端对端网络(例如,ad hoc端对端网络),以及任何当前已知或未来研发的网络。
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备:获取音频素材;分析上述音频素材中的初始音 乐点;上述初始音乐点包括上述音频素材中的节拍点和/或音符起始点;在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识;上述目标音乐点为部分或全部的上述初始音乐点。
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的一些实施例的操作的计算机程序代码,上述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络——包括局域网(LAN)或广域网(WAN)——连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。
附图中的流程图和框图,图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开的一些实施例中的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。所描述的单元也可以设置在处理器中,例如,可以描述为:一种处理器包括获取单元、分析单元和展示单元。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定,例如,获取单元还可以被描述为“获取音频素材的单元”。
本文中以上描述的功能可以至少部分地由一个或多个硬件逻辑部件来执行。例如,非限制性地,可以使用的示范类型的硬件逻辑部件包括:现场可编程门阵列(FPGA)、专用集成电路(ASIC)、专用标准产品(ASSP)、片上系统(SOC)、复杂可编程逻辑设备(CPLD)等等。
根据本公开的一个或多个实施例,提供了一种用于展示音乐点的方法,包括:获取音频素材;分析上述音频素材中的初始音乐点;上述初始音乐点包括上述音频素材中的节拍点和/或音符起始点;在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识;上述目标音乐点为部分或全部的上述初始音乐点。
根据本公开的一个或多个实施例,该方法还包括:获取用于视频剪辑的影像素材,其中,上述影像素材包括以下至少一项:图片素材,视频素材;从上述目标音乐点中确定对齐音乐点,其中,上述对齐音乐点用于将上述音频素材划分成多个音频片段;响应于检测到针对上述操作界面的第一用户操作,利用上述影像素材,为上述音频素材中的每个音乐频段分别生成一个视频片段,得到多个视频片段,其中,相对应的音频片段和视频片段具 有相同的时长;按照上述音频素材在剪辑时间轴上的位置以及上述视频片段对应的音频片段在上述音频素材中的位置,在上述剪辑时间轴上展示上述视频片段。
根据本公开的一个或多个实施例,从上述目标音乐点中确定对齐音乐点,包括:响应于上述影像素材的影像素材数量少于上述多个音频片段的数量,根据上述影像素材数量,从上述多个音频片段中确定出上述影像素材数量个音频片段;根据上述音频片段的数量,确定上述对齐音乐点的数量。
根据本公开的一个或多个实施例,根据上述影像素材数量,确定上述对齐音乐点的数量,包括:根据音乐点的优先级从高到低的顺序从上述目标音乐点中选择出目标数量个音乐点作为对齐音乐点,其中,重拍点的优先级高于次重拍点的优先级,次重拍点的优先级高于重音点的优先级。
根据本公开的一个或多个实施例,从上述目标音乐点中确定对齐音乐点,包括:响应于上述影像素材的影像素材数量多于上述多个音频片段的数量,将第一数量个音乐点和上述目标音乐点确定为对齐音乐点,其中,上述第一数量是根据上述多个音频片段的数量与上述影像素材数量的差值确定的。
根据本公开的一个或多个实施例,利用上述影像素材,为上述音频素材中的每个音乐频段分别生成一个视频片段,得到多个视频片段,包括:根据上述影像素材的时长确定上述影像素材对应的音频片段,其中,音乐片段对应的影像素材的长度不小于音乐片段的长度;利用上述影像素材为上述音频片段生成一个视频片段。
根据本公开的一个或多个实施例,利用上述影像素材,为上述音频素材中的每个音乐频段分别生成一个视频片段,得到多个视频片段,包括:响应于上述影像素材的时长小于上述影像素材对应的音乐片段的时长,调整上述影像素材的播放速度得到上述音乐片段的时长的视频片段。
根据本公开的一个或多个实施例,该方法还包括:响应于检测到针对上述操作界面中的第一视频片段的第二用户操作,显示上述第一视频片段对应的影像素材的调整界面;响应于检测到在上述影像素材的调整界面上针对上述影像素材的手动截取操作,确定上述手动截取操作在上述影像素材中选中的截取区间;按照上述截取区间,从上述影像素材中截取出素材作为第二视频片段。
根据本公开的一个或多个实施例,分析上述音频素材中的初始音乐点,包括:响应于检测到针对上述操作界面上第一控件的第三用户操作,确定上述音频素材中重拍点作为上述初始音乐点;响应于检测到针对上述操作界面上第二控件的第四用户操作,确定上述音频素材中节拍点作为上述初始音乐点;响应于检测到针对上述操作界面上第三控件的第五用户操作,确定上述音频素材中重音点作为上述初始音乐点。
根据本公开的一个或多个实施例,该方法还包括:响应于检测到针对上述操作界面的第三用户操作,从上述初始音乐点中确定上述音频的目标音乐点,其中,上述第三用户操作包括以下至少一项:音乐点增加操作,音乐点删除操作。
根据本公开的一个或多个实施例,在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识,包括:在上述剪辑时间轴上展示上述音频素材的音频波形,在上述音频波形的相应位置展示上述目标音乐点的标识。
根据本公开的一个或多个实施例,该装置包括:获取单元,被配置成获取音频素材;分析单元,被配置成分析上述音频素材中的初始音乐点;上述初始音乐点包括上述音频素材中的节拍点和/或音符起始点;展示单元,被配置成在视频剪辑的操作界面上,按照上述音频素材在剪辑时间轴上的位置以及目标音乐点在上述音频素材中的位置,在上述剪辑时间轴上展示上述目标音乐点的标识;上述目标音乐点为部分或全部的上述初始音乐点。
根据本公开的一个或多个实施例,提供了一种电子设备,包括:一个或多个处理器;存储装置,其上存储有一个或多个程序,当一个或多个程序被一个或多个处理器执行,使得一个或多个处理器实现如上述任一实施例描述的方法。
根据本公开的一个或多个实施例,提供了一种计算机可读介质,其上存储有计算机程序,其中,程序被处理器执行时实现如上述任一实施例描述的方法。
根据本公开的一个或多个实施例,提供了一种计算机程序,包括程序代码,当计算机运行上述计算机程序时,上述程序代码执行如上述任一实施例描述的方法。
以上描述仅为本公开的一些较佳实施例以及对所运用技术原理的说明。本领域技术人员应当理解,本公开的实施例中所涉及的发明范围,并不限于上述技术特征的特定组合而成的技术方案,同时也应涵盖在不脱离上述发明构思的情况下,由上述技术特征或其等同特征进行任意组合而形成的其它技术方案。例如上述特征与本公开的实施例中公开的(但不限于)具有类似功能的技术特征进行互相替换而形成的技术方案。

Claims (15)

  1. A method for displaying music points, comprising:
    obtaining audio material;
    analyzing initial music points in the audio material, the initial music points comprising beat points and/or note onset points in the audio material;
    on an operation interface for video editing, displaying an identifier of a target music point on an editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material, the target music point being some or all of the initial music points.
  2. The method according to claim 1, further comprising:
    acquiring image material for video editing, wherein the image material comprises at least one of the following: picture material, video material;
    determining aligned music points from the target music points, wherein the aligned music points are used to divide the audio material into a plurality of audio segments;
    in response to detecting a first user operation on the operation interface, using the image material to generate one video clip for each music segment in the audio material to obtain a plurality of video clips, wherein corresponding audio segments and video clips have the same duration;
    displaying the video clips on the editing timeline according to the position of the audio material on the editing timeline and the positions of the audio segments corresponding to the video clips in the audio material.
  3. The method according to claim 2, wherein determining aligned music points from the target music points comprises:
    determining a first target number of aligned music points according to the number of image materials;
    selecting the first target number of music points from the target music points as the aligned music points, in descending order of music-point priority and/or in order of the time at which the music points appear in the audio material from earliest to latest, wherein the priority of a downbeat point is higher than that of a secondary downbeat point, and the priority of a secondary downbeat point is higher than that of an accent point.
  4. The method according to claim 3, wherein determining the first target number of aligned music points according to the number of image materials comprises:
    in response to the number of image materials being less than the number of the plurality of audio segments, determining from the plurality of audio segments, according to the number of image materials, the same number of audio segments;
    determining the first target number of aligned music points according to the number of audio segments.
  5. The method according to claim 2, wherein determining aligned music points from the target music points comprises:
    in response to the number of image materials being greater than the number of the plurality of audio segments, determining a second target number of music points together with the target music points as the aligned music points, wherein the second target number is determined according to the difference between the number of the plurality of audio segments and the number of image materials.
  6. The method according to claim 2, wherein using the image material to generate one video clip for each music segment in the audio material to obtain a plurality of video clips comprises:
    determining, according to the duration of the image material, the audio segment corresponding to the image material, wherein the duration of the image material corresponding to a music segment is not less than the duration of that music segment;
    using the image material to generate a video clip for the audio segment.
  7. The method according to claim 2, wherein using the image material to generate one video clip for each music segment in the audio material to obtain a plurality of video clips comprises:
    in response to the duration of the image material being less than the duration of the music segment corresponding to the image material, adjusting the playback speed of the image material to obtain a video clip whose duration equals that of the music segment.
  8. The method according to claim 2, further comprising:
    in response to detecting a second user operation on a first video clip in the operation interface, displaying an adjustment interface of the image material corresponding to the first video clip;
    in response to detecting a manual clipping operation on the image material on the adjustment interface of the image material, determining the clipping interval selected by the manual clipping operation in the image material;
    clipping material from the image material according to the clipping interval as a second video clip.
  9. The method according to any one of claims 1-8, wherein analyzing the initial music points in the audio material comprises:
    in response to detecting a third user operation on a first control on a music display interface, determining downbeat points in the audio material as the initial music points, wherein the music display interface is displayed in response to detecting a selection operation on the audio material on the operation interface;
    in response to detecting a fourth user operation on a second control on the music display interface, determining beat points in the audio material as the initial music points;
    in response to detecting a fifth user operation on a third control on the music display interface, determining accent points in the audio material as the initial music points.
  10. The method according to any one of claims 1-8, further comprising:
    in response to detecting a third user operation on the operation interface, determining the target music points of the audio from the initial music points, wherein the third user operation comprises at least one of the following: a music-point adding operation, a music-point deleting operation.
  11. The method according to any one of claims 1-8, wherein displaying, on the operation interface for video editing, the identifier of the target music point on the editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material comprises:
    displaying the audio waveform of the audio material on the editing timeline, and displaying the identifier of the target music point at the corresponding position of the audio waveform.
  12. An apparatus for displaying music points, comprising:
    an acquisition unit configured to acquire audio material;
    an analysis unit configured to analyze initial music points in the audio material, the initial music points comprising beat points and/or note onset points in the audio material;
    a display unit configured to display, on an operation interface for video editing, an identifier of a target music point on an editing timeline according to the position of the audio material on the editing timeline and the position of the target music point in the audio material, the target music point being some or all of the initial music points.
  13. An electronic device, comprising:
    one or more processors;
    a storage device on which one or more programs are stored,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-11.
  14. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-11.
  15. A computer program, comprising program code, wherein when a computer runs the computer program, the program code executes the method according to any one of claims 1-11.
PCT/CN2020/126261 2019-11-04 2020-11-03 用于展示音乐点的方法、装置、电子设备和介质 WO2021088830A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020227015287A KR20220091500A (ko) 2019-11-04 2020-11-03 음악 포인트를 표시하는 방법 및 장치, 전자 장치 및 매체
EP20884477.9A EP4044611A4 (en) 2019-11-04 2020-11-03 MUSICAL DOT DISPLAY METHOD AND APPARATUS AND ELECTRONIC DEVICE AND MEDIA
JP2022525690A JP7508552B2 (ja) 2019-11-04 2020-11-03 音楽点を表示するための方法及び装置、並びに電子デバイス及び媒体
US17/735,962 US11587593B2 (en) 2019-11-04 2022-05-03 Method and apparatus for displaying music points, and electronic device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911067475.1 2019-11-04
CN201911067475.1A CN110769309B (zh) 2019-11-04 2019-11-04 用于展示音乐点的方法、装置、电子设备和介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/735,962 Continuation US11587593B2 (en) 2019-11-04 2022-05-03 Method and apparatus for displaying music points, and electronic device and medium

Publications (1)

Publication Number Publication Date
WO2021088830A1 true WO2021088830A1 (zh) 2021-05-14

Family

ID=69336209

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/126261 WO2021088830A1 (zh) 2019-11-04 2020-11-03 用于展示音乐点的方法、装置、电子设备和介质

Country Status (5)

Country Link
US (1) US11587593B2 (zh)
EP (1) EP4044611A4 (zh)
KR (1) KR20220091500A (zh)
CN (1) CN110769309B (zh)
WO (1) WO2021088830A1 (zh)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108320730B (zh) * 2018-01-09 2020-09-29 广州市百果园信息技术有限公司 音乐分类方法及节拍点检测方法、存储设备及计算机设备
CN110769309B (zh) * 2019-11-04 2023-03-31 北京字节跳动网络技术有限公司 用于展示音乐点的方法、装置、电子设备和介质
CN112822541B (zh) * 2019-11-18 2022-05-20 北京字节跳动网络技术有限公司 视频生成方法、装置、电子设备和计算机可读介质
CN113497970B (zh) * 2020-03-19 2023-04-11 字节跳动有限公司 视频处理方法、装置、电子设备及存储介质
CN111432141B (zh) * 2020-03-31 2022-06-17 北京字节跳动网络技术有限公司 一种混剪视频确定方法、装置、设备及存储介质
CN111741233B (zh) * 2020-07-16 2021-06-15 腾讯科技(深圳)有限公司 视频配乐方法、装置、存储介质以及电子设备
CN111857923B (zh) * 2020-07-17 2022-10-28 北京字节跳动网络技术有限公司 特效展示方法、装置、电子设备及计算机可读介质
CN111862936A (zh) * 2020-07-28 2020-10-30 游艺星际(北京)科技有限公司 生成及发布作品的方法、装置、电子设备和存储介质
CN111901626B (zh) * 2020-08-05 2021-12-14 腾讯科技(深圳)有限公司 背景音频确定方法、视频剪辑方法、装置和计算机设备
CN112259062B (zh) * 2020-10-20 2022-11-04 北京字节跳动网络技术有限公司 特效展示方法、装置、电子设备及计算机可读介质
CN112579818B (zh) * 2020-12-29 2021-08-13 玖月音乐科技(北京)有限公司 一种五线谱语音批注方法和系统
CN112822543A (zh) * 2020-12-30 2021-05-18 北京达佳互联信息技术有限公司 视频处理方法及装置、电子设备、存储介质
CN113727038B (zh) * 2021-07-28 2023-09-05 北京达佳互联信息技术有限公司 一种视频处理方法、装置、电子设备及存储介质
US20230421841A1 (en) * 2022-06-27 2023-12-28 Rovi Guides, Inc. Methods for conforming audio and short-form video

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007066399A (ja) * 2005-08-30 2007-03-15 Ricoh Co Ltd 映像音声編集システム
CN107393569A (zh) * 2017-08-16 2017-11-24 成都品果科技有限公司 音视频剪辑方法及装置
CN107483843A (zh) * 2017-08-16 2017-12-15 成都品果科技有限公司 音视频匹配剪辑方法及装置
CN110769309A (zh) * 2019-11-04 2020-02-07 北京字节跳动网络技术有限公司 用于展示音乐点的方法、装置、电子设备和介质

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7027124B2 (en) * 2002-02-28 2006-04-11 Fuji Xerox Co., Ltd. Method for automatically producing music videos
US7512886B1 (en) * 2004-04-15 2009-03-31 Magix Ag System and method of automatically aligning video scenes with an audio track
JP2006127367A (ja) * 2004-11-01 2006-05-18 Sony Corp 情報管理方法、情報管理プログラムおよび情報管理装置
US7825321B2 (en) * 2005-01-27 2010-11-02 Synchro Arts Limited Methods and apparatus for use in sound modification comparing time alignment data from sampled audio signals
GB2422755A (en) * 2005-01-27 2006-08-02 Synchro Arts Ltd Audio signal processing
WO2006079813A1 (en) * 2005-01-27 2006-08-03 Synchro Arts Limited Methods and apparatus for use in sound modification
US20070044639A1 (en) * 2005-07-11 2007-03-01 Farbood Morwaread M System and Method for Music Creation and Distribution Over Communications Network
WO2008024486A2 (en) * 2006-08-24 2008-02-28 Fliptrack, Inc. Beat and text based editing and composing systems and methods
US7569761B1 (en) * 2007-09-21 2009-08-04 Adobe Systems Inc. Video editing matched to musical beats
CN101587706A (zh) 2009-07-08 2009-11-25 沈阳蓝火炬软件有限公司 流媒体实时音乐节拍分析与舞蹈控制系统及方法
US9613605B2 (en) * 2013-11-14 2017-04-04 Tunesplice, Llc Method, device and system for automatically adjusting a duration of a song
CN104103300A (zh) 2014-07-04 2014-10-15 厦门美图之家科技有限公司 一种根据音乐节拍自动处理视频的方法
CN108040497B (zh) * 2015-06-03 2022-03-04 思妙公司 用于自动产生协调的视听作品的方法和系统
CN107436921B (zh) * 2017-07-03 2020-10-16 李洪海 视频数据处理方法、装置、设备及存储介质
CN109429078B (zh) * 2017-08-24 2022-02-22 北京搜狗科技发展有限公司 视频处理方法和装置、用于视频处理的装置
US10971121B2 (en) * 2018-07-09 2021-04-06 Tree Goat Media, Inc. Systems and methods for transforming digital audio content into visual topic-based segments
US20220208155A1 (en) * 2018-07-09 2022-06-30 Tree Goat Media, INC Systems and methods for transforming digital audio content
CN108600825B (zh) * 2018-07-12 2019-10-25 北京微播视界科技有限公司 选择背景音乐拍摄视频的方法、装置、终端设备和介质
CN109545177B (zh) 2019-01-04 2023-08-22 平安科技(深圳)有限公司 一种旋律配乐方法及装置
US10825221B1 (en) * 2019-04-23 2020-11-03 Adobe Inc. Music driven human dancing video synthesis
CN110233976B (zh) * 2019-06-21 2022-09-09 广州酷狗计算机科技有限公司 视频合成的方法及装置
CN110265057B (zh) * 2019-07-10 2024-04-26 腾讯科技(深圳)有限公司 生成多媒体的方法及装置、电子设备、存储介质
CN110336960B (zh) * 2019-07-17 2021-12-10 广州酷狗计算机科技有限公司 视频合成的方法、装置、终端及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007066399A (ja) * 2005-08-30 2007-03-15 Ricoh Co Ltd 映像音声編集システム
CN107393569A (zh) * 2017-08-16 2017-11-24 成都品果科技有限公司 音视频剪辑方法及装置
CN107483843A (zh) * 2017-08-16 2017-12-15 成都品果科技有限公司 音视频匹配剪辑方法及装置
CN110769309A (zh) * 2019-11-04 2020-02-07 北京字节跳动网络技术有限公司 用于展示音乐点的方法、装置、电子设备和介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4044611A4

Also Published As

Publication number Publication date
CN110769309B (zh) 2023-03-31
KR20220091500A (ko) 2022-06-30
JP2022554338A (ja) 2022-12-28
EP4044611A1 (en) 2022-08-17
CN110769309A (zh) 2020-02-07
US11587593B2 (en) 2023-02-21
US20220293136A1 (en) 2022-09-15
EP4044611A4 (en) 2022-11-23

Similar Documents

Publication Publication Date Title
WO2021088830A1 (zh) 用于展示音乐点的方法、装置、电子设备和介质
WO2021093737A1 (zh) 生成视频的方法、装置、电子设备和计算机可读介质
WO2021073315A1 (zh) 视频文件的生成方法、装置、终端及存储介质
KR102207208B1 (ko) 음악 정보 시각화 방법 및 장치
WO2021196903A1 (zh) 视频处理方法、装置、可读介质及电子设备
WO2021098670A1 (zh) 视频生成方法、装置、电子设备和计算机可读介质
CN111050203B (zh) 一种视频处理方法、装置、视频处理设备及存储介质
WO2021135626A1 (zh) 菜单项选择方法、装置、可读介质及电子设备
WO2020200173A1 (zh) 文档输入内容的处理方法、装置、电子设备和存储介质
JP7334362B2 (ja) 文書内テーブル閲覧方法、装置、電子機器及び記憶媒体
WO2020259133A1 (zh) 录制热门片段方法、装置、电子设备和可读介质
WO2020156055A1 (zh) 显示界面切换方法、电子设备及计算机可读存储介质
WO2023169356A1 (zh) 图像处理方法、装置、设备及存储介质
US20230307004A1 (en) Audio data processing method and apparatus, and device and storage medium
WO2024008184A1 (zh) 一种信息展示方法、装置、电子设备、计算机可读介质
US20240054157A1 (en) Song recommendation method and apparatus, electronic device, and storage medium
US20240103802A1 (en) Method, apparatus, device and medium for multimedia processing
US20240094883A1 (en) Message selection method, apparatus and device
WO2022194038A1 (zh) 音乐的延长方法、装置、电子设备和存储介质
JP7508552B2 (ja) 音楽点を表示するための方法及び装置、並びに電子デバイス及び媒体
CN114520928B (zh) 显示信息生成方法、信息显示方法、装置和电子设备
WO2022068496A1 (zh) 搜索目标内容的方法、装置、电子设备及存储介质
CN112153439A (zh) 互动视频处理方法、装置、设备及可读存储介质
CN108460128A (zh) 文档管理方法及装置、电子装置及可读存储介质
WO2021018176A1 (zh) 文字特效处理方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20884477

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022525690

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20227015287

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2020884477

Country of ref document: EP

Effective date: 20220502

NENP Non-entry into the national phase

Ref country code: DE