WO2022104917A1 - Beat recognition method, device and storage medium - Google Patents

Beat recognition method, device and storage medium

Info

Publication number
WO2022104917A1
WO2022104917A1 (application PCT/CN2020/133192; related: CN2020133192W)
Authority
WO
WIPO (PCT)
Prior art keywords
beat
audio
feature
information
estimated
Prior art date
Application number
PCT/CN2020/133192
Other languages
English (en)
French (fr)
Inventor
郑亚军
Original Assignee
瑞声声学科技(深圳)有限公司
瑞声光电科技(常州)有限公司
Priority date
Filing date
Publication date
Application filed by 瑞声声学科技(深圳)有限公司, 瑞声光电科技(常州)有限公司
Publication of WO2022104917A1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use, for comparison or discrimination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/076: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal, for extraction of timing, tempo; Beat detection

Definitions

  • the present invention relates to the technical field of audio recognition, in particular to a beat recognition method, device and storage medium.
  • as an art form, music expresses people's thoughts, emotions and social life through elements such as beat, pitch, melody and lyrics, on the basis of certain music theory. Since ancient times, human beings have been inseparable from music. Besides being recorded and disseminated through traditional musical scores, in modern society, with the development of science and technology, music is increasingly recorded, played and disseminated in the form of digital signals.
  • compared with traditional scores, recording music in digital form can not only fully record the information of a piece of music, but is also convenient for direct playback on electronic devices.
  • however, current methods of recording music in digital form cannot identify information such as the beat of the music. Therefore, the music cannot be further analyzed according to its beat (for example, matching vibration effects to the beat and melody) to make music playback more interesting, resulting in a poor user experience.
  • the present invention provides a beat recognition method, the method includes:
  • extract the feature information of the audio signal; wherein the feature information includes the time information and energy information of the first feature audio point set, and the beat duration;
  • perform calculation processing on the feature information to obtain an actual beat time sequence;
  • identify the beat points according to the actual beat time sequence.
  • the step of performing calculation processing on the feature information to obtain an actual beat moment sequence includes:
  • identifying, according to the time information and energy information of the first feature audio point set, a plurality of second feature audio points in the first feature audio point set, and extracting the times of the plurality of second feature audio points;
  • generating a characteristic time sequence according to the times of the plurality of second feature audio points;
  • generating a plurality of estimated beat time sequences according to the times of the plurality of second feature audio points and the beat duration; wherein each estimated beat time sequence includes the estimated times of a plurality of estimated beat points;
  • performing a probability calculation according to the characteristic time sequence and the plurality of estimated beat time sequences to obtain the probability value that each estimated beat time sequence is the actual beat time sequence;
  • selecting the estimated beat time sequence with the largest probability value as the actual beat time sequence.
  • the step of identifying a plurality of second feature audio points in the first feature audio point set includes:
  • recording the first feature audio points whose energy values are higher than a preset energy threshold as the second feature audio points; wherein the preset energy threshold is one fifth of the largest energy value in the energy information of the first feature audio point set.
  • the step of performing a probability calculation according to the characteristic time sequence and the plurality of estimated beat time sequences to obtain the probability value that each estimated beat time sequence is the actual beat time sequence includes:
  • obtaining the error sequence of each estimated beat time sequence according to the estimated beat time sequence and the characteristic time sequence; wherein each error sequence includes the time error values of a plurality of estimated beat points;
  • recording the estimated beat points corresponding to time error values smaller than a preset error threshold in each error sequence as valid beat points;
  • calculating, according to the number of valid beat points and the number of estimated beat points in each estimated beat time sequence, the probability value that the estimated beat time sequence is the actual beat time sequence.
  • the preset error threshold is one tenth of the maximum duration value of the beat durations.
  • the step of extracting the time information of the first feature audio point set includes:
  • acquiring energy information of the music signal; performing calculation processing on the energy information to obtain an energy change curve; and identifying the first feature audio point set according to the energy change curve and extracting its time information.
  • the step of extracting the energy information of the first feature audio point set includes:
  • extracting the energy information of the first feature audio point set according to the energy change curve and the time information of the first feature audio point set.
  • the method further includes:
  • when the audio file includes a plurality of audio tracks respectively used for transmitting the audio signal, the audio file is preprocessed by track separation, and the audio signal is played through at least one of the audio tracks.
  • the present invention provides a beat recognition device, which includes:
  • an audio processing module configured to extract feature information of the audio signal, wherein the feature information includes the time information and energy information of the first feature audio point set, and the beat duration;
  • a calculation processing module configured to perform calculation processing on the feature information to obtain an actual beat time sequence, and further configured to identify the beat points according to the actual beat time sequence.
  • the present invention provides a beat recognition device, which includes a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
  • extract the feature information of the audio signal; wherein the feature information includes the time information and energy information of the first feature audio point set, and the beat duration;
  • perform calculation processing on the feature information to obtain an actual beat time sequence;
  • identify the beat points according to the actual beat time sequence.
  • the present invention provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
  • extract the feature information of the audio signal; wherein the feature information includes the time information and energy information of the first feature audio point set, and the beat duration;
  • perform calculation processing on the feature information to obtain an actual beat time sequence;
  • identify the beat points according to the actual beat time sequence.
  • in the above beat recognition method, device and storage medium, by extracting the feature information of the audio signal (the feature information including the time information and energy information of the first feature audio point set, and the beat duration), calculating the actual beat time sequence from the feature information, and identifying the beat points according to the actual beat time sequence, accurate audio beat points are automatically identified from the actual beat time sequence, realizing fully automatic, highly accurate identification of audio beats.
  • in practical applications, the automatic recognition of audio beats provides a basis for further analyzing and utilizing music according to its beats, making music playback more interesting and improving the user experience.
  • Fig. 1 is an application environment diagram of the beat recognition method of the present invention;
  • Fig. 2 is a schematic flowchart of the beat recognition method of the present invention;
  • Fig. 3 is a schematic flowchart of step S0 in Fig. 2;
  • Fig. 4 is a schematic flowchart of step S4 in Fig. 2;
  • Fig. 5 is a schematic flowchart of step S44 in Fig. 4;
  • Fig. 6 is a schematic structural diagram of the beat recognition device of the present invention.
  • the present invention provides a beat recognition method, which can be applied to the application environment shown in Fig. 1.
  • the terminal 1 communicates with the server 2 through a network, or implements data transmission with other terminals or electronic devices through other wired or wireless means.
  • the terminal 1 can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and the server 2 can be implemented by an independent server or a server cluster composed of multiple servers.
  • a beat recognition method is provided, and the method is applied to the terminal in FIG. 1 as an example for description, including the following steps:
  • Step S2: extract the feature information of the audio signal.
  • Specifically, the feature information includes the time information and energy information of the first feature audio point set, and the beat duration; the first feature audio point set includes a plurality of first feature audio points. Step S2 includes sub-steps of extracting the time information, the energy information and the beat duration. In this embodiment, accent points are first identified from the original audio signal and recorded as the first feature audio points, and the set of accent points is recorded as the first feature audio point set; the time and energy value of each accent point are then extracted. The times of the accent points together form the time information T_s of the first feature audio point set, and the energy values of the accent points together form its energy information E_s. In addition, the beat duration L_b of the audio signal is extracted.
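The patent does not state how the beat duration L_b is obtained. Purely as an illustrative assumption, a common way to estimate a beat period is autocorrelation of the signal's energy envelope; the sketch below (function name and parameters are hypothetical, not from the patent) shows that idea.

```python
import numpy as np

def estimate_beat_duration(envelope, fs_env, min_bpm=60, max_bpm=200):
    """Estimate the beat duration L_b (seconds) from an energy envelope.

    Assumption: autocorrelation of the mean-removed envelope peaks at the
    beat period. fs_env is the envelope's frame rate (frames per second).
    """
    env = np.asarray(envelope, dtype=float) - np.mean(envelope)
    # Autocorrelation for non-negative lags only.
    acf = np.correlate(env, env, mode="full")[len(env) - 1:]
    # Restrict the lag search to plausible beat periods.
    min_lag = int(fs_env * 60.0 / max_bpm)
    max_lag = int(fs_env * 60.0 / min_bpm)
    lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return lag / fs_env

# Synthetic envelope with an energy burst every 0.5 s (120 BPM).
fs_env = 100
t = np.arange(0, 10, 1.0 / fs_env)
envelope = ((t % 0.5) < 0.05).astype(float)
L_b = estimate_beat_duration(envelope, fs_env)
```

On the synthetic envelope above, the autocorrelation peak falls at a lag of roughly 0.5 s, matching the burst period.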
  • Step S4: perform calculation processing on the feature information to obtain an actual beat time sequence.
  • Specifically, in this embodiment, calculation processing is performed according to the time information T_s, energy information E_s and beat duration L_b of the first feature audio point set, and the actual beat time sequence T_b is obtained from the result. In essence, the actual beat time sequence T_b comprises a plurality of beat points (a beat point is a local energy burst in the audio signal).
  • Step S6: identify the beat points according to the actual beat time sequence.
  • in the above beat recognition method, the feature information of the audio signal is extracted, the actual beat time sequence is calculated from the feature information, and the beat points are identified according to the actual beat time sequence, so that accurate audio beat points are automatically identified from the actual beat time sequence, realizing fully automatic, highly accurate recognition of audio beats.
  • in practical applications, the automatic recognition of audio beats provides a basis for further analyzing and utilizing music according to its beats, making music playback more interesting and improving the user experience.
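The three steps S2, S4 and S6 can be sketched end to end. The sketch below is a deliberate simplification under stated assumptions: it takes feature information as already extracted, uses a single candidate start point instead of the patent's multi-candidate probability selection, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FeatureInfo:
    times: List[float]     # T_s: times of the first feature audio points (accents)
    energies: List[float]  # E_s: energy values of those points
    beat_duration: float   # L_b

def recognize_beats(feature_info: FeatureInfo) -> List[float]:
    """Simplified steps S4 + S6: turn feature information into beat times.

    Single-candidate sketch; the full method evaluates several candidate
    sequences and keeps the most probable one.
    """
    # Start from the first accent above the energy threshold (cf. step S41).
    threshold = 0.2 * max(feature_info.energies)
    start = next(t for t, e in zip(feature_info.times, feature_info.energies)
                 if e > threshold)
    # Lay out an arithmetic sequence with common difference L_b (cf. step S43).
    end = feature_info.times[-1]
    beats, t = [], start
    while t <= end + 1e-9:
        beats.append(round(t, 6))
        t += feature_info.beat_duration
    return beats

info = FeatureInfo(times=[0.1, 0.5, 1.0, 1.5, 2.0],
                   energies=[0.05, 1.0, 0.9, 0.8, 0.95],
                   beat_duration=0.5)
beats = recognize_beats(info)
```

Here the low-energy accent at 0.1 s is skipped and beats are emitted every 0.5 s from the first strong accent onward.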
  • a step S0 may be added to perform audio signal preprocessing according to actual needs, and the step S0 specifically includes:
  • Step S01 acquiring an audio file.
  • Specifically, in this step, the terminal 1 can download audio files (such as music files) from the server 2 through the network, or receive audio files transmitted by other terminals or electronic devices through a wireless communication connection (such as a WiFi hotspot or Bluetooth connection) or a wired communication connection (such as a data cable); the manner of acquisition is not limited and can be determined according to the actual situation.
  • Step S02 determine whether the audio file includes multiple audio tracks. Specifically, the audio track is used to transmit the audio signal.
  • Step S03: when the audio file includes multiple audio tracks, the audio file is preprocessed by track separation, the audio signal is output through at least one of the audio tracks, and the sampling rate fs at which the terminal 1 samples the audio signal is obtained; that is, the audio signal played by at least one of the audio tracks is selected for beat identification.
  • Through the above step S0, the terminal 1 can perform beat recognition on the audio signal output by a music file with a single audio track or with multiple audio tracks, which improves the applicability of the beat recognition method of the present invention and can meet different application scenarios.
  • Step S2: extracting the feature information of the audio signal. Step S2 specifically includes:
  • the step of extracting the time information of the first feature audio point set includes: acquiring energy information of the music signal; performing calculation processing on the energy information to obtain an energy change curve; and identifying the first feature audio point set according to the energy change curve and extracting the time information T_s of the first feature audio point set.
  • the step of extracting the energy information of the first feature audio point set includes: extracting the energy information E_s of the first feature audio point set according to the energy change curve and the time information T_s of the first feature audio point set.
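The patent identifies the first feature audio points from an "energy change curve" but does not fix the detector. As one plausible stand-in (an assumption, not the patent's method), the sketch below marks local maxima of the curve as accent points and returns their times and energies (T_s, E_s).

```python
import numpy as np

def extract_first_feature_points(energy_curve, fs_curve):
    """Identify accent points (first feature audio points) on an energy curve.

    Simple stand-in detector: a sample is an accent candidate if it exceeds
    both of its neighbours. Returns (T_s, E_s): times and energy values.
    fs_curve is the frame rate of the energy curve.
    """
    e = np.asarray(energy_curve, dtype=float)
    # Indices of strict local maxima (excluding the endpoints).
    peaks = np.flatnonzero((e[1:-1] > e[:-2]) & (e[1:-1] > e[2:])) + 1
    T_s = peaks / fs_curve
    E_s = e[peaks]
    return T_s, E_s

curve = [0.0, 0.2, 1.0, 0.3, 0.1, 0.8, 0.2, 0.0]
T_s, E_s = extract_first_feature_points(curve, fs_curve=10)
```

On the toy curve the two local maxima at samples 2 and 5 become accent points at 0.2 s and 0.5 s.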
  • Step S4: performing calculation processing on the feature information to obtain an actual beat time sequence. Step S4 specifically includes:
  • Step S41: according to the time information T_s and energy information E_s of the first feature audio point set, identify multiple second feature audio points in the first feature audio point set, and extract the times of the multiple second feature audio points.
  • the step of identifying a plurality of second feature audio points in the first feature audio point set includes:
  • recording the first feature audio points (that is, the accent points) in the first feature audio point set whose energy values are higher than a preset energy threshold as the second feature audio points. The number of second feature audio points to identify can be determined according to the actual situation, and the value of the preset energy threshold is not limited and can be set according to actual use. For example, in this embodiment, considering that the prelude of some audio signals has no obvious beat and that accent points in the prelude would strongly interfere with the overall beat analysis, the preset energy threshold is set to one fifth of the maximum energy value in the energy information of the first feature audio point set, expressed as 0.2·max(E_s); using 0.2·max(E_s) as the threshold helps eliminate the interference of the prelude with beat recognition and improves the recognition accuracy.
  • Specifically, in this embodiment, four second feature audio points need to be identified. The judgment starts from the first accent point of the energy information E_s of the first feature audio point set: when the first accent point whose energy value is higher than the preset energy threshold 0.2·max(E_s) is identified, it is relabeled as the first second feature audio point, its time is relabeled as t_1, and the accent points whose times precede t_1 are deleted. The judgment then continues from t_1: when the second accent point whose energy value is higher than 0.2·max(E_s) is identified, it is relabeled as the second second feature audio point with time t_2, and the accent points between t_1 and t_2 are deleted. By analogy, the times t_3 and t_4 of the third and fourth second feature audio points are identified. It should be noted that, according to basic music theory, four notes commonly occur within one beat, and all four may carry relatively large energy; correspondingly, four second feature audio points (the relabeled accent points) are marked here and subjected to the probability judgment, so as to improve the accuracy of beat recognition.
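The sequential scan for the second feature audio points can be sketched as follows (helper name and the sample values are illustrative). Skipping the low-energy accents between picks is equivalent to the deletion the patent describes.

```python
def pick_second_feature_points(T_s, E_s, count=4):
    """Sequentially pick `count` accent points above the energy threshold.

    Walk the accent points in time order and keep each one whose energy
    exceeds 0.2 * max(E_s); accents between consecutive picks are skipped,
    which has the same effect as deleting them.
    """
    threshold = 0.2 * max(E_s)
    picks = []
    for t, e in zip(T_s, E_s):
        if e > threshold:
            picks.append(t)
            if len(picks) == count:
                break
    return picks  # times t_1 .. t_4

T_s = [0.10, 0.30, 0.55, 0.80, 1.05, 1.30, 1.55]
E_s = [0.05, 0.90, 0.10, 0.70, 0.60, 0.15, 0.85]
t_points = pick_second_feature_points(T_s, E_s)
```

With max(E_s) = 0.9 the threshold is 0.18, so the accents at 0.10 s, 0.55 s and 1.30 s are rejected and the four picks are 0.30 s, 0.80 s, 1.05 s and 1.55 s.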
  • Step S42: according to the times of the multiple second feature audio points, generate a characteristic time sequence T_r.
  • Step S43: according to the times of the multiple second feature audio points and the beat duration L_b, generate multiple estimated beat time sequences; wherein each estimated beat time sequence includes the estimated times of a plurality of estimated beat points.
  • Specifically, in this embodiment, an estimated beat time sequence T_1b is generated according to the time t_1 of the first second feature audio point and the beat duration L_b; an estimated beat time sequence T_2b is generated according to the time t_2 of the second second feature audio point and the beat duration L_b; and, by analogy, the estimated beat time sequences T_3b and T_4b are generated. The initial values of the sequences T_1b, T_2b, T_3b and T_4b are the times t_1, t_2, t_3 and t_4 of the second feature audio points, respectively, and each estimated beat time sequence is an arithmetic sequence with common difference L_b.
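Since each candidate sequence is an arithmetic progression with first term t_i and common difference L_b, generating the four candidates is direct (the sequence length and sample values below are illustrative):

```python
def estimated_beat_sequence(t_start, L_b, n_beats):
    """Arithmetic sequence: first term t_start, common difference L_b."""
    return [round(t_start + k * L_b, 6) for k in range(n_beats)]

L_b = 0.5
t_points = [0.30, 0.80, 1.05, 1.55]  # t_1..t_4 (illustrative values)
candidates = [estimated_beat_sequence(t, L_b, n_beats=4) for t in t_points]
# candidates[0] is T_1b, candidates[1] is T_2b, and so on.
```

For example, T_1b starts at 0.30 s and advances by one beat duration per element.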
  • Step S44: perform a probability calculation according to the characteristic time sequence T_r and the multiple estimated beat time sequences, and obtain the probability value that each estimated beat time sequence is the actual beat time sequence T_b.
  • More specifically, step S44 further includes:
  • Step S441: according to each estimated beat time sequence and the characteristic time sequence, obtain the error sequence of each estimated beat time sequence; wherein each error sequence includes the time error values of a plurality of estimated beat points.
  • Specifically, in this step, each value T_1b(k) of the estimated beat time sequence T_1b, each value T_2b(k) of T_2b, each value T_3b(k) of T_3b and each value T_4b(k) of T_4b is subtracted against the characteristic time sequence T_r; the subtraction yields the error sequences Err_1(k), Err_2(k), Err_3(k) and Err_4(k) corresponding to T_1b, T_2b, T_3b and T_4b, respectively, each containing the time error value of each estimated beat point, where k is the index into the estimated beat time sequence.
  • Step S442: the estimated beat points corresponding to time error values smaller than a preset error threshold in each error sequence are recorded as valid beat points.
  • The value of the preset error threshold is not limited. In this embodiment, the preset error threshold is set to one tenth of the maximum value of the beat duration, expressed as 0.1·max(L_b); this setting reserves a certain fluctuation margin, which makes the extraction of the beat points more reasonable.
  • Specifically, it is determined whether the absolute value of the time error value of each estimated beat point in the error sequences Err_1(k), Err_2(k), Err_3(k) and Err_4(k) is less than 0.1·max(L_b); the estimated beat points corresponding to time error values smaller than the preset error threshold 0.1·max(L_b) are recorded as valid beat points, and the numbers of valid beat points of the error sequences Err_1(k), Err_2(k), Err_3(k) and Err_4(k) are n_1, n_2, n_3 and n_4, respectively.
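A sketch of the error-sequence and valid-beat computation. Since the two sequences may differ in length, the patent's "subtract T_r from each value T_ib(k)" is read here as the signed distance from each estimated beat time to its nearest characteristic time; that reading is an interpretation, and the sample values are illustrative.

```python
def error_sequence(estimated, characteristic):
    """Time error of each estimated beat point against T_r.

    Interpretation (assumption): the error of an estimated beat time is its
    signed distance to the nearest value of the characteristic sequence.
    """
    return [min((t - r for r in characteristic), key=abs) for t in estimated]

def count_valid_beats(errors, err_threshold):
    """Valid beat points: |error| below the preset threshold 0.1 * max(L_b)."""
    return sum(1 for e in errors if abs(e) < err_threshold)

T_r = [0.30, 0.80, 1.32, 1.80]   # characteristic time sequence (illustrative)
T_1b = [0.30, 0.80, 1.30, 1.80]  # one estimated beat time sequence
err = error_sequence(T_1b, T_r)
n_valid = count_valid_beats(err, err_threshold=0.1 * 0.5)  # L_b = 0.5 assumed
```

The third estimated beat lands 0.02 s before its nearest characteristic time, still inside the 0.05 s tolerance, so all four points count as valid.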
  • Step S443: calculate according to the number of valid beat points and the number of estimated beat points in each estimated beat time sequence, and obtain the probability value that each estimated beat time sequence is the actual beat time sequence T_b.
  • Specifically, the probability values that the estimated beat time sequences T_1b, T_2b, T_3b and T_4b are the actual beat time sequence T_b are p_1, p_2, p_3 and p_4, respectively. The probability values of the four estimated beat time sequences are calculated according to the following rule: the number of valid beat points is divided by the total number of estimated beat points, giving the probability that the estimated beat time sequence is the real beat point sequence, i.e. p_i = n_i / N_i, where:
  • N_1 is the number of estimated beat points of the first estimated beat time sequence;
  • N_2 is the number of estimated beat points of the second estimated beat time sequence;
  • N_3 is the number of estimated beat points of the third estimated beat time sequence;
  • N_4 is the number of estimated beat points of the fourth estimated beat time sequence.
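The probability and selection steps then reduce to p_i = n_i / N_i followed by an argmax over the four candidates; a sketch with illustrative valid-beat counts:

```python
def select_actual_sequence(candidates, valid_counts):
    """Compute p_i = n_i / N_i and keep the candidate with the largest p_i."""
    probs = [n / len(seq) for n, seq in zip(valid_counts, candidates)]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return candidates[best], probs

candidates = [[0.3, 0.8, 1.3, 1.8],      # T_1b
              [0.8, 1.3, 1.8, 2.3],      # T_2b
              [1.05, 1.55, 2.05, 2.55],  # T_3b
              [1.55, 2.05, 2.55, 3.05]]  # T_4b
valid_counts = [4, 3, 1, 1]              # n_1..n_4 (illustrative)
T_b, probs = select_actual_sequence(candidates, valid_counts)
```

With these counts, T_1b scores p_1 = 1.0 and is selected as the actual beat time sequence.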
  • Step S45: select the estimated beat time sequence with the largest probability value as the actual beat time sequence T_b.
  • Specifically, the estimated beat time sequence corresponding to the maximum value among the probability values p_1, p_2, p_3 and p_4 is taken and recorded as the actual beat time sequence T_b.
  • For example, if the value of p_1 is the largest, the first estimated beat time sequence T_1b is selected and recorded as the actual beat time sequence T_b.
  • Step S6: identify the beat points according to the actual beat time sequence.
  • For example, when the first estimated beat time sequence T_1b is selected and recorded as the actual beat time sequence T_b, the times of the estimated beat points in the estimated beat time sequence T_1b are extracted as the times of the beat points, thereby determining the specific positions of the beat points.
  • Although the steps in the flowcharts of Figs. 2-5 are shown in sequence according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited to this order, and they may be performed in other orders. Moreover, at least some of the steps in Figs. 2-5 may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and their order of execution is also not necessarily sequential: they may be performed in turns or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
  • the present invention provides a beat recognition device 100, which is applied to a terminal and includes: an audio processing module 11 and a calculation processing module 12 connected to the audio processing module, wherein:
  • the audio processing module 11 is used to extract feature information of the audio signal, wherein the feature information includes the time information and energy information of the first feature audio point set, and the beat duration;
  • the calculation processing module 12 is configured to perform calculation processing on the feature information to obtain an actual beat time sequence; and is also configured to identify a beat point according to the actual beat time sequence.
  • In one embodiment, the calculation processing module 12 is further configured to: identify a plurality of second feature audio points in the first feature audio point set according to the time information and energy information of the first feature audio point set, and extract the times of the plurality of second feature audio points; generate a characteristic time sequence according to the times of the plurality of second feature audio points; generate a plurality of estimated beat time sequences according to the times of the plurality of second feature audio points and the beat duration; perform a probability calculation according to the characteristic time sequence and the plurality of estimated beat time sequences to obtain the probability value that each estimated beat time sequence is the actual beat time sequence; and select the estimated beat time sequence with the largest probability value as the actual beat time sequence.
  • In one embodiment, the calculation processing module 12 is further configured to: obtain the error sequence of each estimated beat time sequence according to the estimated beat time sequence and the characteristic time sequence, wherein each error sequence includes the time error values of a plurality of estimated beat points; record the estimated beat points corresponding to time error values smaller than a preset error threshold in each error sequence as valid beat points; and calculate, according to the number of valid beat points and the number of estimated beat points in each estimated beat time sequence, the probability value that each estimated beat time sequence is the actual beat time sequence.
  • In one embodiment, the calculation processing module 12 is further configured to: acquire energy information of the music signal; perform calculation processing on the energy information to obtain an energy change curve; identify the first feature audio point set according to the energy change curve; and extract the time information of the first feature audio point set.
  • the calculation processing module 12 is further configured to extract the energy information of the first feature audio point set according to the energy change curve and the time information of the first feature audio point set.
  • each module in the above-mentioned beat recognition device can be implemented by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • the present invention provides a beat identification device, which includes a memory and a processor, wherein the memory stores a computer program, and when the processor executes the computer program, any step of the above beat identification method is implemented.
  • the present invention provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements any step of the above beat identification method.
  • any reference to memory, storage, database or other media used in the various embodiments provided by the present invention may include at least one of non-volatile and volatile memory.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory or optical memory, etc.
  • Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory.
  • RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

A beat recognition method, device, computer device and storage medium. The method includes: extracting feature information of an audio signal (S2), wherein the feature information includes time information and energy information of a first feature audio point set, and a beat duration; performing calculation processing on the feature information to obtain an actual beat time sequence (S4); and identifying beat points according to the actual beat time sequence (S6). The method can recognize beats fully automatically and with high accuracy, thereby improving the user experience.

Description

Beat recognition method, device and storage medium

Technical field

The present invention relates to the technical field of audio recognition, and in particular to a beat recognition method, device and storage medium.

Background

As an art form, music expresses people's thoughts, emotions and social life through elements such as beat, pitch, melody and lyrics, on the basis of certain music theory. Since ancient times, human beings have been inseparable from music. Besides being recorded and disseminated through traditional musical scores, in modern society, with the development of science and technology, music is increasingly recorded, played and disseminated in the form of digital signals.

Compared with recording music in traditional score form, recording music in digital form can not only fully record the information of a piece of music, but is also convenient for direct playback on electronic devices.

However, current methods of recording music in digital form cannot identify information such as the beat of the music. Therefore, the music cannot be further analyzed according to its beat (for example, matching vibration effects to the beat and melody) to make music playback more interesting, resulting in a poor user experience.
Technical solution

Based on this, in view of the above technical problem, it is necessary to provide a beat recognition method, device, computer device and storage medium capable of fully automatic, highly accurate beat recognition, thereby improving the user experience.
The present invention provides a beat recognition method, the method comprising:

extracting feature information of an audio signal, wherein the feature information includes time information and energy information of a first feature audio point set, and a beat duration;

performing calculation processing on the feature information to obtain an actual beat time sequence; and

identifying beat points according to the actual beat time sequence.

In one embodiment, the step of performing calculation processing on the feature information to obtain an actual beat time sequence includes:

identifying, according to the time information and energy information of the first feature audio point set, a plurality of second feature audio points in the first feature audio point set, and extracting the times of the plurality of second feature audio points;

generating a characteristic time sequence according to the times of the plurality of second feature audio points;

generating a plurality of estimated beat time sequences according to the times of the plurality of second feature audio points and the beat duration, wherein each estimated beat time sequence includes the estimated times of a plurality of estimated beat points;

performing a probability calculation according to the characteristic time sequence and the plurality of estimated beat time sequences to obtain the probability value that each estimated beat time sequence is the actual beat time sequence; and

selecting the estimated beat time sequence with the largest probability value as the actual beat time sequence.
In one embodiment, the step of identifying a plurality of second feature audio points in the first feature audio point set includes:

recording the first feature audio points in the first feature audio point set whose energy values are higher than a preset energy threshold as the second feature audio points, wherein the preset energy threshold is one fifth of the largest energy value in the energy information of the first feature audio point set.

In one embodiment, the step of performing a probability calculation according to the characteristic time sequence and the plurality of estimated beat time sequences to obtain the probability value that each estimated beat time sequence is the actual beat time sequence includes:

obtaining the error sequence of each estimated beat time sequence according to the estimated beat time sequence and the characteristic time sequence, wherein each error sequence includes the time error values of a plurality of estimated beat points;

recording the estimated beat points corresponding to time error values smaller than a preset error threshold in each error sequence as valid beat points; and

calculating, according to the number of valid beat points and the number of estimated beat points in each estimated beat time sequence, the probability value that the estimated beat time sequence is the actual beat time sequence.

In one embodiment, the preset error threshold is one tenth of the maximum duration value among the beat durations.
In one embodiment, the step of extracting the time information of the first feature audio point set includes:

acquiring energy information of the music signal;

performing calculation processing on the energy information to obtain an energy change curve; and

identifying the first feature audio point set according to the energy change curve, and extracting the time information of the first feature audio point set.

In one embodiment, the step of extracting the energy information of the first feature audio point set includes:

extracting the energy information of the first feature audio point set according to the energy change curve and the time information of the first feature audio point set.

In one embodiment, the method further includes:

when an audio file includes a plurality of audio tracks respectively used for transmitting the audio signal, performing track-separation preprocessing on the audio file, and playing the audio signal through at least one of the audio tracks.
The present invention provides a beat recognition device, comprising:

an audio processing module configured to extract feature information of an audio signal, wherein the feature information includes time information and energy information of a first feature audio point set, and a beat duration; and

a calculation processing module configured to perform calculation processing on the feature information to obtain an actual beat time sequence, and further configured to identify beat points according to the actual beat time sequence.

The present invention provides a beat recognition device comprising a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program:

extracting feature information of an audio signal, wherein the feature information includes time information and energy information of a first feature audio point set, and a beat duration;

performing calculation processing on the feature information to obtain an actual beat time sequence; and

identifying beat points according to the actual beat time sequence.

The present invention provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the following steps:

extracting feature information of an audio signal, wherein the feature information includes time information and energy information of a first feature audio point set, and a beat duration;

performing calculation processing on the feature information to obtain an actual beat time sequence; and

identifying beat points according to the actual beat time sequence.

In the above beat recognition method, device and storage medium, by extracting the feature information of the audio signal (the feature information including the time information and energy information of the first feature audio point set, and the beat duration), calculating the actual beat time sequence from the feature information, and identifying the beat points according to the actual beat time sequence, accurate audio beat points are automatically identified from the actual beat time sequence, achieving fully automatic, highly accurate recognition of audio beats. In practical applications, the automatic recognition of audio beats provides a basis for further analyzing and utilizing music according to its beats, making music playback more interesting and improving the user experience.
Description of the drawings

Fig. 1 is an application environment diagram of the beat recognition method of the present invention;

Fig. 2 is a schematic flowchart of the beat recognition method of the present invention;

Fig. 3 is a schematic flowchart of step S0 in Fig. 2;

Fig. 4 is a schematic flowchart of step S4 in Fig. 2;

Fig. 5 is a schematic flowchart of step S44 in Fig. 4;

Fig. 6 is a schematic structural diagram of the beat recognition device of the present invention.

Embodiments of the invention

In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, and are not intended to limit it.
The present invention provides a beat recognition method that can be applied in the application environment shown in Fig. 1, in which a terminal 1 communicates with a server 2 through a network, or transmits data to other terminals or electronic devices through other wired or wireless means. The terminal 1 may be, but is not limited to, any of various personal computers, notebook computers, smartphones, tablet computers and portable wearable devices, and the server 2 may be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in Fig. 2, a beat recognition method is provided. Taking the application of the method to the terminal in Fig. 1 as an example, it includes the following steps:

Step S2: extracting the feature information of the audio signal.

Specifically, the feature information includes the time information and energy information of the first feature audio point set, and the beat duration; the first feature audio point set includes a plurality of first feature audio points. Step S2 includes sub-steps for extracting the time information, the energy information and the beat duration. In this embodiment, accent points are first identified from the original audio signal and recorded as the first feature audio points, and the set of accent points is recorded as the first feature audio point set; the time and energy value of each accent point are then extracted. The times of the accent points together form the time information T_s of the first feature audio point set, and their energy values together form its energy information E_s. In addition, the beat duration L_b of the audio signal is extracted.

Step S4: performing calculation processing on the feature information to obtain an actual beat time sequence.

Specifically, in this embodiment, calculation processing is performed according to the time information T_s, energy information E_s and beat duration L_b of the first feature audio point set, and the actual beat time sequence T_b is obtained from the result. In essence, the actual beat time sequence T_b comprises a plurality of beat points (a beat point is a local energy burst in the audio signal).

Step S6: identifying the beat points according to the actual beat time sequence.

In the above beat recognition method, by extracting the feature information of the audio signal, calculating the actual beat time sequence from the feature information, and identifying the beat points according to the actual beat time sequence, accurate audio beat points are automatically identified from the actual beat time sequence, achieving fully automatic, highly accurate recognition of audio beats. In practical applications, the automatic recognition of audio beats provides a basis for further analyzing and utilizing music according to its beats, making music playback more interesting and improving the user experience.
为了更进一步理解上述方法,请同时参阅图1-5所示,下面对上述方法的各步骤进行详细描述:
在一个实施例中,可以根据实际使用的需要增加步骤S0,进行音频信号预处理,该步骤S0具体包括:
步骤S01,获取音频文件。具体的,在该步骤S11中,所述终端1可以通过网络从服务器2中下载音频文件(如音乐文件),也可以通过无线通信连接(如WiFi热点连接、蓝牙连接等)或有线通信连接(如数据传输线连接)接收其他的终端或者电子设备传输的音频文件,其可以获取的方式是不限的,其可以根据实际情况来确定。
步骤S02,判断音频文件是否包括多个音频轨道。具体的,所述音频轨道用于传输所述音频信号。
步骤S03,当音频文件包括多个音频轨道时,对所述音频文件进行分轨预处理,通过至少一个所述音频轨道输出音频信号,并获取所述终端1对所述音频信号的信号采样率fs,即至少选择其中一个所述音频轨道播放的音频信号进行节拍识别。
通过上述步骤S0的设置,所述终端1能够对单音频轨道或多音频轨道的音乐文件所输出的音频信号进行节拍识别,提高了本发明节拍识别方法的适用性,能够满足不同的应用场景。
步骤S2,提取音频信号的特征信息,该步骤S2具体包括:
在其中一个实施例中,提取所述第一特征音频点集的时刻信息的步骤包括:
获取音乐信号的能量信息;
对所述能量信息进行计算处理,以获得能量变化曲线;
根据所述能量变化曲线,识别所述第一特征音频点集,并提取所述第一特征音频点集的时刻信息T_s。
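上述"能量信息→能量变化曲线→重音点识别"的流程可以用下述Python片段示意(其中分帧长度frame_len、帧移hop与门限系数k均为示例性的假设参数,并非本文限定的具体实现):

```python
import numpy as np

def detect_accents(x, fs, frame_len=1024, hop=512, k=1.5):
    """从音频信号x中提取短时能量包络, 以能量突增的局部极大值近似重音点,
    返回各重音点的时刻信息T_s(秒)与能量信息E_s。"""
    # 分帧计算短时能量, 得到能量包络
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    energy = np.array([np.sum(x[i*hop : i*hop + frame_len]**2)
                       for i in range(n_frames)])
    # 能量变化曲线: 相邻帧能量差, 只保留上升部分(能量爆发)
    diff = np.maximum(np.diff(energy, prepend=energy[0]), 0.0)
    # 超过均值k倍的局部极大值视为重音点
    thr = k * diff.mean()
    idx = [i for i in range(1, n_frames - 1)
           if diff[i] > thr and diff[i] >= diff[i-1] and diff[i] >= diff[i+1]]
    Ts = np.array(idx) * hop / fs   # 时刻信息T_s
    Es = energy[idx]                # 能量信息E_s
    return Ts, Es
```

实际实现中门限与峰值判定规则可按音频类型调整,此处仅给出一种最简形式。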
在其中一个实施例中,提取所述第一特征音频点集的能量信息的步骤包括:
根据所述能量变化曲线和所述第一特征音频点集的时刻信息T_s,提取所述第一特征音频点集的能量信息E_s。
步骤S4,对所述特征信息进行计算处理,以获得实际节拍时刻数列,该步骤S4具体包括:
步骤S41,根据所述第一特征音频点集的时刻信息T_s和能量信息E_s,识别所述第一特征音频点集中的多个第二特征音频点,并提取多个所述第二特征音频点的时刻。
所述识别所述第一特征音频点集中的多个第二特征音频点的步骤中包括:
将所述第一特征音频点集中能量值高于预设能量阈值的第一特征音频点(亦即重音点)记为所述第二特征音频点;其中,所述第二特征音频点的识别数量可以根据实际情况确定,所述预设能量阈值的数值也不限,可以根据实际使用情况设置。比如,在本实施方式中,考虑到部分音频信号的前奏节拍不明显,前奏重音标识点会对整体节拍分析造成较大干扰,因此将所述预设能量阈值设置为所述第一特征音频点集的能量信息中最大能量值的五分之一,即0.2·max(E_s);以该值作为门限值有利于排除前奏对节拍识别的干扰,提高识别的准确度。
具体的,在本实施方式中,需要识别四个第二特征音频点。首先从所述第一特征音频点集的能量信息E_s的第一个重音点开始判断,当识别到第一个能量值高于预设能量阈值0.2·max(E_s)的重音点时,将其重新标记为第一个第二特征音频点,并将其时刻记为t_1,然后删除时刻在t_1之前的重音点;再从时刻t_1继续往后判断,当识别到第二个能量值高于该阈值的重音点时,将其重新标记为第二个第二特征音频点,时刻记为t_2,并删除时刻在t_1和t_2之间的重音点;以此类推,识别出第三个第二特征音频点的时刻t_3和第四个第二特征音频点的时刻t_4。需要说明的是,根据基本乐理,常见一个节拍内出现四个音符,且四个音符都可能具有较大能量,因此对应的,此处标记四个第二特征音频点(即重新标记的重音点),并对该四个第二特征音频点进行概率判断,以提高节拍识别的准确性。
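作为示意,下述Python片段按照门限0.2·max(E_s)给出上述第二特征音频点筛选的一种简化实现(函数名与输入数组均为假设;文中"删除已判定时刻之前的重音点并继续往后判断"在效果上等价于按时间顺序取前n个超阈值的重音点):

```python
import numpy as np

def select_second_points(Ts, Es, n=4):
    """按时间顺序选取前n个能量值高于0.2·max(Es)的重音点作为第二特征音频点,
    返回其时刻t_1…t_n。Ts为重音点时刻序列, Es为对应能量序列。"""
    thr = 0.2 * np.max(Es)  # 预设能量阈值: 最大能量值的五分之一
    # 依次判断, 保留超过阈值的重音点时刻; 取前n个即为t_1…t_n
    picked = [t for t, e in zip(Ts, Es) if e > thr]
    return picked[:n]
```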
步骤S42,根据多个所述第二特征音频点的时刻,生成特征时刻数列T_r。
步骤S43,根据多个所述第二特征音频点的时刻和所述节拍时长L_b,生成多个预估节拍时刻数列;其中,各所述预估节拍时刻数列包括多个预估节拍点的预估时刻。
具体的,在本实施方式中,根据上述第一个第二特征音频点的时刻t_1和所述节拍时长L_b,生成预估节拍时刻数列T_1b;根据上述第二个第二特征音频点的时刻t_2和所述节拍时长L_b,生成预估节拍时刻数列T_2b;以此类推,分别生成预估节拍时刻数列T_3b和预估节拍时刻数列T_4b。上述预估节拍时刻数列T_1b、T_2b、T_3b、T_4b的初值分别为第二特征音频点的时刻t_1、t_2、t_3、t_4,且各个预估节拍时刻数列均为公差为L_b的等差数列。
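上述等差数列的生成可以用下述Python片段示意(其中以音频末尾时刻t_end作为数列的截止条件,该截止条件为示意性假设,原文未具体限定数列长度):

```python
import numpy as np

def beat_grid(t0, Lb, t_end):
    """以t0为初值、节拍时长Lb为公差, 生成不超过t_end的预估节拍时刻数列。"""
    n = int((t_end - t0) // Lb) + 1  # 数列的单元个数N
    return t0 + Lb * np.arange(n)
```

对t_1…t_4分别调用一次,即得到T_1b…T_4b四个候选数列。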
步骤S44,根据所述特征时刻数列T_r和多个所述预估节拍时刻数列进行概率运算,获得各所述预估节拍时刻数列成为所述实际节拍时刻数列T_b的概率值。
更具体的,所述步骤S44还包括:
步骤S441,根据各所述预估节拍时刻数列和所述特征时刻数列,获取各所述预估节拍时刻数列的误差数列;其中,各所述误差数列包括多个预估节拍点的时刻误差值。
具体的,在所述步骤S441中,将预估节拍时刻数列T_1b的每一个值T_1b(k)、预估节拍时刻数列T_2b的每一个值T_2b(k)、预估节拍时刻数列T_3b的每一个值T_3b(k)、预估节拍时刻数列T_4b的每一个值T_4b(k)分别减去特征时刻数列T_r,经减法计算后,获得分别与T_1b、T_2b、T_3b、T_4b对应的误差数列Err_1(k)、Err_2(k)、Err_3(k)、Err_4(k);各预估节拍时刻数列的每一个值与对应误差数列中每一个预估节拍点的时刻误差值一一对应,其中,k为预估节拍时刻数列的索引号。
步骤S442,将各所述误差数列中小于预设误差阈值的时刻误差值所对应的预估节拍点记为有效节拍点。
其中,所述预设误差阈值的数值不限。在本实施方式中,由于重音点的提取不能保证百分之百准确,因此将所述预设误差阈值设置为所述节拍时长中最大时长值的十分之一,即0.1·max(L_b);该设置预留了一定的波动空间,使得重音点的提取更加合理。
具体的,在所述步骤S442中,判断误差数列Err_1(k)、Err_2(k)、Err_3(k)、Err_4(k)的每一个预估节拍点的时刻误差值的绝对值是否小于0.1·max(L_b),将其中小于预设误差阈值0.1·max(L_b)的时刻误差值所对应的预估节拍点记为有效节拍点;误差数列Err_1(k)、Err_2(k)、Err_3(k)、Err_4(k)对应的有效节拍点数量分别为n_1、n_2、n_3、n_4。
步骤S443,根据各所述预估节拍时刻数列中有效节拍点的数量与预估节拍点的数量进行计算,获取各所述预估节拍时刻数列成为所述实际节拍时刻数列T_b的概率值。
具体的,所述预估节拍时刻数列T_1b、T_2b、T_3b、T_4b成为所述实际节拍时刻数列T_b的概率值分别为p_1、p_2、p_3、p_4;进一步的,根据下述计算规则计算四个预估节拍时刻数列为实际节拍时刻数列的概率值:有效节拍点的个数除以预估节拍点的总个数,即得到该预估节拍时刻数列为真实节拍点数列的概率。
即:
p_1 = n_1/N_1,N_1为第1个预估节拍时刻数列的单元个数;
p_2 = n_2/N_2,N_2为第2个预估节拍时刻数列的单元个数;
p_3 = n_3/N_3,N_3为第3个预估节拍时刻数列的单元个数;
p_4 = n_4/N_4,N_4为第4个预估节拍时刻数列的单元个数。
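结合步骤S441至S443,上述误差计算、有效节拍点判定与概率选取可以用下述Python片段示意(此处将每个预估节拍点与特征时刻数列T_r中最近的时刻相减作为时刻误差,属于一种示意性处理;函数与变量命名均为假设):

```python
import numpy as np

def pick_beat_sequence(candidates, Tr, Lb_max):
    """candidates为多个预估节拍时刻数列, Tr为特征时刻数列, Lb_max为最大节拍时长。
    |时刻误差| < 0.1·Lb_max 的预估节拍点计为有效节拍点, 概率p_i = n_i/N_i,
    返回概率值最大的数列及各数列的概率值。"""
    Tr = np.asarray(Tr)
    probs = []
    for Tb in candidates:
        Tb = np.asarray(Tb)
        # 每个预估节拍点到最近特征时刻的误差数列Err_i(k)
        err = np.min(np.abs(Tb[:, None] - Tr[None, :]), axis=1)
        n_valid = int(np.sum(err < 0.1 * Lb_max))  # 有效节拍点数n_i
        probs.append(n_valid / len(Tb))            # p_i = n_i / N_i
    best = int(np.argmax(probs))                   # 概率最大者作为实际节拍时刻数列
    return candidates[best], probs
```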
步骤S45,选取概率值最大的所述预估节拍时刻数列作为所述实际节拍时刻数列T_b。
具体的,取概率值p_1、p_2、p_3、p_4中最大值对应的预估节拍时刻数列,记为实际节拍时刻数列T_b。譬如,本实施方式中,四个概率值当中p_1的数值最大,则选择第一个预估节拍时刻数列T_1b记为实际节拍时刻数列T_b。
步骤S6,根据所述实际节拍时刻数列识别节拍点。
具体的,在本实施方式中,选取了第一个预估节拍时刻数列T_1b记为实际节拍时刻数列T_b;在此,提取T_1b中各预估节拍点的时刻作为节拍点的时刻,并据此确定节拍点的具体位置。
应该理解的是,虽然图2-5的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-5中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。
请参图6所示,本发明提供一种节拍识别装置100,其应用于终端,包括:音频处理模块11以及与所述音频处理模块连接的计算处理模块12,其中:
所述音频处理模块11,用于提取音频信号的特征信息,其中,所述特征信息包括第一特征音频点集的时刻信息和能量信息、以及节拍时长;
所述计算处理模块12,用于对所述特征信息进行计算处理,以获得实际节拍时刻数列;还用于根据所述实际节拍时刻数列识别节拍点。
在一个实施方式中,所述计算处理模块12,还用于根据所述第一特征音频点集的时刻信息和能量信息,识别所述第一特征音频点集中的多个第二特征音频点,并提取多个所述第二特征音频点的时刻;用于根据多个所述第二特征音频点的时刻,生成特征时刻数列;用于根据多个所述第二特征音频点的时刻和所述节拍时长,生成多个预估节拍时刻数列;用于根据所述特征时刻数列和多个所述预估节拍时刻数列进行概率运算,获得各所述预估节拍时刻数列成为所述实际节拍时刻数列的概率值;用于选取概率值最大的所述预估节拍时刻数列作为所述实际节拍时刻数列。
在一个实施方式中,所述计算处理模块12,还用于根据各所述预估节拍时刻数列和所述特征时刻数列,获取各所述预估节拍时刻数列的误差数列;其中,各所述误差数列包括多个预估节拍点的时刻误差值;用于将各所述误差数列中小于预设误差阈值的时刻误差值所对应的预估节拍点记为有效节拍点;用于根据各所述预估节拍时刻数列中有效节拍点的数量与预估节拍点的数量进行计算,获取各所述预估节拍时刻数列成为所述实际节拍时刻数列的概率值。
在一个实施方式中,所述计算处理模块12,还用于获取音乐信号的能量信息;用于对所述能量信息进行计算处理,以获得能量变化曲线;用于根据所述能量变化曲线,识别所述第一特征音频点集,并提取所述第一特征音频点集的时刻信息。
在一个实施方式中,所述计算处理模块12,还用于根据所述能量变化曲线和所述第一特征音频点集的时刻信息,提取所述第一特征音频点集的能量信息。
关于节拍识别装置的具体限定可以参见上文中对于节拍识别方法的限定,在此不再赘述。上述节拍识别装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。
在一个实施例中,本发明提供一种节拍识别装置,其包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时实现上述节拍识别方法的任一步骤。
在一个实施例中,本发明提供一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现上述节拍识别方法的任一步骤。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一非易失性计算机可读取存储介质中,该计算机程序在执行时,可包括如上述各方法的实施例的流程。其中,本发明所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和易失性存储器中的至少一种。非易失性存储器可包括只读存储器(Read-Only Memory,ROM)、磁带、软盘、闪存或光存储器等。易失性存储器可包括随机存取存储器(Random Access Memory,RAM)或外部高速缓冲存储器。作为说明而非局限,RAM可以是多种形式,比如静态随机存取存储器(Static Random Access Memory,SRAM)或动态随机存取存储器(Dynamic Random Access Memory,DRAM)等。
以上实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本发明的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本发明构思的前提下,还可以做出若干变形和改进,这些都属于本发明的保护范围。因此,本发明专利的保护范围应以所附权利要求为准。

Claims (11)

  1. 一种节拍识别方法,其特征在于,所述方法包括:
    提取音频信号的特征信息;其中,所述特征信息包括第一特征音频点集的时刻信息和能量信息、以及节拍时长;
    对所述特征信息进行计算处理,以获得实际节拍时刻数列;
    根据所述实际节拍时刻数列识别节拍点。
  2. 根据权利要求1所述的节拍识别方法,其特征在于,所述对所述特征信息进行计算处理,以获得实际节拍时刻数列的步骤包括:
    根据所述第一特征音频点集的时刻信息和能量信息,识别所述第一特征音频点集中的多个第二特征音频点,并提取多个所述第二特征音频点的时刻;
    根据多个所述第二特征音频点的时刻,生成特征时刻数列;
    根据多个所述第二特征音频点的时刻和所述节拍时长,生成多个预估节拍时刻数列;其中,各所述预估节拍时刻数列包括多个预估节拍点的预估时刻;
    根据所述特征时刻数列和多个所述预估节拍时刻数列进行概率运算,获得各所述预估节拍时刻数列成为所述实际节拍时刻数列的概率值;
    选取概率值最大的所述预估节拍时刻数列作为所述实际节拍时刻数列。
  3. 根据权利要求2所述的节拍识别方法,其特征在于,所述识别所述第一特征音频点集中的多个第二特征音频点的步骤包括:
    将所述第一特征音频点集中能量值高于预设能量阈值的第一特征音频点记为所述第二特征音频点;其中,所述预设能量阈值为所述第一特征音频点集的能量信息中最大的能量值的五分之一。
  4. 根据权利要求2所述的节拍识别方法,其特征在于,所述根据所述特征时刻数列和多个所述预估节拍时刻数列进行概率运算,获得各所述预估节拍时刻数列成为所述实际节拍时刻数列的概率值的步骤包括:
    根据各所述预估节拍时刻数列和所述特征时刻数列,获取各所述预估节拍时刻数列的误差数列;其中,各所述误差数列包括多个预估节拍点的时刻误差值;
    将各所述误差数列中小于预设误差阈值的时刻误差值所对应的预估节拍点记为有效节拍点;
    根据各所述预估节拍时刻数列中有效节拍点的数量与预估节拍点的数量进行计算,获取各所述预估节拍时刻数列成为所述实际节拍时刻数列的概率值。
  5. 根据权利要求4所述的节拍识别方法,其特征在于,所述预设误差阈值为所述节拍时长中的最大时长值的十分之一。
  6. 根据权利要求1所述的节拍识别方法,其特征在于,提取所述第一特征音频点集的时刻信息的步骤包括:
    获取音乐信号的能量信息;
    对所述能量信息进行计算处理,以获得能量变化曲线;
    根据所述能量变化曲线,识别所述第一特征音频点集,并提取所述第一特征音频点集的时刻信息。
  7. 根据权利要求6所述的节拍识别方法,其特征在于,提取所述第一特征音频点集的能量信息的步骤包括:
    根据所述能量变化曲线和所述第一特征音频点集的时刻信息,提取所述第一特征音频点集的能量信息。
  8. 根据权利要求1所述的节拍识别方法,其特征在于,所述方法还包括:
    当音频文件包括多个分别用于传输所述音频信号的音频轨道时,对所述音频文件进行分轨预处理,通过至少一个所述音频轨道播放所述音频信号。
  9. 一种节拍识别装置,其特征在于,所述节拍识别装置包括:
    音频处理模块,用于提取音频信号的特征信息,其中,所述特征信息包括第一特征音频点集的时刻信息和能量信息、以及节拍时长;以及,
    计算处理模块,用于对所述特征信息进行计算处理,以获得实际节拍时刻数列;还用于根据所述实际节拍时刻数列识别节拍点。
  10. 一种节拍识别装置,其包括存储器和处理器,所述存储器存储有计算机程序,其特征在于,所述处理器执行所述计算机程序时实现权利要求1至8中任一项所述的节拍识别方法的步骤。
  11. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现权利要求1至8中任一项所述的节拍识别方法的步骤。
     
PCT/CN2020/133192 2020-11-23 2020-12-02 节拍识别方法、装置及存储介质 WO2022104917A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011320049.7A CN112489681A (zh) 2020-11-23 2020-11-23 节拍识别方法、装置及存储介质
CN202011320049.7 2020-11-23

Publications (1)

Publication Number Publication Date
WO2022104917A1 true WO2022104917A1 (zh) 2022-05-27

Family

ID=74933393

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/133192 WO2022104917A1 (zh) 2020-11-23 2020-12-02 节拍识别方法、装置及存储介质

Country Status (2)

Country Link
CN (1) CN112489681A (zh)
WO (1) WO2022104917A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103824555A (zh) * 2012-11-19 2014-05-28 腾讯科技(深圳)有限公司 音频段提取方法及提取装置
CN104599663A (zh) * 2014-12-31 2015-05-06 华为技术有限公司 歌曲伴奏音频数据处理方法和装置
CN104766045A (zh) * 2014-01-07 2015-07-08 富士通株式会社 评价方法和评价装置
CN109712600A (zh) * 2018-12-30 2019-05-03 北京经纬恒润科技有限公司 一种节拍识别的方法及装置
CN110853677A (zh) * 2019-11-20 2020-02-28 北京雷石天地电子技术有限公司 歌曲的鼓声节拍识别方法、装置、终端和非临时性计算机可读存储介质
US20200357369A1 (en) * 2018-01-09 2020-11-12 Guangzhou Baiguoyuan Information Technology Co., Ltd. Music classification method and beat point detection method, storage device and computer device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4465626B2 (ja) * 2005-11-08 2010-05-19 ソニー株式会社 情報処理装置および方法、並びにプログラム
JP5625235B2 (ja) * 2008-11-21 2014-11-19 ソニー株式会社 情報処理装置、音声解析方法、及びプログラム
CN103578478B (zh) * 2013-11-11 2016-08-17 科大讯飞股份有限公司 实时获取音乐节拍信息的方法及系统
KR101808810B1 (ko) * 2013-11-27 2017-12-14 한국전자통신연구원 음성/무음성 구간 검출 방법 및 장치
CN108335688B (zh) * 2017-12-28 2021-07-06 广州市百果园信息技术有限公司 音乐中主节拍点检测方法及计算机存储介质、终端
CN109920449B (zh) * 2019-03-18 2022-03-04 广州市百果园网络科技有限公司 节拍分析方法、音频处理方法及装置、设备、介质
CN110278388B (zh) * 2019-06-19 2022-02-22 北京字节跳动网络技术有限公司 展示视频的生成方法、装置、设备及存储介质
CN110890083B (zh) * 2019-10-31 2022-09-02 北京达佳互联信息技术有限公司 音频数据的处理方法、装置、电子设备及存储介质
CN111128232B (zh) * 2019-12-26 2022-11-15 广州酷狗计算机科技有限公司 音乐的小节信息确定方法、装置、存储介质及设备

Also Published As

Publication number Publication date
CN112489681A (zh) 2021-03-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20962208

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20962208

Country of ref document: EP

Kind code of ref document: A1