WO2022088242A1 - Audio stress recognition method, apparatus and device, and medium - Google Patents

Audio stress recognition method, apparatus and device, and medium

Info

Publication number
WO2022088242A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
target
original audio
energy change
energy
Application number
PCT/CN2020/127679
Other languages
French (fr)
Chinese (zh)
Inventor
郑亚军
Original Assignee
瑞声声学科技(深圳)有限公司
瑞声光电科技(常州)有限公司
Application filed by 瑞声声学科技(深圳)有限公司 and 瑞声光电科技(常州)有限公司
Publication of WO2022088242A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1807 Speech classification or search using natural language modelling using prosody or stress
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Definitions

  • The present application relates to the technical field of audio processing, and in particular to an audio accent recognition method, apparatus, device and medium.
  • Audio signals are an important medium for disseminating information.
  • An accent is a sound of greater intensity in music. It is the most prominent in terms of sonic impact and is the main factor that makes up musical rhythm. By identifying the accents in music, the speed of the musical rhythm can be judged.
  • Stress often carries certain subjective emotions or key information.
  • By identifying the stress in audio, the subjective emotions and key information in the audio can be distinguished. Analyzing and identifying audio stress therefore gives a fuller understanding of the meaning the audio signal is intended to express.
  • a method for audio accent recognition comprising:
  • Obtain a target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain an energy change curve corresponding to the original audio signal;
  • the processing of the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal including:
  • the weighted calculation is performed on the original audio signal according to the target Gaussian function to obtain an energy curve corresponding to the original audio signal, including:
  • Perform weighted calculation on the truncated audio signal and the target Gaussian window function to obtain the target energy value of the original audio signal at the target moment, and obtain the energy curve corresponding to the original audio signal according to the target energy value at each target moment.
  • the truncated audio signal of the original audio signal at the target moment is determined according to the target Gaussian window function, including:
  • a Gaussian window is added to the original audio signal
  • performing numerical conversion processing on the energy curve to obtain an energy change curve corresponding to the original audio signal including:
  • a second derivative process is performed on the logarithmic function to obtain an energy change curve corresponding to the original audio signal.
  • the determining of the stress moment in the energy change curve according to the target sliding window includes:
  • the target sliding window is added to the energy change curve, the energy change peak value of the energy change curve in the target sliding window is obtained, and the time corresponding to the energy change peak value is taken as the accent time; wherein the starting point of the target sliding window at its starting position is the starting point of the energy change curve;
  • before the time corresponding to the energy change peak value is taken as the accent time, the method further includes:
  • if the energy change peak value is greater than or equal to the energy change threshold, continuing to perform the step of taking the time corresponding to the energy change peak value as the accent time;
  • if the energy change peak value is less than the energy change threshold, continuing to perform the step of sliding the target sliding window according to a preset step size.
  • An audio stress recognition device, the device comprising:
  • an energy variation curve acquisition module used to acquire an original audio signal; acquire a target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain an energy variation curve corresponding to the original audio signal;
  • the accent recognition module is configured to obtain a target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.
  • a computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the processor is caused to perform the following steps:
  • Obtain a target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain an energy change curve corresponding to the original audio signal;
  • An audio accent recognition device comprising a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the following steps:
  • Obtain a target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain an energy change curve corresponding to the original audio signal;
  • The present application provides an audio stress recognition method, apparatus, device and medium.
  • The original audio signal is processed based on a Gaussian window function, so the temporal correlation of the audio signal is fully considered.
  • Compared with traditional algorithms, the result of subsequent stress recognition is more accurate.
  • The point of most intense local energy change is also identified dynamically based on the sliding window and marked as the stress moment in order to identify the audio stress, which makes the method more scientific and practical.
  • FIG. 1 is a schematic flowchart of the audio stress recognition method in the first embodiment
  • FIG. 2 is a schematic diagram of the target Gaussian window function in one embodiment
  • FIG. 3 is a schematic diagram of determining an accent moment according to a target sliding window in one embodiment
  • FIG. 5 is a schematic flowchart of the audio stress recognition method in the second embodiment
  • FIG. 6 is a schematic diagram of an energy curve in one embodiment
  • FIG. 7 is a schematic diagram of weighting processing on an original audio signal in one embodiment
  • FIG. 8 is a schematic diagram of an energy change curve in one embodiment
  • FIG. 9 is a schematic structural diagram of an audio accent recognition device in one embodiment
  • FIG. 10 is a structural block diagram of an audio accent recognition device in one embodiment.
  • FIG. 1 is a schematic flowchart of the audio stress recognition method in the first embodiment.
  • the steps provided by the audio stress recognition method in the first embodiment include:
  • Step 102: Acquire the original audio signal.
  • The original audio signal is the audio signal whose accents are to be identified.
  • the original audio signal may be an audio signal pre-recorded and stored in a local storage medium, or may be a piece of audio signal collected in real time, which is not specifically limited here.
  • Step 104 Obtain a target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain an energy change curve corresponding to the original audio signal.
  • the target Gaussian window function is used to weight the original audio signal.
  • the energy change curve is a curve that reflects the change of the energy value of the original audio signal at different target times.
  • Stress is characterized in the energy change curve by a large energy change value. Based on this characteristic, the audio accents of the original audio signal can be identified in the subsequent steps.
  • the expression of the target Gaussian window function is:
  • n is a time variable, n ⁇ L, L is a parameter characterizing the width of the Gaussian window function, and a is a parameter characterizing the shape of the Gaussian window function.
  • The parameter settings of the Gaussian window function in this embodiment have some influence on the energy calculation, but the automatic recognition method does not depend on tuning them for algorithm performance, so the parameters of the Gaussian window function are not further limited here.
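  • As an illustration only, a window with a width parameter L and a shape parameter a can be sketched in Python. The original expression is not reproduced in this text, so the form used below, w(n) = exp(-0.5 (a n / h)^2) with h = (L - 1) / 2, is an assumed, commonly used Gaussian window rather than the formula of the application. The function name `gaussian_window` and the default `a = 2.5` are hypothetical.

```python
import math
from typing import List


def gaussian_window(length: int, a: float = 2.5) -> List[float]:
    """Return a symmetric Gaussian window with `length` samples.

    Assumed form (not taken from the source): w(n) = exp(-0.5 * (a * n / h) ** 2),
    where n runs from -h to +h and h = (length - 1) / 2.
    A larger `a` gives a narrower bell inside the same window width.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    half = (length - 1) / 2.0
    if half == 0:
        # A one-sample window is just its centre value.
        return [1.0]
    return [math.exp(-0.5 * (a * (i - half) / half) ** 2) for i in range(length)]


if __name__ == "__main__":
    # The centre sample has weight 1 and the weights fall off symmetrically.
    print([round(w, 4) for w in gaussian_window(7)])
```

  • Under this assumed form, the centre of the window always has weight 1, and samples further from the target moment contribute less to the weighted energy.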
  • weighted calculation is performed on the original audio signal based on the target Gaussian window function to obtain an energy curve corresponding to the original audio signal.
  • derivation processing is performed on the energy curve to obtain the energy change curve corresponding to the original audio signal.
  • Step 106 Obtain the target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.
  • The target sliding window is a window with no boundary along the vertical axis; it provides a dynamic judgment range on the energy change curve at a specific time.
  • the target sliding window slides continuously, and it is necessary to determine the stress moment of the energy change curve in the target sliding window at each specific moment.
  • a target sliding window is added to the energy change curve, and the window width of the target sliding window is specifically set to 0.06 seconds. It is worth noting that the sliding window width is selected as 0.06 seconds, which is just an example, and can also be 0.05 seconds, 0.07 seconds, or others.
  • the selection of the window width of the target sliding window is based on the phenomenon that "the accent interval of most music audio is between 0.02 and 1 second". If the width of the sliding window is too large or too small, errors will be introduced.
  • the energy change peak value of the energy change curve in the target sliding window is obtained (that is, the maximum value of the energy change value in the target sliding window is determined), and the time corresponding to the energy change peak value is taken as the accent time.
  • FIG. 4 is a schematic diagram of all the stress moments determined in the energy change curve. These stress moments are marked in the original audio signal, thereby obtaining the audio stress in the original audio signal.
  • The accent moment is also determined in combination with an energy change threshold. Specifically, it is determined whether the energy change peak value at a specific time is greater than or equal to the energy change threshold; the threshold can be set to different values according to requirements such as recognition accuracy, and is not specifically limited here. If the energy change peak value is greater than or equal to the energy change threshold, the time corresponding to the energy change peak value is taken as the accent moment. If the energy change peak value is less than the energy change threshold, the target sliding window continues to slide according to the preset step size until the next accent moment that satisfies the energy change threshold condition is found.
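  • The sliding-window search with a threshold described above can be sketched as follows. This is a minimal sketch under stated assumptions: the energy change curve is a list of samples, the window width and step are given in samples, and a peak reported by overlapping windows is recorded only once. The function name `find_accents` and its parameters are hypothetical.

```python
from typing import List, Sequence


def find_accents(change: Sequence[float], window_size: int,
                 step: int, threshold: float) -> List[int]:
    """Return sample indices of accent moments on an energy change curve.

    The window starts at the beginning of the curve. In each position the
    largest value inside the window is the energy change peak; it is kept
    only if it is greater than or equal to `threshold`. The window then
    slides forward by `step` samples.
    """
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be at least 1")
    accents: List[int] = []
    start = 0
    while start < len(change):
        segment = change[start:start + window_size]
        # Position of the peak inside the current window (first one on ties).
        offset = max(range(len(segment)), key=segment.__getitem__)
        peak_index = start + offset
        # Assumption: overlapping windows may find the same peak; keep it once.
        is_new = not accents or accents[-1] != peak_index
        if segment[offset] >= threshold and is_new:
            accents.append(peak_index)
        start += step
    return accents


if __name__ == "__main__":
    curve = [0.0, 5.0, 0.0, 0.0, 1.0, 0.0, 0.0, 7.0, 0.0]
    print(find_accents(curve, window_size=3, step=3, threshold=2.0))
```

  • With a window width of 0.06 seconds, `window_size` would be roughly 0.06 multiplied by the number of curve samples per second, for example 60 samples if the curve has 1000 samples per second.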
  • The above audio stress recognition method processes the original audio signal based on a Gaussian window function and fully considers the temporal correlation of the audio signal, so the result of subsequent stress recognition is more accurate than with traditional algorithms. Further, the point of most intense local energy change is identified dynamically based on the sliding window and marked as the stress moment in order to identify the audio stress, which makes the method more scientific and practical.
  • FIG. 5 is a schematic flowchart of the audio stress recognition method in the second embodiment.
  • the steps provided by the audio stress recognition method in the second embodiment include:
  • Step 502: Acquire the original audio signal.
  • step 502 is basically the same as step 102 in the audio stress recognition method in the first embodiment, and details are not repeated here.
  • Step 504 Obtain a target Gaussian window function, perform weighted calculation on the original audio signal according to the target Gaussian function, and obtain an energy curve corresponding to the original audio signal.
  • the setting of the target Gaussian window function is the same as that in step 104, which is not repeated here.
  • the energy curve is a curve reflecting the energy value of the original audio signal at different target moments.
  • the step of weighting calculation specifically includes: first, determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function.
  • The target moment is any moment in the original audio signal. The truncated audio signal has the same width as the Gaussian window corresponding to the Gaussian window function, and both include the target moment.
  • The windowing calculation in the time domain is expressed as pointwise (element-by-element) multiplication.
  • the calculation of the target energy value E(t) at the target time t is expressed as:
  • n is the time variable of the fixed domain T
  • t is the time domain variable of the original audio signal.
  • the energy curve corresponding to the original audio signal can be obtained according to these target energy values.
  • Taking the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function, a Gaussian window is applied to the original audio signal; the audio signal within the Gaussian window is used as the truncated audio signal at the target moment.
  • If the Gaussian window extends beyond the audio length of the original audio signal, the excess part does not need to be weighted. That is, when t takes a small value, the left half of the Gaussian window may extend beyond the start of the original audio signal, and no weighting calculation is required for this excess. Correspondingly, when t takes a large value, the right half of the Gaussian window may extend beyond the end of the original audio signal, and no weighting calculation is required for this excess either.
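  • The weighted energy calculation, including the handling of a window that extends past either end of the audio, can be sketched as follows. The expression for E(t) is not reproduced in this text, so the sketch assumes E(t) is the sum over the window of w(n) · x(t + n)², with the window centred on the target moment t; this is an illustrative assumption, and the function name `energy_curve` is hypothetical.

```python
from typing import List, Sequence


def energy_curve(signal: Sequence[float], window: Sequence[float]) -> List[float]:
    """Return one weighted energy value per sample of `signal`.

    Assumed form (not taken from the source): E(t) = sum_n w(n) * x(t + n) ** 2,
    where the window is centred on the target moment t. Window samples that
    fall before the start or after the end of the audio are simply skipped,
    so the excess part of the window is not weighted.
    """
    half = len(window) // 2
    energies: List[float] = []
    for t in range(len(signal)):
        total = 0.0
        for k, weight in enumerate(window):
            idx = t + k - half
            if 0 <= idx < len(signal):
                total += weight * signal[idx] ** 2
        energies.append(total)
    return energies


if __name__ == "__main__":
    # A single impulse spreads into the shape of the window.
    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    print(energy_curve(impulse, [0.5, 1.0, 0.5]))
```

  • This direct double loop is chosen for readability. For long recordings the same sum could be computed as a convolution of the squared signal with the window.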
  • Step 506 Perform numerical conversion processing on the energy curve to obtain an energy change curve corresponding to the original audio signal.
  • the numerical conversion processing specifically includes: first, performing logarithmic processing on the energy curve, so as to obtain a logarithmic function corresponding to the original audio signal.
  • Taking the logarithm of the energy curve can eliminate the directionality (that is, the positive or negative sign) of energy changes, thereby reducing the effect of energy rapidly becoming large or rapidly becoming small, which in turn better reflects the rate of energy change.
  • Second-derivative processing is then performed on the logarithmic function to obtain the energy change curve corresponding to the original audio signal. Please refer to FIG. 8 for the energy change curve.
  • This embodiment proposes taking the logarithm of the energy curve and then its second derivative, which can effectively reduce the influence of background noise and fully reflect the energy change characteristics of the energy change curve.
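  • The logarithm and second-derivative steps can be sketched as follows. The source does not state the logarithm base or how the derivative is discretized, so this sketch assumes the natural logarithm, a central second difference as the discrete second derivative, a small offset `eps` to avoid log(0), and zero at the two end points. The function name `energy_change_curve` is hypothetical.

```python
import math
from typing import List, Sequence


def energy_change_curve(energy: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Return the second difference of the logarithm of an energy curve.

    Assumptions (not taken from the source): natural logarithm, `eps` added
    before the logarithm so that silent samples do not produce log(0), and
    the central difference log_e[i+1] - 2*log_e[i] + log_e[i-1] as the
    discrete second derivative. The two end points are left at 0.0.
    """
    log_e = [math.log(e + eps) for e in energy]
    change = [0.0] * len(log_e)
    for i in range(1, len(log_e) - 1):
        change[i] = log_e[i + 1] - 2.0 * log_e[i] + log_e[i - 1]
    return change


if __name__ == "__main__":
    # For energy exp(i^2) the log is i^2, whose second difference is constant.
    energy = [math.exp(i * i) for i in range(5)]
    print([round(c, 6) for c in energy_change_curve(energy)])
```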
  • Step 508 Acquire the target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.
  • step 508 is basically the same as step 106 in the audio stress recognition method in the first embodiment, and details are not repeated here.
  • an audio stress recognition device is proposed, the device includes:
  • the energy change curve obtaining module 902 is used to obtain the original audio signal; obtain the target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain the energy change curve corresponding to the original audio signal;
  • the accent recognition module 904 is configured to acquire the target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.
  • The above audio stress recognition device processes the original audio signal based on a Gaussian window function and fully considers the temporal correlation of the audio signal, so the result of subsequent stress recognition is more accurate than with traditional algorithms. Further, the point of most intense local energy change is identified dynamically based on the sliding window and marked as the stress moment in order to identify the audio stress, which makes the device more scientific and practical.
  • The energy change curve acquisition module 902 is further specifically configured to: perform weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal; and perform numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal.
  • The energy change curve acquisition module 902 is further specifically configured to: determine the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function, wherein the target moment is any moment in the original audio signal; perform weighted calculation on the truncated audio signal and the target Gaussian window function to obtain the target energy value of the original audio signal at the target moment; and obtain the energy curve corresponding to the original audio signal according to the target energy value at each target moment.
  • The energy change curve acquisition module 902 is further specifically configured to: take the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function and apply the Gaussian window to the original audio signal; and take the audio signal within the Gaussian window as the truncated audio signal at the target moment.
  • The energy change curve acquisition module 902 is further specifically configured to: perform logarithmic processing on the energy curve to obtain a logarithmic function corresponding to the original audio signal; and perform second-derivative processing on the logarithmic function to obtain the energy change curve corresponding to the original audio signal.
  • The accent recognition module 904 is further specifically configured to: add the target sliding window to the energy change curve, obtain the energy change peak value of the energy change curve within the target sliding window, and take the moment corresponding to the energy change peak value as the accent moment, wherein the starting point of the target sliding window at its starting position is the starting point of the energy change curve; and slide the target sliding window according to the preset step size, then return to the step of obtaining the energy change peak value of the energy change curve within the target sliding window and taking the moment corresponding to the energy change peak value as the accent moment.
  • The accent recognition module 904 is further specifically configured to: determine whether the energy change peak value is greater than or equal to the energy change threshold; if the energy change peak value is greater than or equal to the energy change threshold, continue to perform the step of taking the moment corresponding to the energy change peak value as the accent moment; and if the energy change peak value is less than the energy change threshold, continue to perform the step of sliding the target sliding window according to the preset step size.
  • FIG. 10 shows an internal structure diagram of an audio accent recognition device in one embodiment.
  • the audio accent recognition device includes a processor, a memory and a network interface connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the audio stress recognition device stores an operating system, and also stores a computer program, which, when executed by the processor, enables the processor to implement the audio stress recognition method.
  • a computer program may also be stored in the internal memory, and when executed by the processor, the computer program can cause the processor to execute the audio accent recognition method.
  • the accent recognition device may include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.
  • An audio stress recognition device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program: acquiring an original audio signal; acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and acquiring a target sliding window, determining the stress moment in the energy change curve according to the target sliding window, and marking the original audio signal at the stress moment as an audio stress.
  • Processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal includes: performing weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal; and performing numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal.
  • Performing weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal includes: determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function, wherein the target moment is any moment in the original audio signal; performing weighted calculation on the truncated audio signal and the target Gaussian window function to obtain the target energy value of the original audio signal at the target moment; and obtaining the energy curve corresponding to the original audio signal according to the target energy value at each target moment.
  • Determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function includes: taking the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function and applying the Gaussian window to the original audio signal; and taking the audio signal within the Gaussian window as the truncated audio signal at the target moment.
  • Performing numerical conversion processing on the energy curve to obtain an energy change curve corresponding to the original audio signal includes: performing logarithmic processing on the energy curve to obtain a logarithmic function corresponding to the original audio signal; and performing second-derivative processing on the logarithmic function to obtain the energy change curve corresponding to the original audio signal.
  • Determining the stress moment in the energy change curve according to the target sliding window includes: adding the target sliding window to the energy change curve, obtaining the energy change peak value of the energy change curve within the target sliding window, and taking the moment corresponding to the energy change peak value as the stress moment, wherein the starting point of the target sliding window at its starting position is the starting point of the energy change curve; and sliding the target sliding window according to the preset step size, then returning to the step of obtaining the energy change peak value of the energy change curve within the target sliding window and taking the moment corresponding to the energy change peak value as the stress moment.
  • Before the moment corresponding to the energy change peak value is taken as the stress moment, the method further includes: judging whether the energy change peak value is greater than or equal to the energy change threshold; if the energy change peak value is greater than or equal to the energy change threshold, continuing to perform the step of taking the moment corresponding to the energy change peak value as the stress moment; and if the energy change peak value is less than the energy change threshold, continuing to perform the step of sliding the target sliding window according to the preset step size.
  • A computer-readable storage medium which stores a computer program, wherein, when the computer program is executed by a processor, the following steps are implemented: acquiring an original audio signal; acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and acquiring a target sliding window, determining the stress moment in the energy change curve according to the target sliding window, and marking the original audio signal at the stress moment as an audio stress.
  • Processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal includes: performing weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal; and performing numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal.
  • Performing weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal includes: determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function, wherein the target moment is any moment in the original audio signal; performing weighted calculation on the truncated audio signal and the target Gaussian window function to obtain the target energy value of the original audio signal at the target moment; and obtaining the energy curve corresponding to the original audio signal according to the target energy value at each target moment.
  • Determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function includes: taking the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function and applying the Gaussian window to the original audio signal; and taking the audio signal within the Gaussian window as the truncated audio signal at the target moment.
  • Performing numerical conversion processing on the energy curve to obtain an energy change curve corresponding to the original audio signal includes: performing logarithmic processing on the energy curve to obtain a logarithmic function corresponding to the original audio signal; and performing second-derivative processing on the logarithmic function to obtain the energy change curve corresponding to the original audio signal.
  • Determining the stress moment in the energy change curve according to the target sliding window includes: adding the target sliding window to the energy change curve, obtaining the energy change peak value of the energy change curve within the target sliding window, and taking the moment corresponding to the energy change peak value as the stress moment, wherein the starting point of the target sliding window at its starting position is the starting point of the energy change curve; and sliding the target sliding window according to the preset step size, then returning to the step of obtaining the energy change peak value of the energy change curve within the target sliding window and taking the moment corresponding to the energy change peak value as the stress moment.
  • Before the moment corresponding to the energy change peak value is taken as the stress moment, the method further includes: judging whether the energy change peak value is greater than or equal to the energy change threshold; if the energy change peak value is greater than or equal to the energy change threshold, continuing to perform the step of taking the moment corresponding to the energy change peak value as the stress moment; and if the energy change peak value is less than the energy change threshold, continuing to perform the step of sliding the target sliding window according to the preset step size.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science
  • Physics & Mathematics
  • Health & Medical Sciences
  • Audiology, Speech & Language Pathology
  • Human Computer Interaction
  • Computational Linguistics
  • Acoustics & Sound
  • Multimedia
  • Signal Processing
  • Artificial Intelligence
  • Information Retrieval, Db Structures And Fs Structures Therefor
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves
  • Auxiliary Devices For Music

Abstract

Disclosed is an audio stress recognition method, the method comprising: acquiring an original audio signal; acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function, so as to obtain an energy change curve corresponding to the original audio signal; and acquiring a target sliding window, determining a stress moment in the energy change curve according to the target sliding window, and marking the original audio signal at the stress moment as an audio stress. In the present application, the temporal correlation of an audio signal is taken into full consideration. Compared with a traditional algorithm, the result of subsequent stress recognition is more accurate. Furthermore, by means of the present application, the influence of an excessive local intensity fluctuation of audio on the overall audio recognition is eliminated, such that the present application is more scientific and practical. Also provided are an audio stress recognition apparatus and device, and a storage medium.

Description

Audio stress recognition method, apparatus, device and medium
Technical Field
The present application relates to the technical field of audio processing, and in particular to an audio stress recognition method, apparatus, device and medium.
Background
Whether in everyday conversation, music and video, or voice calls, sound can be recorded and saved as one or more audio signals. As storable data, audio signals are an important medium for spreading information. A stress is a note of relatively high intensity in music; it stands out most in its sonic impact and is the main element that makes up musical rhythm. By recognizing the stresses in a piece of music, the tempo of its rhythm can be determined.
Technical Problem
In addition, a stress often carries subjective emotion or key information, and recognizing the stresses in an audio signal makes it possible to distinguish that emotion and information. Analyzing and recognizing audio stresses therefore gives a fuller understanding of the meaning the audio signal is intended to express.
Technical Solution
In view of this, it is necessary to provide an audio stress recognition method, apparatus, device and medium capable of accurate recognition to address the above problem.
An audio stress recognition method, the method comprising:
acquiring an original audio signal;
acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and
acquiring a target sliding window, determining a stress moment in the energy change curve according to the target sliding window, and marking the original audio signal at the stress moment as an audio stress.
In one embodiment, processing the original audio signal according to the target Gaussian window function to obtain the energy change curve corresponding to the original audio signal comprises:
performing a weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal; and
performing numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal.
In one embodiment, performing the weighted calculation on the original audio signal according to the target Gaussian window function to obtain the energy curve corresponding to the original audio signal comprises:
determining a truncated audio signal of the original audio signal at a target moment according to the target Gaussian window function, wherein the target moment is any moment in the original audio signal; and
performing a weighted calculation on the truncated audio signal with the target Gaussian window function to obtain a target energy value of the original audio signal at the target moment, and obtaining the energy curve corresponding to the original audio signal from the target energy values at all target moments.
In one embodiment, determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function comprises:
adding a Gaussian window to the original audio signal, with the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function; and
taking the audio signal within the Gaussian window as the truncated audio signal at the target moment.
In one embodiment, performing numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal comprises:
taking the logarithm of the energy curve to obtain a logarithmic function corresponding to the original audio signal; and
taking the second derivative of the logarithmic function to obtain the energy change curve corresponding to the original audio signal.
In one embodiment, determining the stress moment in the energy change curve according to the target sliding window comprises:
adding the target sliding window to the energy change curve, acquiring the energy change peak of the energy change curve within the target sliding window, and taking the moment corresponding to the energy change peak as a stress moment, wherein the starting point of the target sliding window at its initial position is the starting point of the energy change curve; and
sliding the target sliding window by a preset step, and returning to the step of acquiring the energy change peak of the energy change curve within the target sliding window and taking the moment corresponding to the energy change peak as a stress moment.
In one embodiment, before the moments corresponding to the energy change peaks are taken as stress moments, the method further comprises:
determining whether the energy change peak is greater than or equal to an energy change threshold;
if the energy change peak is greater than or equal to the energy change threshold, proceeding to the step of taking the moment corresponding to the energy change peak as a stress moment; and
if the energy change peak is less than the energy change threshold, proceeding to the step of sliding the target sliding window by the preset step.
An audio stress recognition apparatus, the apparatus comprising:
an energy change curve acquisition module, configured to acquire an original audio signal, acquire a target Gaussian window function, and process the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and
a stress recognition module, configured to acquire a target sliding window, determine a stress moment in the energy change curve according to the target sliding window, and mark the original audio signal at the stress moment as an audio stress.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
acquiring an original audio signal;
acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and
acquiring a target sliding window, determining a stress moment in the energy change curve according to the target sliding window, and marking the original audio signal at the stress moment as an audio stress.
An audio stress recognition device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
acquiring an original audio signal;
acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and
acquiring a target sliding window, determining a stress moment in the energy change curve according to the target sliding window, and marking the original audio signal at the stress moment as an audio stress.
Beneficial Effects
The present application provides an audio stress recognition method, apparatus, device and medium. The original audio signal is processed with a Gaussian window function, which takes the temporal correlation of the audio signal fully into account, so the subsequent stress recognition is more accurate than with traditional algorithms. Furthermore, a sliding window is used to dynamically identify the point of strongest local energy change, which is marked as a stress moment so that the audio stress is recognized. The present application thereby removes the influence that excessive local intensity fluctuations have on recognition over the audio as a whole, which makes it more rigorous and practical.
Description of Drawings
To explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
In the drawings:
FIG. 1 is a schematic flowchart of the audio stress recognition method in a first embodiment;
FIG. 2 is a schematic diagram of the target Gaussian window function in one embodiment;
FIG. 3 is a schematic diagram of determining a stress moment according to the target sliding window in one embodiment;
FIG. 4 is a schematic diagram of all stress moments determined in one embodiment;
FIG. 5 is a schematic flowchart of the audio stress recognition method in a second embodiment;
FIG. 6 is a schematic diagram of the energy curve in one embodiment;
FIG. 7 is a schematic diagram of the weighting applied to the original audio signal in one embodiment;
FIG. 8 is a schematic diagram of the energy change curve in one embodiment;
FIG. 9 is a schematic structural diagram of the audio stress recognition apparatus in one embodiment;
FIG. 10 is a structural block diagram of the audio stress recognition device in one embodiment.
Embodiments of the Invention
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in the present application without creative effort fall within the scope of protection of the present application.
FIG. 1 is a schematic flowchart of the audio stress recognition method in the first embodiment. The audio stress recognition method in the first embodiment comprises the following steps:
Step 102: acquire an original audio signal.
The original audio signal is the audio signal in which stresses are to be recognized. It may be an audio signal recorded in advance and stored on a local storage medium, or a segment of audio captured in real time; no specific limitation is imposed here.
Step 104: acquire a target Gaussian window function, and process the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal.
The target Gaussian window function is used to weight the original audio signal. The energy change curve reflects how quickly the energy value of the original audio signal changes at different target moments. A stress shows up in the energy change curve as a large energy change value, and this property allows the audio stresses of the original audio signal to be recognized in the subsequent steps.
In this embodiment, the target Gaussian window function is expressed as:
$$G_w(n) = e^{-n^2/(2a^2)}$$
where n is the time variable, n ∈ L, L is the parameter characterizing the width of the Gaussian window function, and a is the parameter characterizing its shape. By way of example, FIG. 2 is a schematic diagram of a target Gaussian window function with parameter a = 0.003 and Gaussian window width L = [-0.01, 0.01] (unit: seconds). The settings of the Gaussian window function parameters in this embodiment have some influence on the energy calculation, but this automatic recognition method does not focus on tuning them to optimize the algorithm, and the parameters of the Gaussian window function are not further limited.
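As an illustration of the window expression above, the following minimal Python sketch samples Gw(n) on a discrete time grid. The function name `gaussian_window` and the 44.1 kHz sampling rate are assumptions made for the example; the application does not specify a sampling rate or a discretization.

```python
import math

def gaussian_window(a=0.003, half_width=0.01, sample_rate=44100):
    """Sample Gw(n) = exp(-n^2 / (2 a^2)) for n in [-half_width, half_width] seconds.

    a and half_width follow the example values in the text (a = 0.003,
    L = [-0.01, 0.01] s); sample_rate is an assumed value, not from the source.
    """
    half = int(round(half_width * sample_rate))  # samples on each side of the centre
    return [
        math.exp(-((k / sample_rate) ** 2) / (2.0 * a * a))
        for k in range(-half, half + 1)
    ]

window = gaussian_window()
# The window is symmetric, peaks at 1.0 in the centre, and decays towards the edges.
print(len(window), window[len(window) // 2], window[0])
```

With these example values the weight at the window edge is already small compared with the centre, so samples near the target moment dominate the weighted result.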
Further, a weighted calculation is performed on the original audio signal with the above target Gaussian window function to obtain the energy curve corresponding to the original audio signal, and this energy curve is differentiated to obtain the energy change curve corresponding to the original audio signal. The specific implementation is described in detail later and is not repeated here.
Step 106: acquire a target sliding window, determine the stress moments in the energy change curve according to the target sliding window, and mark the original audio signal at the stress moments as audio stresses.
The target sliding window is a window with no vertical boundary; it provides a dynamic decision boundary for the energy change curve at a specific moment. In this embodiment the target sliding window slides continuously, and the stress moment of the energy change curve within the target sliding window must be determined at each specific moment.
In a specific embodiment, referring to FIG. 3, the target sliding window is first added to the energy change curve, with its window width set to 0.06 seconds. Note that 0.06 seconds is only an example; the width may also be 0.05 seconds, 0.07 seconds, or another value. The window width is chosen with reference to the observation that "the interval between stresses in most music audio is roughly 0.02 to 1 second"; a sliding window that is too wide or too narrow introduces errors. Next, the energy change peak of the energy change curve within the target sliding window is acquired (that is, the maximum energy change value within the target sliding window is determined), and the moment corresponding to the energy change peak is taken as a stress moment.
Further, the target sliding window in this embodiment slides continuously. So that the target sliding window traverses the energy change curve, its initial position is set such that the starting point of the sliding window (its left endpoint) coincides with the starting point of the energy change curve (t = 0). The target sliding window is then slid by a preset step, and the above step of acquiring the energy change peak of the energy change curve within the target sliding window and taking the corresponding moment as a stress moment is repeated until the end point of the sliding window (its right endpoint) reaches the end point of the energy change curve, at which point sliding stops. FIG. 4 is a schematic diagram of all stress moments determined in the energy change curve; marking these stress moments in the original audio signal yields the audio stresses in the original audio signal.
In a specific embodiment, because a stress is a sound of relatively high intensity, an energy change threshold is also used in determining the stress moments. Specifically, it is determined whether the energy change peak at a specific moment is greater than or equal to the energy change threshold; the threshold can be set to different values according to requirements such as recognition accuracy and is not specifically limited here. If the energy change peak is greater than or equal to the energy change threshold, the moment corresponding to that peak is taken as a stress moment; if the energy change peak is less than the energy change threshold, the target sliding window continues to slide by the preset step until the next stress moment satisfying the threshold condition is found.
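The sliding-window search with the optional threshold check can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `find_stress_moments`, the use of sample indices rather than seconds, and the handling of a final partial window (it is simply not examined) are all choices made for the example.

```python
def find_stress_moments(change, window_len, step, threshold=None):
    """Return sorted sample indices of per-window maxima of the energy change curve.

    change:     energy change curve as a list of floats
    window_len: sliding-window width in samples (e.g. 0.06 s times the sample rate)
    step:       preset sliding step in samples
    threshold:  optional energy change threshold; peaks below it are skipped
    """
    moments = set()  # a set, because overlapping windows can share the same peak
    start = 0        # the window's left endpoint starts at the curve's start (t = 0)
    while start + window_len <= len(change):  # stop when the right endpoint passes the end
        segment = change[start:start + window_len]
        peak = max(segment)
        if threshold is None or peak >= threshold:
            moments.add(start + segment.index(peak))
        start += step
    return sorted(moments)

curve = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 3.0, 0.0, 0.0]
print(find_stress_moments(curve, window_len=3, step=3))               # [2, 3, 6]
print(find_stress_moments(curve, window_len=3, step=3, threshold=2))  # [2, 6]
```

In the second call the middle window's maximum (1.0) falls below the threshold, so that window contributes no stress moment and the search moves on by one step.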
The above audio stress recognition method processes the original audio signal with a Gaussian window function, which takes the temporal correlation of the audio signal fully into account, so the subsequent stress recognition is more accurate than with traditional algorithms. Furthermore, a sliding window is used to dynamically identify the point of strongest local energy change, which is marked as a stress moment so that the audio stress is recognized. The present application thereby removes the influence that excessive local intensity fluctuations have on recognition over the audio as a whole, which makes it more rigorous and practical.
FIG. 5 is a schematic flowchart of the audio stress recognition method in the second embodiment. The audio stress recognition method in the second embodiment comprises the following steps:
Step 502: acquire an original audio signal.
In a specific implementation scenario, step 502 is essentially the same as step 102 of the audio stress recognition method in the first embodiment and is not described again here.
Step 504: acquire a target Gaussian window function, and perform a weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal.
The target Gaussian window function is set up as in step 104 and is not described again here. The energy curve reflects the energy value of the original audio signal at different target moments.
In a specific embodiment, the weighted calculation comprises the following. First, the truncated audio signal of the original audio signal at a target moment is determined according to the target Gaussian window function. The target moment is any moment in the original audio signal; the truncated audio signal has the same width as the Gaussian window corresponding to the Gaussian window function, and both contain the target moment. Second, a weighted calculation is performed on the truncated audio signal with the target Gaussian window function to obtain the target energy value of the original audio signal at the target moment. Windowing in the time domain takes the form of element-wise multiplication; accordingly, the target energy value E(t) at target moment t is computed as:
$$E(t) = x(n+t)^2 \odot G_w(n)$$
where n is the time variable over the fixed domain T, and t is the time-domain variable of the original audio signal.
Referring to FIG. 6, once the target energy values of the original audio signal at all target moments have been obtained, the energy curve corresponding to the original audio signal can be obtained from them.
In a specific embodiment, referring to FIG. 7, a Gaussian window is added to the original audio signal with the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function, and the audio signal within the Gaussian window is taken as the truncated audio signal at the target moment. That is, for an arbitrary target moment t in the original audio signal, if the Gaussian window width is chosen as T = [-0.01, 0.01] seconds, the truncated audio signal of the original audio signal at target moment t is the audio signal over the time range [t-0.01, t+0.01].
Note that when the Gaussian window extends beyond the length of the original audio signal, the part that extends beyond it does not need to be weighted. That is, when t is small, the left half of the Gaussian window may extend beyond the original audio signal, and no weighted calculation is needed for that part. Likewise, when t is large, the right half of the Gaussian window may extend beyond the original audio signal, and no weighted calculation is needed for that part either.
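The weighted energy calculation, including the boundary handling just described, might look like the sketch below. One assumption goes beyond the text: the element-wise products of the squared samples and the window weights are summed to give a single energy value per target moment, since the formula in the source shows only the element-wise product. The name `energy_curve` and the tiny three-tap window are hypothetical and chosen to keep the example readable.

```python
def energy_curve(signal, window):
    """E(t) for every sample index t: window-weighted sum of squared samples.

    The window is centred on t; window taps that fall outside the signal
    are skipped, so no weighting is applied beyond the signal boundaries.
    Summing the element-wise products is an assumption of this sketch.
    """
    half = len(window) // 2  # window is assumed to have odd length
    energy = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(window):
            i = t + k - half            # signal index under window tap k
            if 0 <= i < len(signal):    # ignore the part outside the signal
                total += signal[i] ** 2 * w
        energy.append(total)
    return energy

toy_window = [0.5, 1.0, 0.5]            # stand-in for a sampled Gaussian window
toy_signal = [0.0, 0.0, 2.0, 0.0, 0.0]  # a single loud sample in the middle
print(energy_curve(toy_signal, toy_window))  # [0.0, 2.0, 4.0, 2.0, 0.0]
```

The direct double loop costs on the order of signal length times window length; for long recordings a convolution routine from a numerical library would be the usual replacement, with the same boundary convention.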
Step 506: perform numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal.
In a specific embodiment, the numerical conversion processing comprises the following. First, the logarithm of the energy curve is taken to obtain the logarithmic function corresponding to the original audio signal. The reason is that introducing the direction of the energy change would make the subsequent stress recognition harder, whereas taking the logarithm of the energy curve first removes the direction (that is, the sign) of the energy change, which reduces the effect of the energy rapidly increasing or rapidly decreasing and so reflects the rate of energy change better. Then the second derivative of the logarithmic function is taken to obtain the energy change curve corresponding to the original audio signal; see FIG. 8 for this energy change curve.
Taking the logarithm of the weighted energy curve and then its second derivative gives the energy change characteristic curve P(t), which is computed as:
$$P(t) = \frac{d^2 \ln\left(E(t) + 1\right)}{dt^2}$$
The approach proposed in this embodiment, taking the logarithm of the energy curve and then its second derivative, effectively reduces the influence of background noise and fully reflects the energy change characteristics in the energy change curve.
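On sampled data, P(t) has to be approximated numerically. The sketch below uses a central second difference, which is one common discretization and an assumption of this example (the application does not say how the derivative is evaluated). The name `energy_change_curve` and the choice to leave the two endpoints at zero are likewise hypothetical.

```python
import math

def energy_change_curve(energy, dt):
    """Approximate P(t) = d^2/dt^2 ln(E(t) + 1) with central second differences.

    energy: energy curve E(t) sampled every dt seconds (values >= 0)
    dt:     sampling interval in seconds, assumed > 0
    The first and last samples have no two-sided neighbours and are left at 0.0.
    """
    log_e = [math.log(e + 1.0) for e in energy]  # the +1 keeps the log finite when E = 0
    change = [0.0] * len(log_e)
    for i in range(1, len(log_e) - 1):
        change[i] = (log_e[i - 1] - 2.0 * log_e[i] + log_e[i + 1]) / (dt * dt)
    return change

# Sanity check: if ln(E + 1) = t^2, the second derivative is 2 everywhere.
quadratic = [math.exp(i * i) - 1.0 for i in range(5)]
print(energy_change_curve(quadratic, dt=1.0))
```

Because the difference is divided by dt squared, very small sampling intervals amplify noise in the energy curve, so some smoothing before differentiation may be needed in practice.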
步骤508,获取目标滑动窗,根据目标滑动窗确定能量变化曲线中的重音时刻,将在重音时刻的原始音频信号标示为音频重音。Step 508: Acquire the target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.
在一个具体的实施场景中,步骤508与第一实施例中音频重音识别方法中的步骤106基本一致,此处不再进行赘述。 In a specific implementation scenario, step 508 is basically the same as step 106 in the audio stress recognition method in the first embodiment, and details are not repeated here. 
在一个实施例中,如图9所示,提出了一种音频重音识别装置,该装置包括:In one embodiment, as shown in FIG. 9, an audio stress recognition device is proposed, the device includes:
能量变化曲线获取模块902,用于获取原始音频信号;获取目标高斯窗函数,根据目标高斯窗函数对原始音频信号进行处理,得到原始音频信号对应的能量变化曲线;The energy change curve obtaining module 902 is used to obtain the original audio signal; obtain the target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain the energy change curve corresponding to the original audio signal;
重音识别模块904,用于获取目标滑动窗,根据目标滑动窗确定能量变化曲线中的重音时刻,将在重音时刻的原始音频信号标示为音频重音。The accent recognition module 904 is configured to acquire the target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.
上述音频重音识别装置,基于高斯窗函数对原始音频信号进行处理,充分考虑音频信号在时间上的相关性,相较于传统算法,后续重音识别的结果更为准确。进一步的,还基于滑动窗动态识别局部能量变化的最强烈点,并将其标记为重音时刻从而识别出音频重音,本申请排除了音频局部强度波动过大对整体音频识别造成的影响,因此也更具科学性及实用性。The above audio stress recognition device processes the original audio signal based on a Gaussian window function, and fully considers the temporal correlation of the audio signal. Compared with the traditional algorithm, the result of subsequent stress recognition is more accurate. Further, the most intense point of local energy change is also dynamically identified based on the sliding window, and it is marked as the stress moment to identify the audio stress. More scientific and practical.
在一个实施例中,能量变化曲线获取模块902,还具体用于:根据目标高斯函数对原始音频信号进行加权计算,得到原始音频信号对应的能量曲线;对能量曲线进行数值转换处理,得到原始音频信号对应的能量变化曲线。In one embodiment, the energy change curve acquisition module 902 is further specifically configured to: perform weighted calculation on the original audio signal according to the target Gaussian function to obtain an energy curve corresponding to the original audio signal; perform numerical conversion processing on the energy curve to obtain the original audio signal The energy curve corresponding to the signal.
在一个实施例中,能量变化曲线获取模块902,还具体用于:根据目标高斯窗函数确定原始音频信号在目标时刻的截断音频信号;其中,目标时刻为原始音频信号中的任意一个时刻;将截断音频信号与目标高斯窗函数进行加权计算,获取原始音频信号在目标时刻的目标能量值,根据在每一目标时刻的目标能量值得到原始音频信号对应的能量曲线。In one embodiment, the energy change curve acquisition module 902 is further specifically configured to: determine the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function; wherein, the target moment is any moment in the original audio signal; The truncated audio signal is weighted with the target Gaussian window function to obtain the target energy value of the original audio signal at the target time, and the energy curve corresponding to the original audio signal is obtained according to the target energy value at each target time.
在一个实施例中,能量变化曲线获取模块902,还具体用于:以目标时刻为目标高斯窗函数对应的高斯窗口的中间时刻,在原始音频信号上添加高斯窗口;将高斯窗口内的音频信号作为在目标时刻的截断音频信号。In one embodiment, the energy change curve acquisition module 902 is further specifically configured to: take the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function, add a Gaussian window to the original audio signal; as the truncated audio signal at the target time.
在一个实施例中,能量变化曲线获取模块902,还具体用于:对能量曲线进行取对数处理,获取原始音频信号对应的对数函数;对对数函数进行二次求导处理,获取原始音频信号对应的能量变化曲线。In one embodiment, the energy change curve obtaining module 902 is further specifically configured to: perform logarithmic processing on the energy curve to obtain a logarithmic function corresponding to the original audio signal; perform secondary derivation processing on the logarithmic function to obtain the original The energy change curve corresponding to the audio signal.
在一个实施例中,重音识别模块904,还具体用于:在能量变化曲线中添加目标滑动窗,获取目标滑动窗内能量变化曲线的能量变化峰值,将能量变化峰值对应的时刻作为重音时刻;其中,目标滑动窗在起始位置的起始点为能量变化曲线的起始点;按照预设步长滑动目标滑动窗,返回执行获取目标滑动窗内能量变化曲线的能量变化峰值,将能量变化峰值对应的时刻作为重音时刻的步骤。In one embodiment, the accent recognition module 904 is also specifically configured to: add a target sliding window to the energy variation curve, obtain the energy variation peak value of the energy variation curve in the target sliding window, and use the moment corresponding to the energy variation peak value as the accent moment; Among them, the starting point of the target sliding window at the starting position is the starting point of the energy change curve; slide the target sliding window according to the preset step size, return to obtain the energy change peak value of the energy change curve in the target sliding window, and set the energy change peak value corresponding to The moment is used as the step of the accent moment.
在一个实施例中,重音识别模块904,还具体用于:判断能量变化峰值是否大于或等于能量变化阈值;若能量变化峰值大于或等于能量变化阈值,则继续执行将能量变化峰值对应的时刻作为重音时刻的步骤;若能量变化峰值小于能量变化阈值,则继续执行按照预设步长滑动目标滑动窗的步骤。In one embodiment, the accent recognition module 904 is further specifically configured to: determine whether the peak value of the energy change is greater than or equal to the energy change threshold; if the peak value of the energy change is greater than or equal to the energy change threshold, continue to execute the time corresponding to the energy change peak as the Steps at the time of stress; if the energy change peak value is less than the energy change threshold, continue to perform the step of sliding the target sliding window according to the preset step size.
FIG. 10 shows the internal structure of an audio accent recognition device in one embodiment. As shown in FIG. 10, the audio accent recognition device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the audio accent recognition device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the audio accent recognition method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the audio accent recognition method. Those skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the audio accent recognition device to which the solution is applied; a specific audio accent recognition device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
An audio accent recognition device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the following steps: acquiring an original audio signal; acquiring a target Gaussian window function and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and acquiring a target sliding window, determining the accent moment in the energy change curve according to the target sliding window, and marking the original audio signal at the accent moment as an audio accent.
In one embodiment, processing the original audio signal according to the target Gaussian window function to obtain the energy change curve corresponding to the original audio signal includes: performing a weighted calculation on the original audio signal according to the target Gaussian function to obtain an energy curve corresponding to the original audio signal; and performing numerical conversion on the energy curve to obtain the energy change curve corresponding to the original audio signal.
In one embodiment, performing a weighted calculation on the original audio signal according to the target Gaussian function to obtain the energy curve corresponding to the original audio signal includes: determining, according to the target Gaussian window function, a truncated audio signal of the original audio signal at a target moment, where the target moment is any moment in the original audio signal; and performing a weighted calculation on the truncated audio signal and the target Gaussian window function to obtain a target energy value of the original audio signal at the target moment, and obtaining the energy curve corresponding to the original audio signal from the target energy value at each target moment.
In one embodiment, determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function includes: placing a Gaussian window on the original audio signal, with the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function; and taking the audio signal within the Gaussian window as the truncated audio signal at the target moment.
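A minimal Python sketch of the truncation and weighting steps is given below. The source only speaks of a "weighted calculation", so summing Gaussian-weighted squared samples is an assumption, as are the function names, the `sigma` parameter, and the choice to treat samples beyond the ends of the signal as zero.

```python
import math


def gaussian_window(length, sigma):
    """Gaussian weights that are symmetric about the middle sample."""
    center = (length - 1) / 2.0
    return [math.exp(-0.5 * ((n - center) / sigma) ** 2) for n in range(length)]


def energy_at(signal, t, window):
    """Weighted energy of the truncated signal centered on index t."""
    half = len(window) // 2
    total = 0.0
    for k, weight in enumerate(window):
        i = t - half + k  # sample index covered by window position k
        # Assumption: positions outside the signal contribute nothing.
        if 0 <= i < len(signal):
            total += weight * signal[i] ** 2
    return total


def energy_curve(signal, window):
    """Evaluate the weighted energy with every sample index as the target moment."""
    return [energy_at(signal, t, window) for t in range(len(signal))]
```

An odd window length keeps the target moment exactly at the middle of the window; a single unit impulse then produces an energy curve that reproduces the Gaussian shape around the impulse.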
In one embodiment, performing numerical conversion on the energy curve to obtain the energy change curve corresponding to the original audio signal includes: taking the logarithm of the energy curve to obtain a logarithmic function corresponding to the original audio signal; and taking the second derivative of the logarithmic function to obtain the energy change curve corresponding to the original audio signal.
In one embodiment, determining the accent moment in the energy change curve according to the target sliding window includes: placing the target sliding window on the energy change curve, obtaining the energy change peak of the energy change curve within the target sliding window, and taking the moment corresponding to the energy change peak as an accent moment, where the starting point of the target sliding window in its initial position is the starting point of the energy change curve; and sliding the target sliding window by a preset step size and returning to the step of obtaining the energy change peak of the energy change curve within the target sliding window and taking the moment corresponding to the energy change peak as an accent moment.
In one embodiment, before the moments corresponding to all energy change peaks are taken as accent moments, the method further includes: determining whether the energy change peak is greater than or equal to an energy change threshold; if the energy change peak is greater than or equal to the energy change threshold, proceeding to the step of taking the moment corresponding to the energy change peak as an accent moment; and if the energy change peak is less than the energy change threshold, proceeding to the step of sliding the target sliding window by the preset step size.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the following steps: acquiring an original audio signal; acquiring a target Gaussian window function and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and acquiring a target sliding window, determining the accent moment in the energy change curve according to the target sliding window, and marking the original audio signal at the accent moment as an audio accent.
In one embodiment, processing the original audio signal according to the target Gaussian window function to obtain the energy change curve corresponding to the original audio signal includes: performing a weighted calculation on the original audio signal according to the target Gaussian function to obtain an energy curve corresponding to the original audio signal; and performing numerical conversion on the energy curve to obtain the energy change curve corresponding to the original audio signal.
In one embodiment, performing a weighted calculation on the original audio signal according to the target Gaussian function to obtain the energy curve corresponding to the original audio signal includes: determining, according to the target Gaussian window function, a truncated audio signal of the original audio signal at a target moment, where the target moment is any moment in the original audio signal; and performing a weighted calculation on the truncated audio signal and the target Gaussian window function to obtain a target energy value of the original audio signal at the target moment, and obtaining the energy curve corresponding to the original audio signal from the target energy value at each target moment.
In one embodiment, determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function includes: placing a Gaussian window on the original audio signal, with the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function; and taking the audio signal within the Gaussian window as the truncated audio signal at the target moment.
In one embodiment, performing numerical conversion on the energy curve to obtain the energy change curve corresponding to the original audio signal includes: taking the logarithm of the energy curve to obtain a logarithmic function corresponding to the original audio signal; and taking the second derivative of the logarithmic function to obtain the energy change curve corresponding to the original audio signal.
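In symbols, the chain from audio signal to energy change curve could be written as below. This is one possible reading rather than the notation of the application: the symbols x, w, sigma, M, E, and D, the squared-amplitude form of the weighted calculation, and the second difference used in place of the second derivative are all assumptions.

```latex
% x[n]: original audio signal; w[m]: target Gaussian window with half-width M (assumed notation)
w[m] = \exp\!\left(-\frac{m^{2}}{2\sigma^{2}}\right), \qquad -M \le m \le M

% Energy curve: weighted energy of the truncated signal centered on target moment t
E[t] = \sum_{m=-M}^{M} w[m]\, x[t+m]^{2}

% Energy change curve: second derivative of the log-energy, approximated by a second difference
D[t] = \log E[t+1] - 2\log E[t] + \log E[t-1]
```

Under this reading, a large positive D[t] marks a moment where the log-energy bends sharply upward, which is the kind of peak the sliding-window search then looks for.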
In one embodiment, determining the accent moment in the energy change curve according to the target sliding window includes: placing the target sliding window on the energy change curve, obtaining the energy change peak of the energy change curve within the target sliding window, and taking the moment corresponding to the energy change peak as an accent moment, where the starting point of the target sliding window in its initial position is the starting point of the energy change curve; and sliding the target sliding window by a preset step size and returning to the step of obtaining the energy change peak of the energy change curve within the target sliding window and taking the moment corresponding to the energy change peak as an accent moment.
In one embodiment, before the moments corresponding to all energy change peaks are taken as accent moments, the method further includes: determining whether the energy change peak is greater than or equal to an energy change threshold; if the energy change peak is greater than or equal to the energy change threshold, proceeding to the step of taking the moment corresponding to the energy change peak as an accent moment; and if the energy change peak is less than the energy change threshold, proceeding to the step of sliding the target sliding window by the preset step size.
It should be noted that the audio accent recognition method, apparatus, device, and computer-readable storage medium described above belong to one general inventive concept, and the contents of their respective embodiments are mutually applicable.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments is described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments represent only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. The protection scope of this patent is therefore defined by the appended claims.

Claims (10)

  1. An audio accent recognition method, characterized in that the method comprises:
    acquiring an original audio signal;
    acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal;
    acquiring a target sliding window, determining the accent moment in the energy change curve according to the target sliding window, and marking the original audio signal at the accent moment as an audio accent.
  2. The method according to claim 1, characterized in that processing the original audio signal according to the target Gaussian window function to obtain the energy change curve corresponding to the original audio signal comprises:
    performing a weighted calculation on the original audio signal according to the target Gaussian function to obtain an energy curve corresponding to the original audio signal;
    performing numerical conversion on the energy curve to obtain the energy change curve corresponding to the original audio signal.
  3. The method according to claim 2, characterized in that performing a weighted calculation on the original audio signal according to the target Gaussian function to obtain the energy curve corresponding to the original audio signal comprises:
    determining, according to the target Gaussian window function, a truncated audio signal of the original audio signal at a target moment, wherein the target moment is any moment in the original audio signal;
    performing a weighted calculation on the truncated audio signal and the target Gaussian window function to obtain a target energy value of the original audio signal at the target moment, and obtaining the energy curve corresponding to the original audio signal from the target energy value at each target moment.
  4. The method according to claim 3, characterized in that determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function comprises:
    placing a Gaussian window on the original audio signal, with the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function;
    taking the audio signal within the Gaussian window as the truncated audio signal at the target moment.
  5. The method according to claim 2, characterized in that performing numerical conversion on the energy curve to obtain the energy change curve corresponding to the original audio signal comprises:
    taking the logarithm of the energy curve to obtain a logarithmic function corresponding to the original audio signal;
    taking the second derivative of the logarithmic function to obtain the energy change curve corresponding to the original audio signal.
  6. The method according to claim 1, characterized in that determining the accent moment in the energy change curve according to the target sliding window comprises:
    placing the target sliding window on the energy change curve, obtaining the energy change peak of the energy change curve within the target sliding window, and taking the moment corresponding to the energy change peak as an accent moment, wherein the starting point of the target sliding window in its initial position is the starting point of the energy change curve;
    sliding the target sliding window by a preset step size, and returning to the step of obtaining the energy change peak of the energy change curve within the target sliding window and taking the moment corresponding to the energy change peak as an accent moment.
  7. The method according to claim 6, characterized in that, before the moments corresponding to all energy change peaks are taken as accent moments, the method further comprises:
    determining whether the energy change peak is greater than or equal to an energy change threshold;
    if the energy change peak is greater than or equal to the energy change threshold, proceeding to the step of taking the moment corresponding to the energy change peak as an accent moment;
    if the energy change peak is less than the energy change threshold, proceeding to the step of sliding the target sliding window by the preset step size.
  8. An audio accent recognition apparatus, characterized in that the apparatus comprises:
    an energy change curve acquisition module, configured to acquire an original audio signal, acquire a target Gaussian window function, and process the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal;
    an accent recognition module, configured to acquire a target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as an audio accent.
  9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
  10. An audio accent recognition device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/127679 2020-10-28 2020-11-10 Audio stress recognition method, apparatus and device, and medium WO2022088242A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011172637.0A CN112259088B (en) 2020-10-28 2020-10-28 Audio accent recognition method, device, equipment and medium
CN202011172637.0 2020-10-28

Publications (1)

Publication Number Publication Date
WO2022088242A1 true WO2022088242A1 (en) 2022-05-05

Family

ID=74261119

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/127679 WO2022088242A1 (en) 2020-10-28 2020-11-10 Audio stress recognition method, apparatus and device, and medium

Country Status (2)

Country Link
CN (1) CN112259088B (en)
WO (1) WO2022088242A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117373484A (en) * 2023-10-08 2024-01-09 国网湖北省电力有限公司超高压公司 Switch cabinet voiceprint fault detection method based on feature transformation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150045920A1 (en) * 2013-08-08 2015-02-12 Sony Corporation Audio signal processing apparatus and method, and monitoring system
CN109584902A (en) * 2018-11-30 2019-04-05 广州市百果园信息技术有限公司 A kind of music rhythm determines method, apparatus, equipment and storage medium
CN109841232A (en) * 2018-12-30 2019-06-04 瑞声科技(新加坡)有限公司 The extracting method of note locations and device and storage medium in music signal
CN111739542A (en) * 2020-05-13 2020-10-02 深圳市微纳感知计算技术有限公司 Method, device and equipment for detecting characteristic sound

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2962299B1 (en) * 2013-02-28 2018-10-31 Nokia Technologies OY Audio signal analysis
CN104217729A (en) * 2013-05-31 2014-12-17 杜比实验室特许公司 Audio processing method, audio processing device and training method
US10789967B2 (en) * 2016-05-09 2020-09-29 Harman International Industries, Incorporated Noise detection and noise reduction
CN108335703B (en) * 2018-03-28 2020-10-09 腾讯音乐娱乐科技(深圳)有限公司 Method and apparatus for determining accent position of audio data

Also Published As

Publication number Publication date
CN112259088A (en) 2021-01-22
CN112259088B (en) 2024-05-17

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 20959402; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 20959402; Country of ref document: EP; Kind code of ref document: A1