WO2022088242A1

WO2022088242A1 - Audio stress recognition method, apparatus and device, and medium

Info

Publication number: WO2022088242A1
Application number: PCT/CN2020/127679
Authority: WO
Inventors: 郑亚军
Original assignee: 瑞声声学科技(深圳)有限公司; 瑞声光电科技(常州)有限公司
Priority date: 2020-10-28
Filing date: 2020-11-10
Publication date: 2022-05-05
Also published as: CN112259088A; CN112259088B

Abstract

Disclosed is an audio stress recognition method, the method comprising: acquiring an original audio signal; acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function, so as to obtain an energy change curve corresponding to the original audio signal; and acquiring a target sliding window, determining a stress moment in the energy change curve according to the target sliding window, and marking the original audio signal at the stress moment as an audio stress. In the present application, the temporal correlation of an audio signal is taken into full consideration. Compared with a traditional algorithm, the result of subsequent stress recognition is more accurate. Furthermore, by means of the present application, the influence of an excessive local intensity fluctuation of audio on the overall audio recognition is eliminated, such that the present application is more scientific and practical. Also provided are an audio stress recognition apparatus and device, and a storage medium.

Description

Audio accent recognition method, apparatus, device and medium

Technical field

The present application relates to the technical field of audio processing, and in particular, to an audio accent recognition method, apparatus, device and medium.

Background art

Whether in everyday spoken communication, music videos, or voice calls, sound can be recorded and saved as one or more audio signals. As data that can be stored, audio signals are an important medium for disseminating information. An accent is a sound of greater intensity in music; it is the most prominent element of a sound's impact and the main factor that shapes musical rhythm. By identifying the accents in a piece of music, the tempo of its rhythm can be judged.

Technical problem

In addition, stress often carries subjective emotion or key information. By identifying the stress in the audio, the subjective emotions and key information in the audio can be distinguished. Analyzing and identifying audio stress therefore makes it possible to understand more fully the meaning that the audio signal is intended to express.

Technical solutions

Based on this, and in order to address the above problems, it is necessary to provide an audio accent recognition method, apparatus, device and medium that can identify accents accurately.

A method for audio accent recognition, the method comprising:

Get the original audio signal;

Obtain a target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain an energy change curve corresponding to the original audio signal;

Acquire a target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.

In one embodiment, the processing of the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal, including:

Perform weighted calculation on the original audio signal according to the target Gaussian function to obtain an energy curve corresponding to the original audio signal;

Perform numerical conversion processing on the energy curve to obtain an energy change curve corresponding to the original audio signal.

In one embodiment, the weighted calculation is performed on the original audio signal according to the target Gaussian function to obtain an energy curve corresponding to the original audio signal, including:

Determine the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function; wherein, the target moment is any moment in the original audio signal;

Perform a weighted calculation on the truncated audio signal with the target Gaussian window function to obtain the target energy value of the original audio signal at the target moment, and obtain the energy curve corresponding to the original audio signal according to the target energy value at each target moment.

In one embodiment, the truncated audio signal of the original audio signal at the target moment is determined according to the target Gaussian window function, including:

Taking the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function, a Gaussian window is added to the original audio signal;

Taking the audio signal in the Gaussian window as the truncated audio signal at the target time.

In one of the embodiments, performing numerical conversion processing on the energy curve to obtain an energy change curve corresponding to the original audio signal, including:

Perform logarithmic processing on the energy curve to obtain a logarithmic function corresponding to the original audio signal;

A second derivative process is performed on the logarithmic function to obtain an energy change curve corresponding to the original audio signal.

In one embodiment, the determining of the stress moment in the energy change curve according to the target sliding window includes:

The target sliding window is added to the energy change curve, the energy change peak value of the energy change curve in the target sliding window is obtained, and the time corresponding to the energy change peak value is taken as the accent time; wherein, when the target sliding window is at its starting position, the starting point of the window is the starting point of the energy change curve;

Sliding the target sliding window according to a preset step size, returning to the step of obtaining the energy change peak value of the energy change curve in the target sliding window, and taking the time corresponding to the energy change peak value as the accent time.

In one of the embodiments, before the time corresponding to all the energy change peaks is regarded as the accent time, it further includes:

Determine whether the energy change peak value is greater than or equal to the energy change threshold;

If the energy change peak value is greater than or equal to the energy change threshold value, then continue to perform the step of using the time corresponding to the energy change peak value as the accent time;

If the energy change peak value is less than the energy change threshold, the step of sliding the target sliding window according to a preset step size is continued.

An audio stress recognition device, the device comprises:

an energy variation curve acquisition module, used to acquire an original audio signal; acquire a target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain an energy variation curve corresponding to the original audio signal;

The accent recognition module is configured to obtain a target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.

A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the processor is caused to perform the following steps:

Get the original audio signal;

An audio accent recognition device, comprising a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the following steps:

Get the original audio signal;

Beneficial effects

The present application provides an audio stress recognition method, apparatus, device and medium. The original audio signal is processed based on a Gaussian window function, so the temporal correlation of the audio signal is fully considered and, compared with the traditional algorithm, the result of subsequent stress recognition is more accurate. Further, the point of most intense local energy change is dynamically identified based on the sliding window and marked as the stress moment to identify the audio stress, which makes the method more scientific and practical.

Description of drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings required for the description of the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.

In the drawings:

Fig. 1 is the schematic flow chart of the audio stress recognition method in the first embodiment;

Fig. 2 is the schematic diagram of target Gaussian window function in one embodiment;

Fig. 3 is a schematic diagram of determining an accent moment according to a target sliding window in one embodiment;

FIG. 4 is a schematic diagram of all stress moments determined in one embodiment;

FIG. 5 is a schematic flowchart of the audio stress recognition method in the second embodiment;

FIG. 6 is a schematic diagram of an energy curve in one embodiment;

FIG. 7 is a schematic diagram of weighting processing on an original audio signal in one embodiment;

FIG. 8 is a schematic diagram of an energy change curve in one embodiment;

FIG. 9 is a schematic structural diagram of an audio accent recognition device in one embodiment;

FIG. 10 is a structural block diagram of an audio accent recognition device in one embodiment.

Embodiments of the present invention

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

As shown in FIG. 1, FIG. 1 is a schematic flowchart of the audio stress recognition method in the first embodiment. The steps provided by the audio stress recognition method in the first embodiment include:

Step 102, acquiring the original audio signal.

Wherein, the original audio signal is the audio signal of the accent to be identified. The original audio signal may be an audio signal pre-recorded and stored in a local storage medium, or may be a piece of audio signal collected in real time, which is not specifically limited here.
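As an illustration of acquiring a pre-recorded signal from local storage, the following is a minimal sketch using Python's standard `wave` and `struct` modules. The function name `read_mono_pcm16` and the restriction to 16-bit mono PCM are assumptions made for this example only; the application does not limit the audio format.

```python
import struct
import wave


def read_mono_pcm16(path):
    """Read a 16-bit mono WAV file; return (samples scaled to [-1, 1), sample_rate)."""
    with wave.open(path, "rb") as wav:
        # Assumption of this sketch: only 16-bit mono PCM is handled.
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    # WAV PCM data is little-endian; "h" is a signed 16-bit integer.
    ints = struct.unpack("<%dh" % count, raw)
    return [v / 32768.0 for v in ints], sample_rate
```

A call such as `samples, sample_rate = read_mono_pcm16("recording.wav")` (the file name is hypothetical) would then supply the original audio signal for the later steps.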

Step 104: Obtain a target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain an energy change curve corresponding to the original audio signal.

The target Gaussian window function is used to weight the original audio signal. The energy change curve reflects how the energy value of the original audio signal changes across different target times. An accent appears in the energy change curve as a large energy change value, and based on this characteristic the audio accent of the original audio signal can be identified in the subsequent steps.

In this embodiment, the expression of the target Gaussian window function is:

$$G_w(n) = e^{-n^2/(2a^2)}$$

where n is a time variable, n∈L, L is a parameter characterizing the width of the Gaussian window function, and a is a parameter characterizing the shape of the Gaussian window function. Exemplarily, referring to FIG. 2, which is a schematic diagram of a target Gaussian window function, the parameter a of the target Gaussian window function is 0.003 and the width of the Gaussian window is L=[-0.01, 0.01] (unit: seconds). The parameter settings of the Gaussian window function in this embodiment have some influence on the energy calculation, but this automatic recognition method does not focus on tuning them to optimize the algorithm's results, so the parameters of the Gaussian window function are not further limited.
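The window above can be sampled numerically. The following is a minimal sketch, assuming a discrete signal with a sample rate of 8000 Hz (the sample rate and the function name `gaussian_window` are assumptions for illustration; the values a=0.003 and L=[-0.01, 0.01] seconds are the exemplary values given above).

```python
import math


def gaussian_window(a=0.003, half_width=0.01, sample_rate=8000):
    """Sample Gw(n) = exp(-n^2 / (2 a^2)) for n in [-half_width, half_width] seconds."""
    half_samples = int(round(half_width * sample_rate))
    # Time offsets n (in seconds) of each window tap relative to the window centre.
    offsets = [k / sample_rate for k in range(-half_samples, half_samples + 1)]
    weights = [math.exp(-(n * n) / (2.0 * a * a)) for n in offsets]
    return offsets, weights


offsets, weights = gaussian_window()
print(len(weights))                 # number of taps in the window
print(weights[len(weights) // 2])   # weight at the window centre (n = 0)
print(weights[0], weights[-1])      # weights at the two window edges
```

With these values the centre tap has weight 1 and the edge taps fall to well under 1% of it, so samples near the target moment dominate the weighted energy.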

Further, weighted calculation is performed on the original audio signal based on the target Gaussian window function to obtain an energy curve corresponding to the original audio signal, and derivation processing is performed on the energy curve to obtain the energy change curve corresponding to the original audio signal. The specific implementation is described in detail in the second embodiment and is not elaborated here.

Step 106: Obtain the target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.

Among them, the target sliding window is a window without longitudinal boundary, and the target sliding window is used to provide a dynamic judgment boundary of an energy change curve at a specific time. In this embodiment, the target sliding window slides continuously, and it is necessary to determine the stress moment of the energy change curve in the target sliding window at each specific moment.

In a specific embodiment, referring to FIG. 3 , first, a target sliding window is added to the energy change curve, and the window width of the target sliding window is specifically set to 0.06 seconds. It is worth noting that the sliding window width is selected as 0.06 seconds, which is just an example, and can also be 0.05 seconds, 0.07 seconds, or others. The selection of the window width of the target sliding window is based on the phenomenon that "the accent interval of most music audio is between 0.02 and 1 second". If the width of the sliding window is too large or too small, errors will be introduced. Secondly, the energy change peak value of the energy change curve in the target sliding window is obtained (that is, the maximum value of the energy change value in the target sliding window is determined), and the time corresponding to the energy change peak value is taken as the accent time.

Further, in this embodiment the target sliding window slides continuously. So that the target sliding window traverses the energy change curve, at its starting position the starting point of the sliding window (its left end point) coincides with the starting point (t=0) of the energy change curve. The target sliding window is then slid according to the preset step size, and the above steps of obtaining the energy change peak value of the energy change curve in the target sliding window and taking the time corresponding to the energy change peak value as the accent time are repeated until the end point of the sliding window (its right end point) reaches the end point of the energy change curve, at which point the sliding stops. Referring to FIG. 4, FIG. 4 is a schematic diagram of all the stress moments determined in the energy change curve. These stress moments are marked in the original audio signal, thereby obtaining the audio stress in the original audio signal.

In a specific embodiment, since an accent is a sound of relatively high intensity, the accent moment is also determined in combination with an energy change threshold. Specifically, it is determined whether the energy change peak value at a specific time is greater than or equal to the energy change threshold; the energy change threshold can be set to different values according to requirements such as recognition accuracy, and is not specifically limited here. If the energy change peak value is greater than or equal to the energy change threshold, the time corresponding to the energy change peak value is taken as the accent moment; if the energy change peak value is less than the energy change threshold, the target sliding window continues to slide according to the preset step size until the next accent moment that satisfies the energy change threshold condition is found.
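The sliding-window search and the threshold test described above can be sketched as follows. This is an illustrative sketch only: the function name `find_accent_times`, the sampling interval `dt`, and the default step size equal to the window width are assumptions, since the application leaves the step size and the threshold value open.

```python
def find_accent_times(change, dt, window=0.06, step=0.06, threshold=None):
    """Slide a window over `change`; return peak times that pass the threshold.

    change    -- sampled energy change curve (list of floats)
    dt        -- spacing between samples in seconds (assumed uniform)
    window    -- sliding window width in seconds (0.06 s is the exemplary value)
    step      -- preset step size in seconds (assumed; not fixed by the text)
    threshold -- energy change threshold, or None to accept every peak
    """
    width = max(1, int(round(window / dt)))   # window width in samples
    hop = max(1, int(round(step / dt)))       # step size in samples
    accents = []
    start = 0                                 # left edge starts at t = 0
    while start + width <= len(change):       # stop once the right edge reaches the end
        segment = change[start:start + width]
        peak = max(segment)
        peak_index = start + segment.index(peak)
        # Keep the peak only if it meets the energy change threshold (if one is set).
        if threshold is None or peak >= threshold:
            if peak_index not in accents:     # overlapping windows may repeat a peak
                accents.append(peak_index)
        start += hop
    return [i * dt for i in accents]


# Toy curve with three peaks of different heights.
curve = [0.0] * 18
curve[2], curve[9], curve[16] = 5.0, 3.0, 7.0
print(find_accent_times(curve, dt=0.01))                  # one peak per window
print(find_accent_times(curve, dt=0.01, threshold=4.0))   # the weaker middle peak is dropped
```

Note that without a threshold every window position yields a peak, even over a flat stretch of the curve, which is one reason the threshold check is useful in practice.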

The above audio stress recognition method processes the original audio signal based on a Gaussian window function and fully considers the temporal correlation of the audio signal, so that, compared with the traditional algorithm, the result of subsequent stress recognition is more accurate. Further, the point of most intense local energy change is dynamically identified based on the sliding window and marked as the stress moment to identify the audio stress, which makes the method more scientific and practical.

As shown in FIG. 5, FIG. 5 is a schematic flowchart of the audio stress recognition method in the second embodiment. The steps provided by the audio stress recognition method in the second embodiment include:

Step 502, acquiring the original audio signal.

In a specific implementation scenario, step 502 is basically the same as step 102 in the audio stress recognition method in the first embodiment, and details are not repeated here.

Step 504: Obtain a target Gaussian window function, perform weighted calculation on the original audio signal according to the target Gaussian function, and obtain an energy curve corresponding to the original audio signal.

The setting of the target Gaussian window function is the same as that in step 104, which is not repeated here. The energy curve is a curve reflecting the energy value of the original audio signal at different target moments.

In a specific embodiment, the weighted calculation specifically includes the following. First, the truncated audio signal of the original audio signal at the target moment is determined according to the target Gaussian window function. The target moment is any moment in the original audio signal; the truncated audio signal has the same width as the Gaussian window of the target Gaussian window function, and both contain the target moment. Second, the truncated audio signal is weighted with the target Gaussian window function to obtain the target energy value of the original audio signal at the target moment. Windowing in the time domain is expressed as element-wise (point-wise) multiplication. Correspondingly, the calculation of the target energy value E(t) at the target time t is expressed as:

$$E(t) = x(n+t)^2 \odot G_w(n)$$

In the formula, n is the time variable of the fixed domain T, and t is the time domain variable of the original audio signal.

Referring to FIG. 6 , when the target energy values of the original audio signal at all target times are obtained, the energy curve corresponding to the original audio signal can be obtained according to these target energy values.

In a specific embodiment, referring to FIG. 7, with the target moment taken as the middle moment of the Gaussian window corresponding to the target Gaussian window function, a Gaussian window is added to the original audio signal, and the audio signal within the Gaussian window is used as the truncated audio signal at the target moment. That is to say, for an arbitrary target time t in the original audio signal, if the width of the Gaussian window is selected as T = [-0.01, 0.01] seconds, the truncated audio signal of the original audio signal at the target time t is the audio signal in the time range [t-0.01, t+0.01].

It is worth noting that when the Gaussian window exceeds the audio length of the original audio signal, there is no need to consider the weighting of the excess. That is, when t takes a small value, the left half of the Gaussian window may exceed the audio length of the original audio signal, and no weighting calculation is required for this excess. Correspondingly, when t takes a larger value, the right half of the Gaussian window may exceed the length of the original audio signal, and no weighting calculation is required for this excess.
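The weighted energy calculation, including the edge handling just described, can be sketched as follows. Two points are assumptions of this sketch rather than statements of the application: the signal is treated as uniformly sampled at `sample_rate`, and the element-wise products within the window are summed to give a single energy value per target moment. The function name `weighted_energy` is likewise illustrative.

```python
import math


def weighted_energy(signal, sample_rate, a=0.003, half_width=0.01):
    """Return a Gaussian-weighted energy value for every sample index of `signal`."""
    half = int(round(half_width * sample_rate))
    # Gaussian weights Gw(n) for window offsets -half..half (converted to seconds).
    weights = [math.exp(-((k / sample_rate) ** 2) / (2.0 * a * a))
               for k in range(-half, half + 1)]
    energy = []
    for t in range(len(signal)):
        total = 0.0
        for k in range(-half, half + 1):
            idx = t + k
            # Window taps that fall outside the signal are skipped, not weighted.
            if 0 <= idx < len(signal):
                # Assumption: the products x(n+t)^2 * Gw(n) are summed over the window.
                total += signal[idx] ** 2 * weights[k + half]
        energy.append(total)
    return energy


# Toy input: silence with one short burst centred on sample 50.
sr = 1000
x = [0.0] * 100
for i in range(48, 53):
    x[i] = 1.0
E = weighted_energy(x, sr)
print(E.index(max(E)))  # index of the largest weighted energy
```

The direct double loop is chosen for readability; for long recordings the same result could be obtained more efficiently with a convolution of the squared signal and the window.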

Step 506: Perform numerical conversion processing on the energy curve to obtain an energy change curve corresponding to the original audio signal.

In a specific embodiment, the numerical conversion processing specifically includes the following. First, logarithmic processing is performed on the energy curve to obtain a logarithmic function corresponding to the original audio signal. This is because introducing the directionality of energy changes would make later accent identification more difficult; logarithmic processing of the energy curve can eliminate the directionality (that is, the sign) of energy changes, thereby reducing the effect of energy rapidly increasing or rapidly decreasing and in turn better reflecting the rate of energy change. Further, second-derivative processing is performed on the logarithmic function to obtain the energy change curve corresponding to the original audio signal. Please refer to FIG. 8 for the energy change curve.

Taking the logarithm of the weighted energy curve and taking the second derivative, the specific calculation method to obtain the energy change characteristic curve P(t) is described as follows:

$$P(t) = \frac{d^2}{dt^2}\,\ln\bigl(E(t) + 1\bigr)$$

This embodiment proposes taking the logarithm and then the second derivative of the energy curve, which can effectively reduce the influence of background noise and fully reflect the energy change characteristics of the energy curve.
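For a sampled energy curve, the logarithm and second derivative can be approximated numerically. The following is a minimal sketch that uses a central finite difference; the choice of finite-difference scheme, the zero values left at the two end points, and the function name `energy_change_curve` are assumptions of this example, as the application does not specify a discretization.

```python
import math


def energy_change_curve(energy, dt):
    """Approximate P(t) = d^2/dt^2 ln(E(t) + 1) with central differences."""
    log_energy = [math.log(e + 1.0) for e in energy]
    change = [0.0] * len(log_energy)   # end points are left at 0.0 (assumption)
    for k in range(1, len(log_energy) - 1):
        change[k] = (log_energy[k + 1] - 2.0 * log_energy[k] + log_energy[k - 1]) / (dt * dt)
    return change


# Check on a curve chosen so that ln(E(t) + 1) = t^2, whose second derivative is 2.
dt = 0.1
E = [math.exp((k * dt) ** 2) - 1.0 for k in range(11)]
P = energy_change_curve(E, dt)
print(P[1:-1])  # interior values should all be close to 2
```

The "+1" inside the logarithm keeps the result finite where the energy is zero, for example during silence.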

Step 508: Acquire the target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.

In a specific implementation scenario, step 508 is basically the same as step 106 in the audio stress recognition method in the first embodiment, and details are not repeated here.

In one embodiment, as shown in FIG. 9, an audio stress recognition device is proposed, the device includes:

The energy change curve obtaining module 902 is used to obtain the original audio signal; obtain the target Gaussian window function, process the original audio signal according to the target Gaussian window function, and obtain the energy change curve corresponding to the original audio signal;

The accent recognition module 904 is configured to acquire the target sliding window, determine the accent moment in the energy change curve according to the target sliding window, and mark the original audio signal at the accent moment as audio accent.

The above audio stress recognition device processes the original audio signal based on a Gaussian window function and fully considers the temporal correlation of the audio signal, so that, compared with the traditional algorithm, the result of subsequent stress recognition is more accurate. Further, the point of most intense local energy change is dynamically identified based on the sliding window and marked as the stress moment to identify the audio stress, which makes the device more scientific and practical.

In one embodiment, the energy change curve acquisition module 902 is further specifically configured to: perform weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal; and perform numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal.

In one embodiment, the energy change curve acquisition module 902 is further specifically configured to: determine the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function; wherein, the target moment is any moment in the original audio signal; The truncated audio signal is weighted with the target Gaussian window function to obtain the target energy value of the original audio signal at the target time, and the energy curve corresponding to the original audio signal is obtained according to the target energy value at each target time.

In one embodiment, the energy change curve acquisition module 902 is further specifically configured to: take the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function and add a Gaussian window to the original audio signal; and take the audio signal within the Gaussian window as the truncated audio signal at the target time.

In one embodiment, the energy change curve acquisition module 902 is further specifically configured to: perform logarithmic processing on the energy curve to obtain a logarithmic function corresponding to the original audio signal; and perform second-derivative processing on the logarithmic function to obtain the energy change curve corresponding to the original audio signal.

In one embodiment, the accent recognition module 904 is further specifically configured to: add a target sliding window to the energy change curve, obtain the energy change peak value of the energy change curve in the target sliding window, and take the moment corresponding to the energy change peak value as the accent moment, wherein, when the target sliding window is at its starting position, the starting point of the window is the starting point of the energy change curve; and slide the target sliding window according to the preset step size and return to the step of obtaining the energy change peak value of the energy change curve in the target sliding window and taking the moment corresponding to the energy change peak value as the accent moment.

In one embodiment, the accent recognition module 904 is further specifically configured to: determine whether the energy change peak value is greater than or equal to the energy change threshold; if the energy change peak value is greater than or equal to the energy change threshold, continue to perform the step of taking the time corresponding to the energy change peak value as the accent time; and if the energy change peak value is less than the energy change threshold, continue to perform the step of sliding the target sliding window according to the preset step size.

FIG. 10 shows an internal structure diagram of an audio accent recognition device in one embodiment. As shown in FIG. 10, the audio accent recognition device includes a processor, a memory and a network interface connected through a system bus. Wherein, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the audio stress recognition device stores an operating system, and also stores a computer program, which, when executed by the processor, enables the processor to implement the audio stress recognition method. A computer program may also be stored in the internal memory, and when executed by the processor, the computer program can cause the processor to execute the audio accent recognition method. Those skilled in the art can understand that the structure shown in FIG. 10 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the audio stress recognition device to which the solution of the present application is applied. The accent recognition device may include more or fewer components than shown in the figures, or combine certain components, or have a different arrangement of components.

An audio stress recognition device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program: acquiring an original audio signal; acquiring a target Gaussian window function and processing the original audio signal according to the target Gaussian window function to obtain the energy change curve corresponding to the original audio signal; and acquiring the target sliding window, determining the stress time in the energy change curve according to the target sliding window, and marking the original audio signal at the stress time as audio stress.

In one embodiment, processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal includes: performing weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal; and performing numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal.

In one embodiment, performing weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal includes: determining the truncated audio signal of the original audio signal at the target time according to the target Gaussian window function, wherein the target time is any moment in the original audio signal; and weighting the truncated audio signal with the target Gaussian window function to obtain the target energy value of the original audio signal at the target moment, and obtaining the energy curve corresponding to the original audio signal according to the target energy value at each target moment.

In one embodiment, determine the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function, comprising: taking the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function, adding a Gaussian window on the original audio signal; Take the audio signal within the Gaussian window as the truncated audio signal at the target time.

In one embodiment, performing numerical conversion processing on the energy curve to obtain an energy change curve corresponding to the original audio signal includes: performing logarithmic processing on the energy curve to obtain a logarithmic function corresponding to the original audio signal; and performing second-derivative processing on the logarithmic function to obtain the energy change curve corresponding to the original audio signal.

In one embodiment, determining the stress moment in the energy change curve according to the target sliding window includes: adding a target sliding window to the energy change curve, obtaining the energy change peak value of the energy change curve in the target sliding window, and taking the time corresponding to the energy change peak value as the accent time, wherein, when the target sliding window is at its starting position, the starting point of the window is the starting point of the energy change curve; and sliding the target sliding window according to the preset step size and returning to the step of obtaining the energy change peak value of the energy change curve in the target sliding window and taking the time corresponding to the energy change peak value as the accent time.

In one embodiment, before taking the time corresponding to all the energy change peaks as the stress time, the method further includes: judging whether the energy change peak value is greater than or equal to the energy change threshold; if the energy change peak value is greater than or equal to the energy change threshold, continuing to perform the step of taking the time corresponding to the energy change peak value as the stress time; and if the energy change peak value is less than the energy change threshold, continuing to perform the step of sliding the target sliding window according to the preset step size.

A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the following steps are implemented: acquiring an original audio signal; acquiring a target Gaussian window function and processing the original audio signal according to the target Gaussian window function to obtain the energy change curve corresponding to the original audio signal; and acquiring the target sliding window, determining the stress time in the energy change curve according to the target sliding window, and marking the original audio signal at the stress time as audio stress.

It should be noted that the above audio stress recognition method, apparatus, device and computer-readable storage medium belong to the same general inventive concept, and the contents of the embodiments of the audio stress recognition method, apparatus, device and computer-readable storage medium are mutually applicable.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a non-volatile computer-readable storage medium, and when the program is executed, it may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other medium used in the various embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as there is no contradiction in a combination of these technical features, the combination should be considered to be within the scope described in this specification.

The above examples only represent several embodiments of the present application, and the descriptions thereof are relatively specific and detailed, but should not be construed as a limitation on the scope of the patent of the present application. It should be pointed out that for those skilled in the art, without departing from the concept of the present application, several modifications and improvements can be made, which all belong to the protection scope of the present application. Therefore, the scope of protection of the patent of the present application shall be subject to the appended claims.

Claims

1. An audio stress recognition method, characterized in that the method comprises:

acquiring an original audio signal;

acquiring a target Gaussian window function, and processing the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and

acquiring a target sliding window, determining a stress moment in the energy change curve according to the target sliding window, and marking the original audio signal at the stress moment as an audio stress.

2. The method according to claim 1, wherein processing the original audio signal according to the target Gaussian window function to obtain the energy change curve corresponding to the original audio signal comprises:

performing weighted calculation on the original audio signal according to the target Gaussian window function to obtain an energy curve corresponding to the original audio signal; and

performing numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal.

3. The method according to claim 2, wherein performing weighted calculation on the original audio signal according to the target Gaussian window function to obtain the energy curve corresponding to the original audio signal comprises:

determining a truncated audio signal of the original audio signal at a target moment according to the target Gaussian window function, wherein the target moment is any moment in the original audio signal; and

performing weighted calculation on the truncated audio signal and the target Gaussian window function to obtain a target energy value of the original audio signal at the target moment, and obtaining the energy curve corresponding to the original audio signal according to the target energy value at each target moment.

4. The method according to claim 3, wherein determining the truncated audio signal of the original audio signal at the target moment according to the target Gaussian window function comprises:

adding a Gaussian window to the original audio signal, with the target moment as the middle moment of the Gaussian window corresponding to the target Gaussian window function; and

taking the audio signal within the Gaussian window as the truncated audio signal at the target moment.

5. The method according to claim 2, wherein performing numerical conversion processing on the energy curve to obtain the energy change curve corresponding to the original audio signal comprises:

performing logarithmic processing on the energy curve to obtain a logarithmic function corresponding to the original audio signal; and

performing second-order derivation on the logarithmic function to obtain the energy change curve corresponding to the original audio signal.

6. The method according to claim 1, wherein determining the stress moment in the energy change curve according to the target sliding window comprises:

adding the target sliding window to the energy change curve, obtaining an energy change peak value of the energy change curve within the target sliding window, and taking the time corresponding to the energy change peak value as the stress moment, wherein the starting point of the target sliding window at its starting position is the starting point of the energy change curve; and

sliding the target sliding window according to a preset step size, and returning to the step of obtaining the energy change peak value of the energy change curve within the target sliding window and taking the time corresponding to the energy change peak value as the stress moment.

7. The method according to claim 6, further comprising:

determining whether the energy change peak value is greater than or equal to an energy change threshold;

if the energy change peak value is greater than or equal to the energy change threshold, continuing to perform the step of taking the time corresponding to the energy change peak value as the stress moment; and

if the energy change peak value is smaller than the energy change threshold, continuing to perform the step of sliding the target sliding window according to the preset step size.

8. An audio stress recognition apparatus, characterized in that the apparatus comprises:

an energy change curve acquisition module, configured to acquire an original audio signal, acquire a target Gaussian window function, and process the original audio signal according to the target Gaussian window function to obtain an energy change curve corresponding to the original audio signal; and

a stress recognition module, configured to acquire a target sliding window, determine a stress moment in the energy change curve according to the target sliding window, and mark the original audio signal at the stress moment as an audio stress.

9. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.

10. An audio stress recognition device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.