CN109960484B - Audio volume acquisition method and device, storage medium and terminal

Info

Publication number: CN109960484B
Application number: CN201711429071.3A
Authority: CN
Inventors: 王天宝
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2017-12-26
Filing date: 2017-12-26
Publication date: 2021-08-24
Anticipated expiration: 2037-12-26
Also published as: CN109960484A

Abstract

The embodiment of the invention discloses an audio volume acquisition method and device, a storage medium and a terminal. The method includes: acquiring the frame volume of each frame of audio segment among the multiple frames of audio segments contained in first audio data; acquiring a plurality of volume intervals, assigning the frame volume of each frame of audio segment to the volume interval to which it belongs, and determining the number of frame volumes contained in each of the plurality of volume intervals; and determining a target volume interval according to the number of frame volumes contained in each volume interval, and determining the audio volume of the first audio data according to the target frame volumes contained in the target volume interval. The method and device can improve the accuracy of the acquired audio volume.

Description

Audio volume acquisition method and device, storage medium and terminal

Technical Field

The present invention relates to the field of audio processing technologies, and in particular, to an audio volume obtaining method and apparatus, a storage medium, and a terminal.

Background

The audio volume of audio data plays an important role in techniques such as volume gain control and volume normalization. Different audio data may have different volume levels because of differences in acquisition mode, audio source, and so on. The existing volume statistics method determines the volume by calculating the average decibel value of the whole audio data. The volume determined in this way is not accurate: if individual decibel values are particularly high or particularly low, the average decibel value of the whole audio data is easily skewed, so the volume gain determined from it will be erroneous in a volume gain control scenario. How to accurately acquire the volume of audio data is therefore a problem worth solving.

Disclosure of Invention

The embodiment of the invention provides an audio volume acquisition method and device, a storage medium and a terminal, which can improve the accuracy of the acquired audio volume.

In one aspect, an embodiment of the present invention provides an audio volume obtaining method, including:

acquiring the frame volume of each frame of audio segment in a plurality of frames of audio segments contained in first audio data;

acquiring a plurality of volume intervals, dividing the frame volume of each frame of audio segment in the multi-frame audio segment into the volume intervals to which the frame volume belongs, and determining the number of the frame volume contained in each volume interval in the plurality of volume intervals;

and determining a target volume interval according to the number of frame volumes contained in each volume interval, and determining the audio volume of the first audio data according to the target frame volume contained in the target volume interval.

In one possible embodiment, the obtaining the frame volume of each frame of audio segments in the plurality of frames of audio segments included in the first audio data includes:

acquiring a first amplitude value of a sampling point contained in each frame of audio segment in first audio data;

and acquiring the frame volume of each frame of audio segment according to the first amplitude value of the sampling point contained in each frame of audio segment.

In a possible embodiment, the obtaining a frame volume of each frame of audio segments according to a first amplitude value of a sampling point included in each frame of audio segments includes:

acquiring a volume peak value of each sampling point according to a first amplitude value of the sampling point contained in each frame of audio segment;

and calculating the frame volume of each audio segment by adopting the volume peak value of the sampling point contained in each audio segment.

In a possible embodiment, the obtaining a volume peak value of each sample point according to the first amplitude value of the sample point included in each frame of audio segment includes:

acquiring a first amplitude value of an nth sampling point and a volume peak value of an (n-1)th sampling point contained in each frame of audio segment, wherein n is a positive integer;

and calculating the volume peak value of the nth sampling point according to the first amplitude value of the nth sampling point and the volume peak value of the (n-1)th sampling point.

In one possible embodiment, the calculating the volume peak value of the nth sampling point according to the first amplitude value of the nth sampling point and the volume peak value of the (n-1)th sampling point includes:

calculating a first candidate volume peak value of the nth sampling point according to the first amplitude value of the nth sampling point;

calculating a second candidate volume peak value of the nth sampling point according to the volume peak value of the (n-1)th sampling point and an attenuation value;

determining a larger value of the first candidate volume peak and the second candidate volume peak as a volume peak of the nth sample point.
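The recursion described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the summary does not say how each candidate is computed, so the sketch assumes the first candidate is the absolute amplitude of the current sampling point and the second candidate is the previous peak multiplied by a hypothetical factor `decay`. The function name `peak_envelope` and the starting peak of 0 are likewise illustrative.

```python
def peak_envelope(amplitudes, decay=0.5, initial_peak=0.0):
    """Return a volume peak value for each sampling point.

    Assumptions (not stated in the source): the first candidate is
    abs(amplitude), and the attenuation is multiplicative with the
    hypothetical factor `decay`.
    """
    peaks = []
    previous_peak = initial_peak
    for amplitude in amplitudes:
        first_candidate = abs(amplitude)          # from the current sample
        second_candidate = previous_peak * decay  # attenuated previous peak
        previous_peak = max(first_candidate, second_candidate)
        peaks.append(previous_peak)
    return peaks
```

With these assumptions, a loud sampling point keeps influencing the peak values of the following sampling points until the attenuated value drops below their own amplitude.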

In a possible embodiment, the determining the target volume interval according to the number of frame volumes included in each volume interval includes:

acquiring the number of frame volumes contained in a first volume interval with the maximum decibel value in the plurality of volume intervals;

and when the number of frame volumes contained in the first volume interval is larger than a target threshold value, determining the first volume interval as a target volume interval.

In one possible embodiment, the method further comprises:

when the number of frame volumes contained in the first volume interval is not larger than the target threshold, determining the volume interval with the maximum decibel value among the remaining volume intervals that have not yet been compared as a second volume interval, and calculating the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval;

and when the sum of the number of frame volumes contained in the first volume interval to the second volume interval is greater than the target threshold value, determining the second volume interval as a target volume interval.

In one possible embodiment, the method further comprises:

when the sum of the number of frame volumes included in the first volume interval to the second volume interval is not greater than the target threshold, determining a volume interval with a maximum decibel value among remaining volume intervals not involved in comparison as a second volume interval, and calculating the sum of the number of frame volumes included in the first volume interval to the second volume interval.
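The three embodiments above amount to accumulating the counts from the highest-decibel interval downward until the running sum exceeds the target threshold. A minimal sketch of that loop follows; the function name `find_target_interval` and the list layout are hypothetical, and returning `None` when the threshold is never exceeded is an assumption, since the text does not cover that case.

```python
def find_target_interval(counts_high_to_low, target_threshold):
    """Return the index of the target volume interval.

    counts_high_to_low[k] is the number of frame volumes in the k-th
    volume interval, ordered from the highest-decibel interval downward.
    Returns None if the running sum never exceeds the threshold (the
    source does not specify this case).
    """
    running_sum = 0
    for index, count in enumerate(counts_high_to_low):
        running_sum += count  # sum from the first interval to this one
        if running_sum > target_threshold:
            return index
    return None
```

Because the scan starts at the loudest interval, a handful of unusually loud frames is skipped over as long as their count stays at or below the threshold.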

In a possible embodiment, before obtaining the frame volume of each frame of audio segment in the multiple frames of audio segments included in the first audio data, the method further includes:

acquiring a second amplitude value of an mth sampling point in second audio data, wherein m is a positive integer;

and calculating the first amplitude value of the mth sampling point in the first audio data after the direct current removal processing according to the second amplitude value and the direct current component of the mth sampling point.

In a possible embodiment, before the calculating the first amplitude value of the mth sampling point in the first audio data after the direct current removal processing according to the second amplitude value and the direct current component of the mth sampling point, the method further includes:

calculating a direct current component according to the average amplitude value of a first audio segment of the second audio data, in which the mth sampling point is located, and the average amplitude value of a second audio segment of the second audio data, wherein the second audio segment and the first audio segment contain different sampling points.

voice detection is performed on third audio data, and first audio data is generated using voice periods contained in the third audio data.

On the other hand, an embodiment of the present invention provides an audio volume obtaining apparatus, including:

the frame volume acquisition module is used for acquiring the frame volume of each frame of audio segment in a plurality of frames of audio segments contained in the first audio data;

the number determining module is used for acquiring a plurality of volume intervals, dividing the frame volume of each frame of audio segment in the multi-frame audio segment into the volume intervals to which the frame volume belongs, and determining the number of the frame volume contained in each volume interval in the plurality of volume intervals;

the interval determining module is used for determining a target volume interval according to the number of frame volumes contained in each volume interval;

and the volume determining module is used for determining the audio volume of the first audio data according to the target frame volume contained in the target volume interval.

In one possible embodiment, each frame of the audio segment includes a plurality of sampling points, and the frame volume obtaining module includes:

the amplitude value acquisition unit is used for acquiring a first amplitude value of a sampling point contained in each frame of audio segment in first audio data;

the frame volume acquisition unit is used for acquiring the frame volume of each frame of audio segment according to the first amplitude value of the sampling point contained in each frame of audio segment.

In one possible embodiment, the frame volume acquiring unit includes:

the volume peak value acquisition subunit is used for acquiring a volume peak value of each sampling point according to a first amplitude value of the sampling point contained in each frame of audio segment;

and the frame volume calculation subunit is used for calculating the frame volume of each audio segment by adopting the volume peak value of the sampling point contained in each audio segment.

In a possible embodiment, the volume peak obtaining subunit is specifically configured to:

Optionally, the calculating the volume peak value of the nth sample point according to the first amplitude value of the nth sample point and the volume peak value of the (n-1) th sample point specifically includes:

In a possible embodiment, the interval determining module is specifically configured to:

In a possible embodiment, the interval determination module is further configured to determine, as a second volume interval, a volume interval with a maximum decibel value among remaining volume intervals not participating in comparison when the first volume interval includes a number of frame volumes not greater than the target threshold, and calculate a sum of the numbers of frame volumes included in the first volume interval to the second volume interval;

In a possible embodiment, the interval determination module is further configured to, when the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval is not greater than the target threshold, determine the volume interval with the maximum decibel value among the remaining volume intervals that have not yet been compared as the second volume interval, and calculate the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval.

In a possible embodiment, the audio volume obtaining apparatus further includes:

the amplitude value acquisition module is used for acquiring a second amplitude value of an mth sampling point in second audio data, wherein m is a positive integer;

and the amplitude value calculation module is used for calculating the first amplitude value of the mth sampling point in the first audio data after the direct current removal processing according to the second amplitude value and the direct current component of the mth sampling point.

and the direct current component calculation module is used for calculating a direct current component according to the average amplitude value of a first audio segment of the second audio data, in which the mth sampling point is located, and the average amplitude value of a second audio segment of the second audio data, wherein the second audio segment and the first audio segment contain different sampling points.

and the voice detection module is used for executing voice detection on the third audio data and generating the first audio data by adopting the voice time interval contained in the third audio data.

In another aspect, an embodiment of the present invention provides a computer storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the method steps provided in the first aspect.

On the other hand, an embodiment of the present invention provides a terminal, including: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the steps of:

In the embodiment of the invention, the frame volume of each frame of audio segment among the multiple frames of audio segments contained in the first audio data is first acquired; the frame volume of each frame of audio segment is then assigned to the volume interval to which it belongs, and the number of frame volumes contained in each of the plurality of volume intervals is determined; finally, a target volume interval is determined according to the number of frame volumes contained in each volume interval, and the audio volume of the first audio data is determined according to the target frame volumes contained in the target volume interval. Because the target volume interval used to determine the audio volume is selected by counting the number of frame volumes in each volume interval, the influence of individual audio segments whose frame volume decibel values are particularly high or low on the audio volume of the audio data is reduced, which improves the accuracy of the acquired audio volume.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of an audio volume obtaining method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of another audio volume obtaining method according to an embodiment of the present invention;

Fig. 3 is a flowchart of a method for the direct current removal processing in step 201 according to an embodiment of the present invention;

Fig. 4a is a waveform diagram illustrating a DC component removal process according to an embodiment of the present invention;

Fig. 4b is a waveform diagram illustrating another DC component removal process provided by an embodiment of the present invention;

Fig. 4c is a waveform diagram of audio data according to an embodiment of the present invention;

Fig. 4d is a waveform diagram of another audio data according to an embodiment of the present invention;

Fig. 5 is a flowchart of a method for obtaining the frame volume of an audio segment according to an embodiment of the present invention;

Fig. 6 is a flowchart of a method for determining a target volume interval according to an embodiment of the present invention;

Fig. 7 is an exemplary diagram of an audio volume obtaining apparatus according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an audio volume obtaining device according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of another audio volume acquiring device according to an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The audio volume acquisition method provided by the embodiment of the invention can be applied to scenarios in which the volume of voice audio data is acquired. Specifically, the audio volume acquisition device acquires the frame volume of each frame of audio segment among the multiple frames of audio segments contained in the voice audio data; acquires a plurality of volume intervals, assigns the frame volume of each frame of audio segment to the volume interval to which it belongs, and determines the number of frame volumes contained in each of the plurality of volume intervals; and determines a target volume interval according to the number of frame volumes contained in each volume interval, and determines the audio volume of the voice audio data according to the target frame volumes contained in the target volume interval. Because the target volume interval used to determine the audio volume is selected by counting the number of frame volumes in each volume interval, the influence of individual audio segments whose frame volume decibel values are particularly high or low on the audio volume of the audio data is reduced, which improves the accuracy of the acquired audio volume.

The audio volume acquiring device or the terminal according to the embodiment of the present invention may be a device having a processing capability, for example: tablet computers, mobile phones, electronic readers, Personal Computers (PCs), notebook computers, and the like.

Referring to fig. 1, a flow chart of an audio volume obtaining method according to an embodiment of the present invention is schematically shown. As shown in fig. 1, the method of an embodiment of the present invention may include the following steps 101-103.

101, obtaining the frame volume of each frame of audio segment in the multi-frame audio segments contained in the first audio data.

Specifically, the audio volume acquiring device may divide the first audio data into multiple frames of audio segments, which is illustrated by the following two cases. In the first case, the duration of the first audio data is an integer multiple of the frame length, and each frame of audio segment contains the same number of sampling points. For example, if the first audio data has a duration of 10 s, is sampled at 8 kHz, and is divided into frames of 20 ms each, then the first audio data is divided into 500 frames, and each 20 ms frame of audio segment contains 20 ms × 8 kHz = 160 sampling points. In the second case, the duration of the first audio data is not an integer multiple of the frame length; every frame except the last contains the same number of sampling points, and the last frame contains fewer sampling points than the other frames. For example, if the first audio data has a duration of 1.05 s, is sampled at 8 kHz, and is divided into frames of 20 ms each, then the first audio data is divided into 53 frames; the audio segments of the 1st to 52nd frames each contain 20 ms × 8 kHz = 160 sampling points, and the audio segment of the 53rd frame contains 80 sampling points.
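The two framing cases can be checked with a short sketch. The helper name `split_into_frames` is hypothetical, and the sketch assumes the audio is already available as a flat list of sampling points.

```python
def split_into_frames(samples, samples_per_frame):
    """Split a flat list of sampling points into consecutive frames.

    Every frame has samples_per_frame points except possibly the last,
    which holds whatever remains.
    """
    return [samples[start:start + samples_per_frame]
            for start in range(0, len(samples), samples_per_frame)]


# 20 ms frames at 8 kHz hold 160 sampling points each.
SAMPLES_PER_FRAME = 8000 * 20 // 1000

# Case 1: 10 s of audio (80000 points) gives 500 equal frames.
frames_10s = split_into_frames([0] * 80000, SAMPLES_PER_FRAME)

# Case 2: 1.05 s of audio (8400 points) gives 52 full frames plus a short one.
frames_1_05s = split_into_frames([0] * 8400, SAMPLES_PER_FRAME)
```

Here `len(frames_10s)` is 500, while `frames_1_05s` has 53 frames whose last frame holds 80 sampling points, matching the two examples in the text.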

Then, the frame volume of each frame of audio segment in the first audio data may be determined as follows: first, a first amplitude value of each sampling point contained in each frame of audio segment is obtained, where the first amplitude value of a sampling point may be the amplitude of that sampling point in the first audio data; then, the frame volume of each frame of audio segment is obtained according to the first amplitude values of the sampling points it contains. In the embodiment of the present invention, the volume of one frame of audio segment is represented by its frame volume.

102, obtaining a plurality of volume intervals, dividing the frame volume of each frame of audio segment in the multi-frame audio segment into the volume intervals to which the frame volume belongs, and determining the number of the frame volume contained in each volume interval in the plurality of volume intervals.

Optionally, the multiple volume intervals acquired by the audio volume acquiring device are established in advance, so that whenever the device acquires the volume of any first audio data, the established volume intervals can be obtained directly, which improves the efficiency of determining the volume intervals.

For example, the established volume intervals each have a length of 1 decibel: 0 dB to -1 dB is one volume interval, -1 dB to -2 dB is one volume interval, and so on down to -39 dB to -40 dB, while everything below -40 dB forms a single volume interval. This is because, in practice, speech below -40 decibels is very quiet and can be treated as one large interval. That is, the plurality of volume intervals are: (-∞, -40], (-40, -39], (-39, -38], …, (-1, 0]. The frame volume of each frame of audio segment is then assigned to the volume interval to which it belongs. For example, if the frame volume PeakdB_i of the i-th frame of audio segment is equal to -2.3 decibels, it belongs to the volume interval (-3, -2].

For example, if the frames of audio segments contained in the first audio data have frame volumes of -1.3, -1.6, -3.5 and -0.7, then the set corresponding to each volume interval is:

the volume interval (-1, 0] corresponds to the set {-0.7}, denoted as S1;

the volume interval (-2, -1] corresponds to the set {-1.3, -1.6}, denoted as S2;

the volume interval (-3, -2] corresponds to the empty set ∅, denoted as S3;

the volume interval (-4, -3] corresponds to the set {-3.5}, denoted as S4;

the other volume intervals correspond to the empty set ∅.

After dividing the frame volume of each frame of audio segment in the multiple frame of audio segments into the volume intervals to which the frame volume belongs, the number of frame volumes included in each volume interval in the multiple volume intervals may be determined, or may be described as the number of audio segments included in each volume interval in the multiple volume intervals may be determined.
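The assignment and counting in step 102 can be sketched as below, assuming the example layout of right-closed 1 dB intervals with a single catch-all interval at and below -40 dB. Each interval is identified by its upper bound, so 0 stands for (-1, 0] and -40 stands for (-∞, -40]; the function names are hypothetical.

```python
import math
from collections import Counter


def interval_upper_bound(frame_volume_db, floor_db=-40):
    """Upper bound of the right-closed 1 dB interval holding the value.

    Values at or below floor_db all fall into the catch-all interval
    (-inf, floor_db], identified here by floor_db itself.
    """
    if frame_volume_db <= floor_db:
        return floor_db
    return math.ceil(frame_volume_db)


def count_per_interval(frame_volumes_db):
    """Count how many frame volumes fall into each volume interval."""
    return Counter(interval_upper_bound(v) for v in frame_volumes_db)


counts = count_per_interval([-1.3, -1.6, -3.5, -0.7])
```

For the four example frame volumes, `counts` holds one entry for (-1, 0], two for (-2, -1] and one for (-4, -3]; intervals with no frame volume simply do not appear.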

It is understood that the volume intervals together form a continuous decibel range, and this range covers every decibel value that can occur in an audio segment. For example, in the embodiment of the present invention, the ratio of the amplitude value of the audio data to the maximum quantization value U₀ is converted into the logarithmic domain, so the frame volume of each frame of audio segment is no greater than 0; the decibel range formed by the plurality of volume intervals may therefore be (-∞, 0]. Consequently, the frame volume of every frame of audio segment can be assigned to a volume interval. In addition, the volume intervals share no decibel values, so each frame volume is assigned to exactly one volume interval, which makes it convenient to count the number of frame volumes contained in each volume interval.
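The conversion to the logarithmic domain can be illustrated with a small sketch. The text only says that the ratio of the amplitude to U₀ is converted into the logarithmic domain; the factor of 20 with a base-10 logarithm (the usual convention for amplitude ratios) and the value 32768 for 16-bit audio are assumptions made here for illustration, as is the function name.

```python
import math


def amplitude_to_db(amplitude, max_quantization=32768):
    """Convert an amplitude to decibels relative to the maximum value.

    Assumptions: 20 * log10 of the amplitude ratio, and a 16-bit
    maximum quantization value of 32768. A zero amplitude maps to
    negative infinity.
    """
    magnitude = abs(amplitude)
    if magnitude == 0:
        return float("-inf")
    return 20.0 * math.log10(magnitude / max_quantization)
```

Since the magnitude never exceeds the maximum quantization value, the ratio is at most 1 and the result is at most 0, which is why the intervals only need to cover (-∞, 0].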

It should be noted that, in the embodiment of the present invention, the decibel value range corresponding to each of the multiple volume intervals is not limited.

103, determining a target volume interval according to the number of frame volumes contained in each volume interval, and determining the audio volume of the first audio data according to the target frame volume contained in the target volume interval.

Specifically, the target volume interval determined by the audio volume acquiring device is one of the plurality of volume intervals acquired in step 102. The device determines the target volume interval according to the number of frame volumes contained in each volume interval, on the one hand to determine the volume interval in which the audio volume of the first audio data lies, and on the other hand to determine the audio segments used for calculating the audio volume of the first audio data. In this way, while determining the target volume interval, the target frame volumes are selected from the frame volumes of the multiple audio segments, and the audio volume of the first audio data is determined from them. This reduces the influence of individual audio segments whose frame volume decibel values are particularly high or low on the audio volume of the audio data, and improves the accuracy of the acquired audio volume.

For how the audio volume obtaining device determines the audio volume of the first audio data according to the target frame volumes contained in the target volume interval, reference may be made to the following three optional schemes; the embodiment of the present invention does not limit the specific determination process of the audio volume.

In a first optional scheme, the audio volume of the first audio data determined by the audio volume obtaining device may be the average value of the target frame volumes contained in the target volume interval. For example, taking the example in step 102 as a premise, if the target volume interval is the one corresponding to S2, the target frame volumes are -1.3 and -1.6, and their average is -1.45, so the audio volume of the first audio data is -1.45.

In a second optional scheme, the audio volume of the first audio data determined by the audio volume obtaining device may be one of the target frame volumes contained in the target volume interval. For example, if the target frame volumes are -1.3 and -1.6, the audio volume acquisition device may determine -1.3 as the audio volume of the first audio data; alternatively, it may determine -1.6 as the audio volume of the first audio data.

In a third optional scheme, the audio volume of the first audio data determined by the audio volume obtaining device may be a weighted average of the target frame volumes contained in the target volume interval. Optionally, the weight corresponding to each target frame volume may be set according to the ordering of the decibel values, for example, a smaller weight for a smaller decibel value and a larger weight for a larger decibel value.
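The first and third schemes can be compared with a brief sketch (the second scheme simply picks one element of the list). The function names are hypothetical, and the weights 1, 2, …, k assigned in ascending order of decibel value are only one possible choice consistent with "a larger weight for a larger decibel value"; the text does not fix the weights.

```python
def mean_volume(target_frame_volumes):
    """First scheme: plain average of the target frame volumes."""
    return sum(target_frame_volumes) / len(target_frame_volumes)


def rank_weighted_volume(target_frame_volumes):
    """Third scheme: weighted average with rank-based weights.

    Assumption: the smallest decibel value gets weight 1, the next
    gets weight 2, and so on, so larger values weigh more.
    """
    ordered = sorted(target_frame_volumes)
    weights = range(1, len(ordered) + 1)
    weighted_sum = sum(w * v for w, v in zip(weights, ordered))
    return weighted_sum / sum(weights)


target_frame_volumes = [-1.3, -1.6]
average = mean_volume(target_frame_volumes)            # about -1.45
weighted = rank_weighted_volume(target_frame_volumes)  # about -1.4
```

With the example values, the rank-weighted result lies closer to the louder frame than the plain average does.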

Referring to fig. 2, a flow chart of another audio volume obtaining method according to an embodiment of the present invention is shown. As shown in fig. 2, the method of an embodiment of the present invention may include the following steps 201-206.

201, performing direct current removal processing on the second audio data to obtain third audio data.

Specifically, in practice, the second audio data may include a dc offset due to a collection manner of the audio data, and the like, and the embodiment of the present invention can improve accuracy of the acquired audio volume by eliminating the dc offset, where the dc offset may be represented by a dc component.

For the specific implementation process of step 201, the following detailed descriptions of step 301 to step 302 may be referred to.
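As a rough illustration of step 201, the sketch below subtracts a direct current component from every sampling point. Both the subtraction and the way the component is estimated are assumptions: the summary only says the component is calculated from the average amplitude values of two audio segments, so averaging the two segment means is a hypothetical stand-in, and all function names are illustrative.

```python
def mean_amplitude(segment):
    """Average amplitude value of one audio segment."""
    return sum(segment) / len(segment)


def estimate_dc(first_segment, second_segment):
    """Hypothetical estimate: the average of the two segment means.

    The source does not give the actual combination rule.
    """
    return (mean_amplitude(first_segment) + mean_amplitude(second_segment)) / 2


def remove_dc(samples, dc_component):
    """Assumed removal rule: subtract the component from each sample."""
    return [sample - dc_component for sample in samples]


dc = estimate_dc([2, 4], [4, 6])       # segment means are 3 and 5
cleaned = remove_dc([3, 5, 4], dc)
```

After the subtraction the waveform is centered around zero, so a constant offset no longer inflates the amplitudes used for the frame volume.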

202, performing voice detection on the third audio data, and generating the first audio data from the voice periods contained in the third audio data.

Specifically, audio data may contain both silence periods and voice periods, where a silence period is a period without voice. If the audio data of the silence periods were also used to obtain the audio volume, the accuracy of the obtained audio volume would be reduced. Therefore, the audio volume acquiring device in the embodiment of the present invention generates the first audio data, which contains the voice periods, by performing voice detection.

For example, when Voice Activity Detection (VAD) is used to detect the third audio data, the data in the silent period may be deleted, and the first audio data may be generated according to the data in the remaining period; alternatively, the detected speech periods may be sorted out to constitute the first audio data.
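The second alternative, keeping only the detected voice periods, can be sketched as follows. The sketch does not implement voice activity detection itself; it assumes some detector has already produced one boolean flag per frame, and the function name `keep_speech_frames` is hypothetical.

```python
def keep_speech_frames(frames, is_speech):
    """Concatenate the frames flagged as speech into one sample list.

    frames: list of per-frame sample lists (the third audio data).
    is_speech: one boolean per frame, assumed to come from a VAD.
    """
    kept = []
    for frame, speech in zip(frames, is_speech):
        if speech:
            kept.extend(frame)
    return kept


third_audio_frames = [[1, 2], [0, 0], [3, 4]]
first_audio = keep_speech_frames(third_audio_frames, [True, False, True])
```

Here the middle frame is treated as silence and dropped, so `first_audio` contains only the samples of the two voice frames.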

In alternative implementations, the audio volume acquisition device may perform at least one of step 201 and step 202. For example, the device performs step 202 but not step 201: the initial audio data is then the third audio data, the audio data obtained by performing voice detection on the third audio data is determined as the first audio data, and steps 203 to 206 are then performed. For another example, the device performs step 201 but not step 202: the initial audio data is then the second audio data, direct current removal processing is performed on the second audio data to obtain the third audio data, the third audio data is determined as the first audio data, and steps 203 to 206 are then performed on the first audio data. As another example, all of steps 201 to 206 are performed according to the scheme of the embodiment of the present invention.

203, acquiring the frame volume of each frame of audio segment in the multi-frame audio segments contained in the first audio data.

In a specific implementation, the audio volume obtaining device may divide the first audio data into multiple frames of audio segments; for the dividing manner, refer to the manner in which the second audio segment is divided in step 302, which is not repeated here. The frame volume of each frame of audio segment in the first audio data may be obtained by first obtaining a first amplitude value of each sampling point contained in each frame of audio segment in the first audio data, and then obtaining the frame volume of each frame of audio segment according to the first amplitude values of the sampling points it contains.

For a specific implementation process of obtaining the frame volume of each frame of audio segment according to the first amplitude value of the sampling point included in each frame of audio segment, reference may be made to the detailed description of steps 501 to 502.

204. Acquire a plurality of volume intervals, assign the frame volume of each frame of audio segment to the volume interval to which it belongs, and determine the number of frame volumes contained in each of the plurality of volume intervals.

For example, the plurality of volume intervals may be established as follows: each volume interval is 1 dB long, so that 0 dB to -1 dB is one volume interval, -1 dB to -2 dB is one volume interval, and so on down to -39 dB to -40 dB, and everything below -40 dB forms a single volume interval. This is because, in practice, speech below -40 dB is very quiet, so that whole range can be treated as one large interval. That is, the plurality of volume intervals are: (-∞, -40], (-40, -39], (-39, -38], …, (-1, 0]. The frame volume of each frame of audio segment is then assigned to the volume interval to which it belongs. For example, if the frame volume PeakdB_i of the ith frame of audio segment equals -2.3 dB, it belongs to the volume interval (-3, -2].

The frame volumes belonging to the volume interval (-3, -2] form a set, denoted S3; each of the other volume intervals corresponds to a set of frame volumes in the same way (for example, S1 for (-1, 0] and S2 for (-2, -1]).
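The interval assignment of step 204 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `interval_index` and `bin_frame_volumes` are hypothetical, the intervals are taken to be 1 dB wide and closed on the right with one catch-all interval at or below -40 dB, and the sample volumes other than -1.3, -1.6 and -2.3 are made up for the example.

```python
import math
from collections import defaultdict


def interval_index(frame_db, floor_db=-40):
    """Return k such that frame_db lies in (-k, -(k - 1)] dB, i.e. in set S_k.

    Volumes at or below floor_db share one catch-all interval whose index is
    -floor_db + 1 (41 for the default floor). Assumes frame_db <= 0.
    """
    if frame_db > 0:
        raise ValueError("frame volume is expected to be <= 0 dB")
    if frame_db <= floor_db:
        return -floor_db + 1
    # (-1, 0] -> 1, (-2, -1] -> 2, (-3, -2] -> 3, ...
    return math.floor(-frame_db) + 1


def bin_frame_volumes(frame_volumes, floor_db=-40):
    """Group frame volumes into sets keyed by interval index."""
    bins = defaultdict(list)
    for volume in frame_volumes:
        bins[interval_index(volume, floor_db)].append(volume)
    return dict(bins)


# Hypothetical frame volumes in dB.
sets = bin_frame_volumes([-0.5, -1.3, -1.6, -2.3, -55.0])
counts = {k: len(v) for k, v in sets.items()}
```

The number of frame volumes per interval is then simply the length of each set, which is what the later steps compare against the target threshold.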

205. Determine a target volume interval according to the number of frame volumes contained in each volume interval.

For determining the target volume interval according to the number of frame volumes included in each volume interval, reference may be made to the detailed description of steps 601 to 606.

206. Determine the audio volume of the first audio data according to the target frame volumes contained in the target volume interval.

Specifically, in a first optional scheme, the audio volume of the first audio data determined by the audio volume obtaining device may be the average of the target frame volumes contained in the target volume interval. For example, taking the example in step 204 as a premise, if the target volume interval is S2, the target frame volumes are -1.3 and -1.6, and their average is -1.45, so the audio volume of the first audio data is -1.45.

Referring to fig. 3, a flowchart of a method for processing the dc component in step 202 is provided, as shown in fig. 3, the flowchart includes steps 301 to 303.

301. Acquire a second amplitude value of the mth sampling point in the second audio data.

Specifically, the second audio data includes a plurality of sampling points, any one of which is denoted by m, where m is a positive integer. The second amplitude value of the mth sampling point is the amplitude of that sampling point in the second audio data. For example, if the second audio data is quantized with 16 bits, its amplitude is divided into 2¹⁶ = 65536 quantization levels, and the quantization range of the amplitude is -32768 to 32767. Thus, the second amplitude value of the mth sampling point acquired by the audio volume acquisition device is the quantized value of that sampling point.

302. Calculate a DC component according to the average amplitude value of the first audio segment, in which the mth sampling point of the second audio data is located, and the average amplitude value of a second audio segment of the second audio data.

Specifically, step 302 describes one way of determining the DC component; the embodiment of the present invention does not limit how the DC component is determined. First, the audio volume acquisition device divides the second audio data into multiple frames of audio segments, for example with a frame length of 20 ms; then, the DC component is determined from the average amplitude values of two frames.

The division of the second audio data into multiple frames of audio segments can be illustrated by the following two cases. In the first case, the duration of the second audio data is an integer multiple of the frame length, and each frame of audio segment contains the same number of sampling points. For example, if the second audio data has a duration of 10 s, is sampled at 8 kHz, and is divided into audio segments of 20 ms per frame, then it is divided into 500 frames, and each 20 ms frame of audio segment contains 20 ms × 8 kHz = 160 sampling points. In the second case, the duration of the second audio data is not an integer multiple of the frame length; all frames other than the last contain the same number of sampling points, and the last frame contains fewer sampling points than the others. For example, if the second audio data has a duration of 1.05 s, is sampled at 8 kHz, and is divided into frames of 20 ms, then it is divided into 53 frames, where the audio segments of frames 1 to 52 each contain 20 ms × 8 kHz = 160 sampling points and the audio segment of frame 53 contains 80 sampling points.
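The two framing cases can be reproduced with a short Python sketch. The function name `split_into_frames` and its parameters are hypothetical; the sketch only assumes that the audio is available as a sequence of samples and that the last frame is allowed to be shorter than the others.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=20):
    """Split samples into consecutive frames of frame_ms milliseconds.

    The last frame is shorter when the duration is not an integer multiple
    of the frame length.
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # 160 for 8 kHz and 20 ms
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


# 10 s at 8 kHz: 80000 samples, an integer multiple of the frame length.
frames_10s = split_into_frames([0] * 80000)

# 1.05 s at 8 kHz: 8400 samples, so the last frame is only half full.
frames_1_05s = split_into_frames([0] * 8400)
```

With these inputs, `frames_10s` holds 500 frames of 160 samples, and `frames_1_05s` holds 53 frames of which the last contains 80 samples, matching the figures in the text.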

Next, the average amplitude value for each frame of the audio segment may be determined by calculating an average of the second amplitude values of the sample points included in the frame of the audio segment.

Finally, the DC component is calculated according to the average amplitude value of the first audio segment and the average amplitude value of the second audio segment of the second audio data. The first audio segment is the audio segment in which the mth sampling point is located, and the second audio segment is an audio segment different from the first audio segment; for example, in the time order of the audio segments, the second audio segment is the frame immediately preceding the first audio segment. The DC component DC(m) of the mth sampling point is then calculated as follows:

$$DC(m) = \alpha \cdot \bar{x}_i + \beta \cdot \bar{x}_{i-1}$$

where α and β are set weights with α + β = 1; $\bar{x}_i$ represents the average amplitude value of the ith frame of audio segment, the audio segment in which the mth sampling point is located being the ith frame; and $\bar{x}_{i-1}$ represents the average amplitude value of the (i-1)th frame of audio segment. Here,

$$\bar{x}_i = \frac{1}{L} \sum_{m \in \text{frame } i} x(m)$$

where x(m) is the second amplitude value of the mth sampling point and L is the total number of sampling points contained in one frame of audio segment. It will be appreciated that, when the audio segment in which the mth sampling point is located is the first frame of audio segment, $\bar{x}_{i-1}$ may be set to a constant, for example 0.

Optionally, the second audio segment is a different audio segment from the first audio segment, and the second audio segment is not limited in the embodiment of the present invention.

303. Calculate the first amplitude value of the mth sampling point in the third audio data after the DC removal processing according to the second amplitude value of the mth sampling point and the DC component.

Specifically, in order to eliminate the dc component, the dc component may be subtracted from the second amplitude value of the mth sampling point, so as to determine the first amplitude value of the mth sampling point. After the first amplitude value is calculated at each of the sampling points, the audio data composed of these sampling points is referred to herein as third audio data.

For example, if the first amplitude value of the mth sampling point in the third audio data is denoted y(m), the second amplitude value of the mth sampling point in the second audio data is denoted x(m), and the DC component of the mth sampling point is denoted DC(m), then y(m) is calculated as follows:

y(m) = x(m) - DC(m)

For example, referring to fig. 4a and fig. 4b together, waveform diagrams of the DC removal process are provided for the embodiment of the present invention. In fig. 4a and fig. 4b, the abscissa represents time and the ordinate represents the amplitude value. Fig. 4a shows the waveform of the second audio data containing the DC component; it can be seen that the second audio data does not fluctuate around the quantized value 0 but around the quantized value -5000. Fig. 4b shows the waveform of the third audio data after the DC removal processing; it can be seen that the third audio data now fluctuates around the quantized value 0. With regard to the specific implementation of steps 203 to 206, if the DC component is not removed from audio data that contains one, the first amplitude value of each sampling point is distorted and the accuracy of the acquired audio volume suffers; the DC removal processing therefore improves the accuracy of the acquired audio volume.
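Steps 301 to 303 can be sketched in Python as below. This is an illustration under loudly stated assumptions rather than the patented formula: it assumes the DC component of a sample is `alpha` times the mean of its own frame plus `beta` times the mean of the preceding frame, that the preceding mean for the first frame is the constant 0, and that both weights default to 0.5. The function name `remove_dc` and all default values are hypothetical.

```python
def remove_dc(samples, frame_len=160, alpha=0.5, beta=0.5, first_prev_mean=0.0):
    """Subtract a per-frame DC estimate from every sample.

    ASSUMPTION: DC = alpha * mean(current frame) + beta * mean(previous frame),
    with alpha + beta = 1. The text only states that two frame means and two
    weights summing to 1 are involved.
    """
    output = []
    prev_mean = first_prev_mean  # constant used for the first frame
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        mean = sum(frame) / len(frame)
        dc = alpha * mean + beta * prev_mean
        output.extend(x - dc for x in frame)  # y(m) = x(m) - DC(m)
        prev_mean = mean
    return output


# A signal sitting at a constant offset of -5000, two frames long.
offset_signal = [-5000] * 320
cleaned = remove_dc(offset_signal)
```

With equal weights, the first frame is only partly corrected because its "previous" mean is the constant 0, while the second frame is centred on 0; choosing `alpha=1.0, beta=0.0` removes the offset from the first frame as well.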

Referring to fig. 5, a flowchart of a method for obtaining frame volume of an audio segment is provided, as shown in fig. 5, the flowchart includes steps 501 to 502.

501. Acquire the volume peak of each sampling point according to the first amplitude values of the sampling points contained in each frame of audio segment.

The volume peak refers to the maximum possible volume value of the sampling point. Specifically, this can be realized through steps A1 and A2.

A1, acquiring a first amplitude value of an nth sampling point and a volume peak value of an (n-1) th sampling point contained in each frame of audio segment.

Specifically, n is a positive integer. The first amplitude value for the nth sample point of the first audio data may be the amplitude of the sample point in the first audio data. The volume peak for the (n-1) th sample point may have been calculated.

In a first alternative for the volume peak of the (n-1) th sampling point, in the audio segment of the ith frame, when n is 1, the (n-1) th sampling point may be the last sampling point of the (i-1) th frame, and therefore, the volume peak of the (n-1) th sampling point is the volume peak of the last sampling point of the (i-1) th frame. If the ith frame of audio segment is the 1 st frame of audio segment, the volume peak of the (n-1) th sample point may be a default value.

In a second alternative for the volume peak of the (n-1) th sample point, in the ith frame audio segment, when n is 1, the volume peak of the (n-1) th sample point is a default value.

A2. Calculate the volume peak of the nth sampling point according to the first amplitude value of the nth sampling point and the volume peak of the (n-1)th sampling point.

The step A2 can be realized through steps B1, B2 and B3.

B1. Calculate a first candidate volume peak of the nth sampling point according to the first amplitude value of the nth sampling point.

Since the quantized values of the sampling points in the first audio data include negative numbers, the absolute value of the first amplitude value of each sampling point is taken first in order to determine its decibel value. For example, if the first amplitude value of the nth sampling point is y(n) and its absolute value is y'(n), then: y'(n) = |y(n)|.

For example, referring to fig. 4c and fig. 4d together, waveform diagrams of audio data are provided according to an embodiment of the present invention. In fig. 4c and fig. 4d, the abscissa indicates the index of the sampling point and the ordinate indicates the amplitude value. Fig. 4c shows the waveform of y(n) for 80 sampling points; it can be seen that the amplitude values of y(n) are both positive and negative. After the absolute value is taken, the waveform y'(n) shown in fig. 4d is obtained; y'(n) and y(n) contain the same number of sampling points, and each negative amplitude value is replaced by its opposite number. The difference between y(n) and y'(n) can thus be clearly seen in fig. 4c and fig. 4d.

Then, the first candidate volume peak of the nth sampling point is calculated. Specifically, y'(n), the amplitude after the absolute value is taken, is converted into the logarithmic domain using the following conversion formula:

$$20 \log_{10} \frac{y'(n)}{U_0}$$

where U₀ represents the maximum quantization value of a sampling point; for example, with 16-bit quantization, U₀ = 32768.
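Step B1 can be illustrated with a small Python helper. The sketch assumes the conventional amplitude-to-decibel conversion 20·log10(y'(n)/U₀); the function name `amplitude_to_db` and the handling of a zero amplitude (returned as negative infinity) are choices made for this example only.

```python
import math

U0 = 32768  # maximum quantization value for 16-bit audio, as stated in the text


def amplitude_to_db(y, u0=U0):
    """Convert one amplitude value to decibels relative to full scale.

    ASSUMPTION: the logarithmic conversion is 20 * log10(|y| / u0).
    A zero amplitude has no finite decibel value and maps to -inf here.
    """
    magnitude = abs(y)  # y'(n) = |y(n)|
    if magnitude == 0:
        return float("-inf")
    return 20.0 * math.log10(magnitude / u0)


full_scale_db = amplitude_to_db(-32768)  # sign is discarded by the absolute value
half_scale_db = amplitude_to_db(16384)   # roughly 6 dB below full scale
```

Because the amplitude never exceeds U₀, the result is never above 0 dB, which is consistent with volume intervals that all lie at or below 0 dB.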

B2. Calculate a second candidate volume peak of the nth sampling point according to the volume peak of the (n-1)th sampling point and an attenuation value.

For example, the volume peak of the (n-1)th sampling point is denoted $\mathrm{Peak}_{dB}(n-1)$ and expressed in decibels, and the attenuation value is denoted dBpT, which is the number of decibels by which the volume peak attenuates between two sampling points. The second candidate volume peak of the nth sampling point is thus obtained by subtracting the attenuation value from the volume peak of the (n-1)th sampling point:

$$\mathrm{Peak}_{dB}(n-1) - \mathrm{dBpT}$$

It is understood that the volume peak of the (n-1)th sampling point may itself be determined according to steps B1 to B3. Alternatively, when n is 1, the volume peak of the (n-1)th sampling point used to calculate the volume peak of the 1st sampling point may be a default value.

B3, determining the larger value of the first candidate volume peak value and the second candidate volume peak value as the volume peak value of the nth sampling point.

For example, if the volume peak of the nth sampling point is denoted $\mathrm{Peak}_{dB}(n)$, it is calculated as follows:

$$\mathrm{Peak}_{dB}(n) = \max\left(20 \log_{10} \frac{y'(n)}{U_0},\ \mathrm{Peak}_{dB}(n-1) - \mathrm{dBpT}\right)$$

Thus, the volume peak of the nth sampling point can be calculated through steps B1 to B3.
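Steps B1 to B3 amount to a peak follower with a fixed decay per sample, which can be sketched in Python as follows. The function name `volume_peaks`, the default starting peak of -100 dB and the default attenuation of 0.01 dB per sample are hypothetical values chosen for illustration, and the decibel conversion 20·log10(|y|/U₀) is an assumption.

```python
import math


def volume_peaks(frame, prev_peak_db=-100.0, decay_db=0.01, u0=32768):
    """Return the volume peak, in dB, of every sample in a frame.

    prev_peak_db is the volume peak of the sample before the frame (or a
    default value), and decay_db plays the role of the attenuation value dBpT.
    """
    peaks = []
    peak = prev_peak_db
    for y in frame:
        magnitude = abs(y)
        # B1: first candidate, the sample's own level (ASSUMED 20*log10 form).
        first = 20.0 * math.log10(magnitude / u0) if magnitude > 0 else float("-inf")
        # B2: second candidate, the previous peak minus the attenuation value.
        second = peak - decay_db
        # B3: the larger candidate is the volume peak of this sample.
        peak = max(first, second)
        peaks.append(peak)
    return peaks


# One full-scale sample followed by silence, with an exaggerated 1 dB decay.
peaks = volume_peaks([32768, 0, 0], decay_db=1.0)
```

In this example the peak jumps to 0 dB on the loud sample and then falls by the attenuation value on each silent sample, giving 0, -1 and -2 dB. To implement the first alternative of step A1, the last peak of one frame can be passed as `prev_peak_db` for the next frame.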

502. Calculate the frame volume of each frame of audio segment from the volume peaks of the sampling points contained in that audio segment.

Specifically, after the volume peak of each sampling point included in the ith frame of audio segment is obtained through calculation in step 501, the frame volume of the ith frame of audio segment may be obtained through calculation according to the volume peak of each sampling point.

For example, the frame volume of the ith frame of audio segment may be calculated by averaging the volume peaks of its sampling points, and the specific calculation formula is as follows:

$$\mathrm{PeakdB}_i = \frac{1}{L} \sum_{n \in \text{frame } i} \mathrm{Peak}_{dB}(n)$$

where PeakdB_i is the frame volume of the ith frame of audio segment, i being a positive integer; L is the total number of sampling points contained in one frame of audio segment; and the summation runs from the first sampling point of the ith frame to its last sampling point.

Optionally, the audio volume obtaining device may also determine the frame volume of each audio segment by using other manners (for example, setting weights corresponding to volume peaks of each sampling point, and then averaging), which is not limited in the embodiment of the present invention.
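Both ways of obtaining a frame volume from per-sample volume peaks can be sketched in Python as follows. The function name `frame_volume` is hypothetical, and treating the default case as a plain mean with equal weights is an assumption of this sketch.

```python
def frame_volume(peaks_db, weights=None):
    """Combine the per-sample volume peaks of one frame into a frame volume.

    Without weights this is the plain mean of the peaks (ASSUMED default);
    with weights it is the weighted mean mentioned as an alternative.
    """
    if weights is None:
        return sum(peaks_db) / len(peaks_db)
    return sum(w * p for w, p in zip(weights, peaks_db)) / sum(weights)


plain = frame_volume([-1.0, -2.0, -3.0])
weighted = frame_volume([-1.0, -3.0], weights=[3, 1])
```

Here `plain` is -2.0 dB and `weighted` is -1.5 dB, the latter leaning toward the peak that carries the larger weight.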

Referring to fig. 6, a flowchart of a method for determining a target volume interval is provided according to an embodiment of the present invention, and as shown in fig. 6, the flowchart includes steps 601 to 606.

601. Acquire the number of frame volumes contained in a first volume interval, which is the volume interval with the largest decibel values among the plurality of volume intervals.

Specifically, each of the plurality of volume intervals has a corresponding decibel range, and no two volume intervals overlap. Any decibel value in a volume interval E can therefore be compared with any decibel value in a volume interval F: if the value from E is greater than the value from F, then every decibel value in E is greater than every decibel value in F. In this way, the volume interval with the largest decibel values is selected from the plurality of volume intervals and is referred to as the first volume interval. The number of frame volumes contained in the first volume interval is then obtained.

602. Determine whether the number of frame volumes contained in the first volume interval is greater than a target threshold.

Specifically, the audio volume acquiring device determines whether the number of frame volumes included in the first volume interval is greater than a target threshold, if so, step 603 is executed, and if not, step 604 is executed.

603. Determine the first volume interval as the target volume interval.

Specifically, when the number of frame volumes included in the first volume interval is greater than a target threshold, the audio volume acquisition device determines the first volume interval as a target volume interval.

604. Determine the volume interval with the largest decibel values among the remaining volume intervals not yet involved in the comparison as a second volume interval, and calculate the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval.

Specifically, when the number of frame volumes contained in the first volume interval is not greater than the target threshold, the audio volume acquisition device determines the volume interval with the largest decibel values among the remaining volume intervals not yet involved in the comparison as the second volume interval, and calculates the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval.

A remaining volume interval not yet involved in the comparison is a volume interval whose number of frame volumes has not yet taken part in the comparison with the target threshold. In this step, the volume interval with the largest decibel values can be selected from the remaining volume intervals in the same manner as described in step 601, and the selected volume interval is determined as the second volume interval.

Alternatively, the target threshold may be a preset value. Alternatively, the target threshold may be calculated according to the duration of the first audio data, for example, if the duration of the first audio data is 10s and the frame length is 20ms, the first audio data includes 500 frames of audio segments, and the target threshold may be set to 30% of the total number of frames of the audio segments, that is, the target threshold is 150, so that the target threshold may be automatically adjusted according to the duration of the first audio data, thereby avoiding the situation where the duration of the first audio data is too long and the target threshold is too small, and avoiding the situation where the duration of the first audio data is too short and the target threshold is too large.

605. Determine whether the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval is greater than the target threshold.

Specifically, the audio volume acquisition device determines whether the sum of the frame volumes included in the first volume interval to the second volume interval is greater than the target threshold. If yes, go to step 606, and if not, go to step 604.

606. Determine the second volume interval as the target volume interval.

Specifically, when the sum of the number of frame volumes included in the first volume interval to the second volume interval is greater than the target threshold, the second volume interval is determined as a target volume interval. In addition, when the sum of the frame volumes included in the first volume interval to the second volume interval is not greater than the target threshold, step 604 is executed to determine a volume interval with the largest decibel value among the remaining volume intervals not participating in the comparison as a second volume interval, and calculate the sum of the frame volumes included in the first volume interval to the second volume interval.

For example, taking the example of step 204 as a premise and a target threshold of 2, the flow of fig. 6 proceeds as follows. First, the volume interval with the largest decibel values is (-1, 0], and the corresponding set S1 contains 1 frame volume. Since the number of frame volumes contained in S1 is smaller than the target threshold 2, the second volume interval is determined to be (-2, -1]; the set S1 corresponding to (-1, 0] and the set S2 corresponding to (-2, -1] together contain 3 frame volumes. Finally, since the number of frame volumes contained in S1 and S2 is greater than the target threshold 2, the volume interval (-2, -1] is determined as the target volume interval.
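The loop formed by steps 601 to 606 can be sketched in Python as a cumulative scan from the loudest interval downward. The function name `select_target_interval` is hypothetical, and returning `None` when the cumulative count never exceeds the threshold is an assumption, since the text does not describe that case. The count of 1 used for the third interval is likewise made up for the example.

```python
def select_target_interval(counts, threshold):
    """Return the position of the target volume interval.

    counts lists the number of frame volumes per interval, ordered from the
    interval with the largest decibel values downward. The target is the
    first interval at which the running total exceeds the threshold.
    """
    cumulative = 0
    for position, count in enumerate(counts):
        cumulative += count  # frame volumes from the first interval to this one
        if cumulative > threshold:
            return position
    return None  # ASSUMPTION: no interval qualifies


# S1 holds 1 frame volume and S2 holds 2; the third count is hypothetical.
target = select_target_interval([1, 2, 1], threshold=2)
```

With a threshold of 2, the running total is 1 after the first interval and 3 after the second, so `target` is position 1, that is, the interval (-2, -1] as in the worked example.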

Note that, in the embodiment shown in fig. 6, the target volume interval is determined by comparing the number of frame volumes. Optionally, the number of frame volumes may also be normalized, for example, the total number of frames of the audio segment included in the first audio data is determined, then the proportion of the frame volumes included in each volume interval in the total number of frames of the audio segment is determined, and the target volume interval is determined by comparing the proportion with the proportion threshold.
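The normalized variant can be sketched in the same way in Python. This sketch assumes, by analogy with the count-based flow, that proportions are accumulated from the loudest interval downward and compared with a proportion threshold; the function name `select_target_interval_by_ratio` and the threshold value 0.3 are hypothetical.

```python
def select_target_interval_by_ratio(counts, ratio_threshold):
    """Return the position of the target interval using proportions.

    counts is ordered from the loudest interval downward. ASSUMPTION: the
    cumulative share of frames is compared with ratio_threshold.
    """
    total = sum(counts)  # total number of frames of audio segments
    if total == 0:
        return None
    cumulative = 0
    for position, count in enumerate(counts):
        cumulative += count
        if cumulative / total > ratio_threshold:
            return position
    return None


# Four frames in total: the loudest interval alone holds a quarter of them.
target_by_ratio = select_target_interval_by_ratio([1, 2, 1], ratio_threshold=0.3)
```

A proportion threshold makes the selection independent of the length of the audio, in the same spirit as deriving the target threshold from the total number of frames.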

In the scheme for determining the target volume interval shown in fig. 6, the target threshold ensures that a certain number of audio segments is taken into account: the frame volumes are counted starting from the volume interval with the largest decibel values, and once the count exceeds the target threshold, the volume interval with the smallest decibel values among those that have participated in the comparison is determined as the target volume interval. Counting from the loudest interval until the target threshold is exceeded excludes frame volumes with particularly small decibel values, and selecting the target volume interval only after a certain number of audio segments has been counted excludes frame volumes with particularly large decibel values. This reduces the influence of particularly high or particularly low frame volumes on the audio volume of the audio data and improves the accuracy of the acquired audio volume.

In the embodiment of the invention, the frame volume of each frame of audio segment in a plurality of frames of audio segments contained in first audio data is firstly acquired, then the frame volume of each frame of audio segment in the plurality of frames of audio segments is divided into the volume intervals to which the frame volume belongs, and the number of the frame volume contained in each volume interval in a plurality of volume intervals is determined; and finally, determining a target volume interval according to the number of frame volumes contained in each volume interval, and determining the audio volume of the first audio data according to the target frame volumes contained in the target volume interval. The target volume interval for determining the audio volume is selected by counting the number of frame volumes in the volume interval, so that the situation that the audio volume of the audio data is influenced due to the fact that the decibel value of the frame volume of each audio segment is particularly high or low can be reduced. In addition, in the embodiment, the operation of removing the direct current component and the operation of voice detection are also performed on the original audio data, so that the accuracy of the acquired audio volume is further improved.

Further, an audio volume acquisition device according to an embodiment of the present invention is described with reference to fig. 7. Fig. 7 illustrates, from the point of view of functional logic, other aspects that may be involved in the above method, to help the reader further understand the technical solutions set forth herein. Referring to fig. 7, an exemplary diagram of an audio volume acquisition device is provided according to an embodiment of the present invention. As shown in fig. 7, the audio volume acquisition device 700 may include: a DC removal module, a voice detection module, a frame volume calculation module, a target volume interval determination module, and an audio volume calculation module. The DC removal module is used to remove the DC component from the input second audio data to obtain third audio data. The voice detection module is used to perform voice detection on the third audio data to obtain first audio data containing the voice periods. The frame volume calculation module is used to calculate the frame volume of each frame of audio segment among the multiple frames of audio segments contained in the first audio data. The target volume interval determination module is used to determine a target volume interval according to the number of frame volumes contained in each volume interval after the frame volumes of the multiple frames of audio segments have been assigned to a plurality of volume intervals. The audio volume calculation module is used to determine the audio volume of the first audio data after the target volume interval has been determined. The audio volume acquisition method can be realized through these modules.

The above is only an example, and the embodiment of the present invention does not limit the modules included in the audio volume acquisition device 700.

Referring to fig. 8, a schematic structural diagram of an audio volume obtaining device according to an embodiment of the present invention is provided. As shown in fig. 8, the audio volume acquiring apparatus 800 according to an embodiment of the present invention may include: a frame volume acquisition module 801, a number determination module 802, an interval determination module 803, and a volume determination module 804.

A frame volume obtaining module 801, configured to obtain a frame volume of each frame of audio segments in multiple frames of audio segments included in the first audio data.

Specifically, the frame volume obtaining module 801 may divide the first audio data into multiple frames of audio segments, as illustrated by the following two cases. In the first case, the duration of the first audio data is an integer multiple of the frame length, and each frame of audio segment contains the same number of sampling points. For example, if the first audio data has a duration of 10 s, is sampled at 8 kHz, and is divided into audio segments of 20 ms per frame, then it is divided into 500 frames, and each 20 ms frame of audio segment contains 20 ms × 8 kHz = 160 sampling points. In the second case, the duration of the first audio data is not an integer multiple of the frame length; all frames other than the last contain the same number of sampling points, and the last frame contains fewer sampling points than the others. For example, if the first audio data has a duration of 1.05 s, is sampled at 8 kHz, and is divided into frames of 20 ms, then it is divided into 53 frames, where the audio segments of frames 1 to 52 each contain 20 ms × 8 kHz = 160 sampling points and the audio segment of frame 53 contains 80 sampling points.

Then, the frame volume of each frame of audio segment in the multiple frames of audio segments included in the first audio data may be specifically determined by first obtaining a first amplitude value of a sampling point included in each frame of audio segment in the first audio data, and the first amplitude value of the sampling point included in the first audio data may be the amplitude of the sampling point in the first audio data; and then, acquiring the frame volume of each frame of audio segment according to the first amplitude value of the sampling point contained in each frame of audio segment. Thus, in an embodiment of the present invention, the volume of the segment of audio may be represented by a frame volume.

The number determining module 802 is configured to obtain a plurality of volume intervals, divide the frame volume of each frame of audio segment in the multiple frames of audio segments into the volume intervals to which the frame volume belongs, and determine the number of frame volumes included in each volume interval in the multiple volume intervals.

Specifically, the volume intervals acquired by the number determining module 802 are divided in advance. For example, the volume intervals are divided as follows: each volume interval is 1 dB long, so that 0 dB to -1 dB is one volume interval, -1 dB to -2 dB is one volume interval, and so on down to -39 dB to -40 dB, and everything below -40 dB forms a single volume interval. This is because, in practice, speech below -40 dB is very quiet, so that whole range can be treated as one large interval. That is, the plurality of volume intervals are: (-∞, -40], (-40, -39], (-39, -38], …, (-1, 0]. The frame volume of each frame of audio segment is then assigned to the volume interval to which it belongs. For example, if the frame volume PeakdB_i of the ith frame of audio segment equals -2.3 dB, it belongs to the volume interval (-3, -2].

The frame volumes belonging to the volume interval (-3, -2] form a set, denoted S3; each of the other volume intervals corresponds to a set of frame volumes in the same way.

An interval determining module 803, configured to determine a target volume interval according to the number of frame volumes contained in each volume interval.

Specifically, the target volume interval determined by the audio volume obtaining device is included in the volume intervals obtained in the number determining module 802. The interval determination module 803 determines a target volume interval according to the number of frame volumes included in each volume interval, on one hand, to determine a volume interval in which the audio volume of the first audio data is located, and on the other hand, to determine an audio segment used for calculating the audio volume of the first audio data. In this way, in the process of determining the target volume interval, the target frame volume may be determined from the frame volumes of the plurality of audio segments, and the audio volume of the first audio data may be determined. Therefore, the situation that the audio volume of the audio data is influenced due to the fact that the frame volume decibel value of each audio segment is particularly high or low can be reduced, and the accuracy of the acquired audio volume is improved.

A volume determining module 804, configured to determine an audio volume of the first audio data according to a target frame volume included in the target volume interval.

Specifically, in the aspect that the volume determining module 804 determines the audio volume of the first audio data according to the target frame volume included in the target volume interval, reference may be made to the following detailed descriptions of three optional implementations, and the specific determination process of the audio volume is not limited in the embodiment of the present invention.

In a first alternative, the audio volume of the first audio data determined by the volume determining module 804 may be the average of the target frame volumes contained in the target volume interval. For example, taking the example in step 204 as a premise, if the target volume interval is S2 and the target frame volumes are -1.3 dB and -1.6 dB, the average of -1.3 and -1.6 is -1.45, so the audio volume of the first audio data is -1.45 dB.

In a second alternative, the audio volume of the first audio data determined by the volume determining module 804 may be one of the target frame volumes contained in the target volume interval. For example, if the target frame volumes are -1.3 and -1.6, the volume determining module 804 may determine -1.3 as the audio volume of the first audio data; alternatively, -1.6 may be determined as the audio volume of the first audio data.

In a third alternative, the audio volume of the first audio data determined by the volume determining module 804 may be a weighted average of the target frame volumes contained in the target volume interval. Optionally, the weight of each target frame volume may be set according to the order of the decibel values, for example, a smaller weight for a smaller decibel value and a larger weight for a larger decibel value.
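The first and third alternatives can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the function names are hypothetical, and the rank-based weights 1, 2, 3, … in `weighted_volume` are only one possible scheme that gives larger decibel values larger weights, since the text does not fix the weight values.

```python
def mean_volume(target_frame_volumes_db):
    """First alternative: plain average of the target frame volumes."""
    return sum(target_frame_volumes_db) / len(target_frame_volumes_db)


def weighted_volume(target_frame_volumes_db):
    """Third alternative: weighted average of the target frame volumes.

    Assumption: weights are the ranks 1..k of the values sorted in
    ascending order, so a larger decibel value gets a larger weight.
    """
    ordered = sorted(target_frame_volumes_db)
    weights = range(1, len(ordered) + 1)
    weighted_sum = sum(w * v for w, v in zip(weights, ordered))
    return weighted_sum / sum(weights)
```

The second alternative needs no computation: any single element of the target frame volumes, for example `max(target_frame_volumes_db)`, may serve as the audio volume.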

Referring to fig. 9, a schematic structural diagram of another audio volume obtaining device according to an embodiment of the present invention is provided. As shown in fig. 9, the audio volume acquiring apparatus 900 according to an embodiment of the present invention may include: an amplitude value acquisition module 901, an amplitude value calculation module 902, a voice detection module 903, a frame volume acquisition module 904, a number determination module 905, an interval determination module 906, and a volume determination module 907.

An amplitude value obtaining module 901, configured to obtain a second amplitude value of an mth sampling point in second audio data, where m is a positive integer;

an amplitude value calculating module 902, configured to calculate a first amplitude value of the mth sampling point in the first audio data after the dc removal processing according to the second amplitude value and the dc component of the mth sampling point.

Optionally, the apparatus further includes a direct current component calculating module that operates before the amplitude value calculating module 902. The direct current component calculating module is configured to calculate the direct current component according to the average amplitude value of a first audio segment of the second audio data, in which the mth sampling point is located, and the average amplitude value of a second audio segment of the second audio data, where the second audio segment contains different sampling points from the first audio segment.
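A minimal sketch of the direct current removal is given below. The text only states that the direct current component depends on the average amplitude values of two audio segments, so the way the two averages are combined here (a weighted sum controlled by a hypothetical `weight` parameter) and the removal by subtraction are assumptions, as are the function names.

```python
def dc_component(first_segment, second_segment, weight=0.5):
    """Estimate the DC component from the averages of two segments.

    Assumption: the two average amplitude values are blended linearly;
    the source does not give the exact combination.
    """
    mean_first = sum(first_segment) / len(first_segment)
    mean_second = sum(second_segment) / len(second_segment)
    return weight * mean_first + (1.0 - weight) * mean_second


def remove_dc(segment, dc):
    """Subtract the DC component from every second amplitude value,
    yielding the first amplitude values (subtraction is an assumption)."""
    return [sample - dc for sample in segment]
```

For example, segments `[1.0, 3.0]` and `[0.0, 2.0]` have averages 2.0 and 1.0, so with the default weight the estimated direct current component is 1.5.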

A voice detection module 903, configured to perform voice detection on the third audio data, and generate the first audio data by using a voice period included in the third audio data.

A frame volume obtaining module 904, configured to obtain a frame volume of each frame of audio segments in multiple frames of audio segments included in the first audio data;

in an embodiment of the present invention, each frame of audio segment includes a plurality of sampling points, and the frame volume obtaining module 904 includes: an amplitude value acquisition unit 9041, and a frame volume acquisition unit 9042.

An amplitude value acquisition unit 9041, configured to acquire a first amplitude value of a sampling point included in each frame of audio segment in the first audio data;

a frame volume acquiring unit 9042, configured to acquire a frame volume of each frame of audio segment according to a first amplitude value of a sampling point included in each frame of audio segment;

The frame volume acquiring unit 9042 includes: a volume peak acquisition subunit, configured to acquire the volume peak value of each sampling point according to the first amplitude value of the sampling points contained in each frame of audio segment; and a frame volume calculation subunit, configured to calculate the frame volume of each frame of audio segment by using the volume peak values of the sampling points contained in that frame of audio segment. The volume peak acquisition subunit is specifically configured to: acquire the first amplitude value of the nth sampling point and the volume peak value of the (n-1)th sampling point contained in each frame of audio segment, where n is a positive integer; and calculate the volume peak value of the nth sampling point according to the first amplitude value of the nth sampling point and the volume peak value of the (n-1)th sampling point.

When calculating the volume peak value of the nth sampling point according to the first amplitude value of the nth sampling point and the volume peak value of the (n-1)th sampling point, the volume peak acquisition subunit specifically: calculates a first candidate volume peak value of the nth sampling point according to the first amplitude value of the nth sampling point; calculates a second candidate volume peak value of the nth sampling point according to the volume peak value and the attenuation value of the (n-1)th sampling point; and determines the larger of the first candidate volume peak value and the second candidate volume peak value as the volume peak value of the nth sampling point.
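The peak-tracking recursion described above can be sketched as follows. Several details are assumptions, because the formulas are not reproduced in this section: the first candidate is taken as the absolute value of the amplitude, the second candidate is the previous peak multiplied by a hypothetical `decay` factor, and the frame volume is taken as the largest peak in the frame converted to decibels relative to a hypothetical `full_scale`.

```python
import math


def peak_envelope(samples, decay=0.999, initial_peak=0.0):
    """Volume peak value of each sampling point.

    peak[n] = max(|x[n]|, peak[n-1] * decay)   (assumed form)
    """
    peaks = []
    previous = initial_peak
    for sample in samples:
        first_candidate = abs(sample)        # from the amplitude value
        second_candidate = previous * decay  # attenuated previous peak
        previous = max(first_candidate, second_candidate)
        peaks.append(previous)
    return peaks


def frame_volume_db(peaks, full_scale=1.0, floor_db=-100.0):
    """Frame volume in dB from the peaks of one frame (assumed: max peak)."""
    peak = max(peaks)
    if peak <= 0.0:
        return floor_db  # avoid log10(0) for an all-silent frame
    return 20.0 * math.log10(peak / full_scale)
```

Because the previous peak decays slowly rather than being discarded, a single loud sampling point keeps influencing the peak values that follow it, which smooths the envelope within a frame.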

The number determining module 905 is configured to obtain a plurality of volume intervals, divide the frame volume of each frame of audio segment in the multiple frames of audio segments into the volume intervals to which the frame volume belongs, and determine the number of frame volumes included in each volume interval in the multiple volume intervals;

an interval determining module 906, configured to determine a target volume interval according to the number of frame volumes included in each volume interval. In an optional scenario, the interval determining module 906 is specifically configured to: acquiring the number of frame volumes contained in a first volume interval with the maximum decibel value in the plurality of volume intervals; and when the number of frame volumes contained in the first volume interval is larger than a target threshold value, determining the first volume interval as a target volume interval.

Optionally, the interval determining module 906 is further configured to: when the number of frame volumes contained in the first volume interval is not greater than the target threshold, determine the volume interval with the maximum decibel value among the remaining volume intervals that have not yet participated in the comparison as a second volume interval, and calculate the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval; and when the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval is greater than the target threshold, determine the second volume interval as the target volume interval.

Optionally, the interval determining module 906 is further configured to: when the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval is not greater than the target threshold, determine the volume interval with the maximum decibel value among the remaining volume intervals that have not yet participated in the comparison as a new second volume interval, and recalculate the sum of the numbers of frame volumes contained in the first volume interval through the second volume interval.
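The search for the target volume interval amounts to a running sum over the intervals, taken from the largest decibel value downward. The sketch below illustrates that reading; the function name, the `(label, count)` input format, and the `None` return when the threshold is never exceeded are assumptions, since the text does not say what happens in that case.

```python
def find_target_interval(counts_loudest_first, threshold):
    """Return the label of the target volume interval.

    counts_loudest_first: list of (label, count) pairs ordered from the
    interval with the largest decibel value to the smallest.
    """
    running_total = 0
    for label, count in counts_loudest_first:
        running_total += count  # frames in the first interval through this one
        if running_total > threshold:
            return label
    return None  # assumption: no interval qualifies
```

For example, with counts `[("S1", 1), ("S2", 2), ("S3", 5)]` and a threshold of 2, the first interval alone holds 1 frame volume, which is not greater than 2; adding the second gives 3, so `"S2"` is the target volume interval.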

A volume determining module 907, configured to determine an audio volume of the first audio data according to a target frame volume included in the target volume interval.

It should be noted that the functions performed by the units of the audio volume obtaining device described in the embodiments of the present invention, and the advantageous effects thereof, correspond to the steps performed by the audio volume obtaining device in the method embodiments shown in fig. 2 to fig. 7, and are not described herein again.

An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and being used to execute the method steps performed by the audio volume obtaining device in the embodiments shown in fig. 1 to 7, and a specific execution process may refer to specific descriptions of the embodiments shown in fig. 1 to 7, which is not described herein again.

Referring to fig. 10, a schematic structural diagram of another terminal is provided in the embodiment of the present invention. As shown in fig. 10, the terminal 1000 may include: at least one processor 1001 (e.g., a CPU), communication interfaces including at least one network interface 1004 and a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display), and optionally may also include a standard wired interface or a wireless interface. The network interface 1004 may optionally include a standard wired interface or a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and an audio volume acquisition application program.

In the terminal 1000 shown in fig. 10, the user interface 1003 is mainly used to provide an input interface for the user, for example, for inputting the audio data to be detected; the processor 1001 may be configured to call the audio volume acquisition application program stored in the memory 1005, and specifically perform the following operations:

In one possible embodiment, each frame of audio segment includes multiple sampling points, and the processor 1001 performs, in obtaining the frame volume of each frame of audio segment in the multiple frames of audio segments included in the first audio data, specifically:

acquiring the frame volume of each frame of audio segment according to the first amplitude value of the sampling point contained in each frame of audio segment;

in one possible embodiment, the processor 1001, during the step of obtaining the frame volume of each frame of audio segment according to the first amplitude value of the sampling point included in each frame of audio segment, specifically performs:

In one possible embodiment, the processor 1001, during the execution, obtains the volume peak value of each sample point according to the first amplitude value of the sample point included in each frame of audio segment, specifically performs:

In one possible embodiment, the processor 1001 calculates the volume peak value of the nth sample point according to the first amplitude value of the nth sample point and the volume peak value of the (n-1) th sample point, and specifically performs:

In a possible embodiment, the processor 1001, after determining the target volume interval according to the number of frame volumes included in each volume interval, specifically performs:

In a possible embodiment, in the process of determining the target volume interval according to the number of frame volumes included in each volume interval, the processor 1001 further performs:

In one possible embodiment, before performing the step of obtaining the frame volume of each frame of audio segments in the multiple frames of audio segments included in the first audio data, the processor 1001 further performs:

In one possible embodiment, the processor 1001 further performs, before performing the step of calculating the first amplitude value of the mth sample point in the dc-removed first audio data according to the second amplitude value of the mth sample point and the dc component:

For the specific implementation of the processor according to the embodiment of the present invention, reference may be made to the description of relevant contents in the foregoing embodiments, which are not repeated herein.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

While the invention has been described with reference to a number of embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. An audio volume acquisition method, comprising:

determining a target volume interval according to the number of frame volumes contained in each volume interval, and determining the audio volume of the first audio data according to the target frame volume contained in the target volume interval;

wherein, the determining the target volume interval according to the number of the frame volumes contained in each volume interval includes:

when the number of frame volumes contained in the first volume interval is larger than a target threshold value, determining the first volume interval as a target volume interval;

2. The method of claim 1, wherein each frame of audio segments comprises a plurality of sampling points, and wherein obtaining the frame volume of each frame of audio segments in the plurality of frames of audio segments comprised in the first audio data comprises:

3. The method as claimed in claim 2, wherein the obtaining the frame volume of each frame of audio segments according to the first amplitude value of the sampling point included in each frame of audio segments comprises:

and calculating the frame volume of each frame of audio segment by adopting the volume peak value of the sampling point contained in each frame of audio segment.

4. The method as claimed in claim 3, wherein the obtaining the volume peak value of each sample point according to the first amplitude value of the sample point included in each frame of audio segment comprises:

5. The method of claim 4, wherein calculating the volume peak value for the nth sample point from the first amplitude value for the nth sample point and the volume peak value for the (n-1) th sample point comprises:

6. The method of claim 1, further comprising:

7. The method as claimed in any one of claims 1-6, wherein before obtaining the frame volume of each frame of audio segment in the plurality of frames of audio segments included in the first audio data, the method further comprises:

8. The method of claim 7, wherein before calculating the first amplitude value of the mth sample point in the dc-processed first audio data according to the second amplitude value of the mth sample point and the dc component, the method further comprises:

9. The method as claimed in any one of claims 1-6, wherein before obtaining the frame volume of each frame of audio segment in the plurality of frames of audio segments included in the first audio data, the method further comprises:

10. An audio volume acquisition apparatus, comprising:

the volume determining module is used for determining the audio volume of the first audio data according to the target frame volume contained in the target volume interval;

wherein the interval determination module is specifically configured to:

11. The apparatus of claim 10, further comprising:

12. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any of claims 1-9.

13. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-9.