CN110931021A - Audio signal processing method and device

Info

Publication number: CN110931021A
Application number: CN201911034571.6A
Authority: CN
Inventors: 张丝潆; 彭俊清; 王健宗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-10-29
Filing date: 2019-10-29
Publication date: 2020-03-27
Anticipated expiration: 2039-10-29
Also published as: CN110931021B; WO2021082083A1

Abstract

The application discloses an audio signal processing method and device. The audio signal processing method comprises the following steps: acquiring a first audio signal in which amplitude clipping exists; acquiring target data representing an amplitude clipping ratio of the first audio signal; if the target data belongs to a target range, dividing the first audio signal into at least two audio segments; and performing amplitude clipping detection processing on the at least two audio segments, and obtaining a second audio signal from the audio segments after the amplitude clipping detection processing. With this technical scheme, effective audio signals can be retained as far as possible, so that the availability of the audio signals is greatly improved.

Description

Audio signal processing method and device

Technical Field

The present invention relates to the field of communications technologies, and in particular, to an audio signal processing method and apparatus.

Background

In the voiceprint recognition process, early-stage preprocessing of the audio signal is critical and strongly influences the subsequent recognition accuracy. The preprocessing includes amplitude clipping detection of the audio signal. Amplitude clipping occurs mainly because the amplitude of the audio signal is too high and exceeds the maximum value of the sampling value range, so that the amplitude is truncated; this is also called clipping.

The truncation results in the loss of information in the speech signal, and in the prior art, once the truncation of a section of speech signal is detected, the section of speech signal is discarded, which results in the loss of a lot of valid speech signals.

Disclosure of Invention

Embodiments of the present invention provide an audio signal processing method and apparatus, which can retain more effective audio signals, so that the availability of the audio signals is greatly improved.

In a first aspect, an embodiment of the present invention provides an audio signal processing method, including:

acquiring a first audio signal with amplitude clipping, wherein the first audio signal comprises N sampling points, and N is a positive integer;

acquiring target data for representing an amplitude truncation ratio of the first audio signal, wherein the amplitude truncation ratio is used for representing a ratio between the number of sampling points with amplitude truncation in the N sampling points and the N;

if the target data belong to a target range, dividing the first audio signal into at least two audio segments;

and carrying out amplitude-cutting detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segments after the amplitude-cutting detection processing.

In a possible implementation manner, the performing amplitude-clipping detection processing on the at least two audio segments and obtaining a second audio signal according to the audio segment after amplitude-clipping detection processing includes:

for each of the at least two audio segments, detecting whether a clipping exists for the audio segment;

if the audio segment has amplitude truncation, discarding the audio segment;

obtaining the discarded residual audio segments of the at least two audio segments;

and obtaining a second audio signal according to the residual audio segment.

In one possible implementation, the method further includes: detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;

if the audio length of the second audio signal is greater than or equal to the first threshold, determining that the second audio signal is an available audio signal;

and if the audio length of the second audio signal is smaller than the first threshold, discarding the second audio signal.

In one possible implementation, each of the at least two audio segments includes at least one sampling point, and the detecting whether the audio segment has a truncation includes:

acquiring the amplitude value of each sampling point in at least one sampling point included in the audio segment;

if the amplitude value of the at least one sampling point meets a first condition, determining that amplitude truncation exists in the audio segment, wherein the first condition comprises: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are all greater than a second threshold.

In another possible implementation manner, before the acquiring of the first audio signal with amplitude clipping, the method further includes:

acquiring an amplitude value of each of N sampling points included in the first audio signal;

if the amplitude values of the N sampling points meet a first condition, determining that amplitude truncation exists in the first audio signal, wherein the first condition comprises: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are all greater than a second threshold.

In one possible implementation, the amplitude value of each of the N sampling points belongs to a target sampling range;

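the second threshold is a product of a maximum value of the target sampling range and a target proportion.

In one possible implementation, before the acquiring of the first audio signal with amplitude clipping, the method further includes: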

dividing a target sampling range into at least two sub-ranges, wherein the at least two sub-ranges are not overlapped with each other, and the target sampling range is a range in which amplitude values of N sampling points included in the first audio signal are located;

counting the number of sampling points belonging to each sub-range of the at least two sub-ranges in the amplitude values of the N sampling points;

constructing a histogram, a horizontal axis of the histogram comprising the at least two sub-ranges and a vertical axis of the histogram comprising the number of sample points belonging to the sub-ranges;

and if the variation trend of the histogram meets a second condition, determining that the amplitude truncation exists in the first audio signal.

In one possible implementation, the obtaining target data representing a clipping ratio of the first audio signal includes:

determining a first sub-range from the at least two sub-ranges, wherein the first sub-range is the sub-range covering the largest amplitude values among the at least two sub-ranges;

acquiring the number of sampling points of which the amplitude values belong to the first sub-range from the N sampling points as a first number;

calculating a ratio between the first number and the N, and using the ratio as target data for representing a clipping ratio of the first audio signal.

In one possible implementation, the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the method further includes:

if the target data is greater than the third threshold, discarding the first audio signal;

and if the target data is smaller than the fourth threshold value, determining the first audio signal as a usable audio signal.

In a second aspect, an embodiment of the present invention provides an audio signal processing apparatus, including:

a first acquisition unit, configured to acquire a first audio signal with amplitude truncation, wherein the first audio signal comprises N sampling points, and N is a positive integer;

a second acquisition unit configured to acquire target data indicating an amplitude truncation ratio of the first audio signal, the amplitude truncation ratio indicating a ratio between the number of samples having an amplitude truncation among the N samples and the N;

the first dividing unit is used for dividing the first audio signal into at least two audio segments if the target data belong to a target range;

and the third acquisition unit is used for carrying out amplitude-cutting detection processing on the at least two audio segments and acquiring a second audio signal according to the audio segments after the amplitude-cutting detection processing.

In a possible implementation manner, the third obtaining unit is specifically configured to:

if the audio segment has amplitude truncation, discarding the audio segment;

and obtaining a second audio signal according to the residual audio segment.

In one possible implementation, the apparatus further includes:

a detection unit for detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;

a first determining unit, configured to determine that the second audio signal is an available audio signal if the audio length of the second audio signal is greater than or equal to the first threshold;

In one possible implementation manner, each of the at least two audio segments includes at least one sampling point, and the third obtaining unit detects whether amplitude truncation exists in the audio segment by obtaining an amplitude value of each of the at least one sampling point included in the audio segment;

In one possible implementation, the apparatus further includes:

a fourth obtaining unit, configured to obtain an amplitude value of each of N sampling points included in the first audio signal;

a second determining unit, configured to determine that amplitude truncation exists in the first audio signal if the amplitude values of the N sampling points satisfy a first condition, where the first condition includes: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are all greater than a second threshold.

In one possible implementation, the apparatus further includes:

the second dividing unit is used for dividing a target sampling range into at least two sub-ranges, wherein the at least two sub-ranges are not overlapped with each other, and the target sampling range is a range in which amplitude values of N sampling points included in the first audio signal are located;

the counting unit is used for counting the number of sampling points which belong to each sub-range of the at least two sub-ranges in the amplitude values of the N sampling points;

a construction unit for constructing a histogram, a horizontal axis of the histogram comprising the at least two sub-ranges and a vertical axis of the histogram comprising the number of sample points belonging to the sub-ranges;

and the third determining unit is used for determining that the amplitude truncation exists in the first audio signal if the variation trend of the histogram meets a second condition.

In one possible implementation, the second obtaining unit is specifically configured to:

In one possible implementation, the maximum value of the target range is a third threshold value, and the minimum value of the target range is a fourth threshold value, and the apparatus further includes a fourth determining unit;

the fourth determining unit is specifically configured to discard the first audio signal if the target data is greater than the third threshold;

In a third aspect, an embodiment of the present invention provides an audio signal processing apparatus, where the audio signal processing apparatus includes a processor, a memory, and a communication interface, where the processor, the memory, and the communication interface are connected to each other, where the communication interface is used to receive and send data, the memory is used to store program codes, and the processor is used to call the program codes to execute the method according to the first aspect.

In a fourth aspect, the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is executed by a processor to implement the method described above.

In the embodiment of the invention, after a first audio signal with amplitude clipping is acquired, a processing mode of the first audio signal is determined according to target data used for representing the amplitude clipping proportion of the first audio signal, if the target data belongs to a target range, the first audio signal is divided into at least two audio segments, amplitude clipping detection processing is carried out on the at least two audio segments, and a second audio signal is obtained according to the audio segments after amplitude clipping detection processing. The embodiment of the application does not simply discard the audio signal with the amplitude clipping, but further processes the audio signal with the amplitude clipping, so that the effective audio signal can be kept as much as possible, and the availability of the audio signal is greatly improved.

Drawings

In order to illustrate embodiments of the present invention or technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.

FIG. 1 is a flowchart of an audio signal processing method according to an embodiment of the present invention;

FIG. 2 is a waveform diagram of an audio signal with amplitude clipping according to an embodiment of the present invention;

FIG. 3 is a flowchart of another audio signal processing method according to an embodiment of the present invention;

FIG. 4 is a flowchart of a method for obtaining target data representing an amplitude truncation ratio of a first audio signal according to an embodiment of the present invention;

FIG. 5 is a flowchart of determining whether target data belongs to a target range according to an embodiment of the present invention;

FIG. 6 is a flowchart of a method for performing amplitude truncation detection processing on an audio segment according to an embodiment of the present invention;

FIG. 7 is a flowchart of a method for determining whether to discard a second audio signal according to an embodiment of the present invention;

FIG. 8 is a flowchart of another audio signal processing method according to an embodiment of the present invention;

FIG. 9 is a histogram without clipping according to an embodiment of the present invention;

FIG. 10 is a histogram with clipping according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of another audio signal processing apparatus according to an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of another audio signal processing apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.

An audio signal processing method according to an embodiment of the present invention will be described in detail below with reference to fig. 1 to 10.

Referring to fig. 1, a flow chart of an audio signal processing method according to an embodiment of the invention is shown. As shown in fig. 1, the audio signal processing method of an embodiment of the present invention may include the following steps S101 to S104.

S101, acquiring a first audio signal with amplitude clipping, wherein the first audio signal comprises N sampling points, and N is a positive integer;

In this embodiment, the first audio signal may include a voice data signal from an instant messaging process or a music data signal recorded on the spot; the embodiments of the present application are not limited in this respect.

In this embodiment, a manner of acquiring the first audio signal may be to perform amplitude clipping detection processing on a plurality of audio signals, determine whether amplitude clipping exists in the audio signals, and then acquire at least one audio signal in which amplitude clipping exists.

The first audio signal comprises N sampling points, and the amplitude value of each sampling point belongs to a preset target sampling range. The target sampling range is determined by the number of bits used for storing the amplitude value; for example, if 16 bits are used for storing the amplitude value, the target sampling range is -2^15 to 2^15 - 1, namely -32768 to 32767.

Optionally, the first audio signal may be obtained by sampling and quantizing an original analog signal. The original analog signal is first sampled to obtain N sampling points; the sampling frequency may be 8 kHz, that is, 8000 sampling points per second. The original amplitude value of each sampling point is then quantized. As shown in fig. 2, if the original amplitude value of a sampling point exceeds the maximum value of the target sampling range, it is represented by the maximum value of the target sampling range, and if it falls below the minimum value of the target sampling range, it is represented by the minimum value of the target sampling range. After quantization, the amplitude values of the N sampling points are all limited to the target sampling range, and each sampling point corresponds to one amplitude value. As shown in fig. 2, after the sampling and quantization step, amplitude clipping exists in the first audio signal.
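The saturation behavior described above can be illustrated with a minimal Python sketch. The function name `quantize_with_clipping`, the rounding rule, and the sample values are assumptions made for illustration only; the source does not specify how quantization is implemented.

```python
def quantize_with_clipping(samples, bits=16):
    """Round each sample and saturate it to the signed range for `bits` bits."""
    lo = -(2 ** (bits - 1))      # -32768 for 16 bits
    hi = 2 ** (bits - 1) - 1     # 32767 for 16 bits
    # Values above `hi` are represented by `hi`, values below `lo` by `lo`.
    return [max(lo, min(hi, int(round(s)))) for s in samples]


# Hypothetical original amplitudes, some outside the 16-bit target sampling range.
analog = [1200.4, 40000.0, 39000.0, -50000.0, -10.6]
quantized = quantize_with_clipping(analog)
print(quantized)  # [1200, 32767, 32767, -32768, -11]
```

The two out-of-range positive values both collapse to 32767, which is the flat-topped waveform that later detection steps look for.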

It should be noted that the sampling frequency may be other frequencies, which may be customized according to the needs of the user, and in addition, the bit number for storing the amplitude value may also be other bit numbers, which may be set according to the sampling range needed by the user.

Optionally, the original analog signal may be sampled and quantized by an amplitude value calculation function to obtain the N amplitude values corresponding to the N sampling points included in the first audio signal. For example, the sampling frequency of the amplitude value calculation function and the number of bits used for storing the amplitude values are set, and the original analog signal is input into the amplitude value calculation function to obtain the first audio signal.

S102, acquiring target data for representing an amplitude truncation ratio of the first audio signal, wherein the amplitude truncation ratio is used for representing the ratio of the number of amplitude-truncated sampling points in the N sampling points to N;

In this embodiment, target data used for representing the amplitude truncation ratio of the first audio signal is obtained by analyzing the N amplitude values corresponding to the N sampling points of the first audio signal. The target data may be the amplitude truncation ratio itself or other data capable of reflecting the amplitude truncation ratio; for example, the target data may be a value within a preset range of the amplitude truncation ratio.

In an alternative embodiment, the target data representing the amplitude clipping ratio of the first audio signal may be obtained by first analyzing the amplitude value of each of the N sampling points included in the first audio signal, determining the sampling points at which amplitude clipping exists, and calculating the ratio between the number of those sampling points and the total number N of sampling points; this ratio is the target data. Optionally, the clipped sampling points may be determined by checking whether the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are all greater than a second threshold, where the first number may be 3 and the second threshold may be 90% of the maximum value of the target sampling range. For example, if the amplitude values of 5 consecutive sampling points are all greater than the second threshold, those 5 sampling points are regarded as clipped sampling points.
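As a hedged illustration of this embodiment, the following Python sketch marks every sampling point that lies in a run of at least `min_run` consecutive values above the threshold and returns the resulting ratio. The function names, the default of 32767 for the maximum value, and the example signal are assumptions; the comparison uses the raw amplitude value as stated in this paragraph.

```python
def clipped_run_mask(amplitudes, threshold, min_run=3):
    """Mark samples lying in a run of >= min_run consecutive values above threshold."""
    mask = [False] * len(amplitudes)
    start = None
    # A trailing None acts as a sentinel that closes a run ending at the last sample.
    for i, a in enumerate(list(amplitudes) + [None]):
        above = a is not None and a > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            if i - start >= min_run:
                for j in range(start, i):
                    mask[j] = True
            start = None
    return mask


def clipping_ratio(amplitudes, max_value=32767, proportion=0.9, min_run=3):
    """Ratio of clipped samples to all N samples (the target data of this embodiment)."""
    threshold = max_value * proportion
    return sum(clipped_run_mask(amplitudes, threshold, min_run)) / len(amplitudes)


# Hypothetical signal: one run of five high samples and one run of only two.
signal = [100, 32000, 32500, 32767, 32767, 32767, 200, 31000, 31000, 50]
print(clipping_ratio(signal))  # 0.5
```

The run of two samples at 31000 exceeds the threshold but is shorter than the first number, so it does not count toward the ratio.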

In another alternative embodiment, the target data may be obtained by counting the number of sampling points in the first audio signal whose amplitude values exceed a first threshold, and then calculating the ratio between that number and the total number N of sampling points; this ratio is the target data. Because the amplitude values of clipped sampling points are all large and exceed the first threshold, a larger amplitude clipping ratio of the first audio signal yields a larger calculated ratio. The calculated ratio therefore reflects the amplitude clipping ratio indirectly, although it is not the amplitude clipping ratio itself. The value of the first threshold can be set and then repeatedly checked and updated to obtain a more reasonable value.

In yet another alternative embodiment, the target data representing the amplitude clipping ratio of the first audio signal may be obtained as shown in fig. 3, which schematically shows a flow including, but not limited to, steps S21-S23:

S21, determining a first sub-range from at least two sub-ranges;

Specifically, optionally, the target sampling range to which the N amplitude values corresponding to the N sampling points of the first audio signal belong is divided into at least two sub-ranges that do not overlap one another. The present application does not limit the division manner, which may be equal or unequal. The number of sub-ranges may be 22, 24, 30, or another value.

A first sub-range is determined from the at least two mutually non-overlapping sub-ranges, where the first sub-range is the sub-range covering the largest amplitude values among the at least two sub-ranges. For example, if the at least two sub-ranges include [0,7], [8,15], and [16,23], the first sub-range is [16,23].

S22, acquiring the number of sampling points of which the amplitude values belong to the first sub-range from the N sampling points as a first number;

S23, calculating a ratio between the first number and the N, and using the ratio as target data for representing the clipping ratio of the first audio signal.

As described above, most of the sampling points in the first audio signal whose amplitude values belong to the first sub-range are clipped sampling points, so the ratio between the first number and the total number N of sampling points can serve as target data that reflects the size of the amplitude clipping ratio of the first audio signal.
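Steps S21 to S23 can be sketched in Python as follows. This is only an illustration under stated assumptions: the sub-ranges are taken to be of equal width, 24 sub-ranges are used, and the function name `top_subrange_ratio` and the example values are hypothetical.

```python
def top_subrange_ratio(amplitudes, lo=-32768, hi=32767, num_subranges=24):
    """Share of samples whose amplitude falls in the highest equal-width sub-range."""
    width = (hi - lo + 1) / num_subranges   # width of each sub-range (assumed equal)
    top_start = hi + 1 - width              # lower edge of the first (highest) sub-range
    first_number = sum(1 for a in amplitudes if a >= top_start)  # step S22
    return first_number / len(amplitudes)                         # step S23


# Hypothetical 16-bit amplitudes; four of the eight fall in the highest sub-range.
amps = [32767, 32767, 31000, 100, -32768, 5, 30000, 32767]
print(top_subrange_ratio(amps))  # 0.5
```

With 24 sub-ranges the highest one starts at about 30037, so the sample at 30000 is not counted while the one at 31000 is.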

S103, if the target data belong to a target range, dividing the first audio signal into at least two audio segments;

In this embodiment, after the target data is acquired, it is determined whether the target data belongs to the target range. If it does, the first audio signal is divided into at least two audio segments; the division may use 1 second as the unit, or another time unit such as 5 seconds.

The target range may be 60% to 80%, and it is understood that the target range may be other ranges, and the embodiments of the present application are not limited thereto.

Referring to fig. 4, a flowchart for handling target data that falls in different ranges according to an embodiment of the present application is shown, which includes steps S31-S35:

S31, acquiring target data;

S32, determining whether the target data belongs to the target range; if it does, performing step S33; if it does not, performing step S35 when the target data is greater than the third threshold and step S34 when the target data is smaller than the fourth threshold, wherein the third threshold is the maximum value of the target range and the fourth threshold is the minimum value of the target range;

S33, dividing the first audio signal into at least two audio segments, and carrying out amplitude-cutting detection processing on the at least two audio segments;

S34, determining the first audio signal as a usable audio signal;

S35, discarding the first audio signal.

Determining whether the target data belongs to the target range has two possible outcomes. In the first, the target data belongs to the target range; see step S104 for details, which are not repeated here. In the second, the target data does not belong to the target range. If the target data is greater than the third threshold, the ratio of amplitude-clipped sampling points to all sampling points in the first audio signal is too high; training a voiceprint recognition model with the first audio signal would then reduce the accuracy of voiceprint recognition, so the first audio signal is discarded. If the target data is smaller than the fourth threshold, the ratio is small: the few clipped sampling points do not appreciably damage the information carried by the first audio signal and have almost no influence on subsequent processing. The first audio signal is therefore input directly into the system for subsequent processing, without audio segment division or amplitude clipping detection of audio segments; for example, the first audio signal can be used directly to train a voiceprint recognition model.
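The three-way decision of steps S32 to S35 reduces to two comparisons. The Python sketch below is illustrative only: the function name `decide` and the returned labels are assumptions, and the default thresholds simply reuse the 60% to 80% example range given above.

```python
def decide(target_data, fourth_threshold=0.6, third_threshold=0.8):
    """Map the target data to one of the three handling paths."""
    if target_data > third_threshold:
        return "discard"    # S35: too many clipped samples
    if target_data < fourth_threshold:
        return "usable"     # S34: clipping is negligible, use the signal as is
    return "segment"        # S33: inside the target range, split and re-check


for value in (0.9, 0.1, 0.7):
    print(value, decide(value))
```

Values equal to either threshold are treated as belonging to the target range, since the source discards only when the target data is greater than the third threshold and accepts only when it is smaller than the fourth.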

S104, performing amplitude-cutting detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segments after the amplitude-cutting detection processing.

In this embodiment, if the target data belongs to the target range, the first audio signal may be divided into at least two audio segments. The division may be an even division, that is, the duration of each audio segment is a target duration such as 1 s or 5 s. Each audio segment is then checked in turn for amplitude clipping.

Whether amplitude clipping exists in each audio segment may be detected by checking whether, in the audio segment, the absolute values of the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are all greater than a second threshold. The first number may be 3, and the second threshold may be the product of the maximum value of the sampling value range and a target proportion; the target proportion may be 90%, giving 32768 × 0.9 ≈ 29491. If the absolute values of the amplitude values of three or more consecutive sampling points in an audio segment all exceed 90% of the maximum value of the sampling value range, it is determined that amplitude clipping exists in the audio segment, and the audio segment is discarded. It should be noted that the target proportion may take other values around 90%, such as 91%, 89%, 85%, or 95%; the first number may be 3 or another value; and the first number, the target proportion, and the sampling frequency may constrain one another.

If no three consecutive sampling points have absolute amplitude values that all exceed 90% of the maximum value of the sampling value range, no amplitude clipping exists in the audio segment; the audio segment is determined to be an available audio segment and is retained. Detecting amplitude clipping segment by segment in this way avoids discontinuities within the retained speech.

Optionally, the second audio signal may be obtained according to the result of the amplitude-cut detection processing on the at least two audio segments, for example, the audio segments with amplitude cut in the at least two audio segments are discarded, the audio segments without amplitude cut are retained, and then all the audio segments without amplitude cut are combined into the second audio signal according to the time sequence.
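The segment-level processing of step S104 can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the function names, the fixed-length slicing, and the tiny example signal are assumptions, and the default `max_value=32768` only mirrors the 32768 × 0.9 arithmetic used in the description.

```python
def has_clipping(segment, threshold, min_run=3):
    """True if at least min_run consecutive samples have |amplitude| > threshold."""
    run = 0
    for a in segment:
        run = run + 1 if abs(a) > threshold else 0
        if run >= min_run:
            return True
    return False


def remove_clipped_segments(signal, sample_rate=8000, segment_seconds=1,
                            max_value=32768, proportion=0.9, min_run=3):
    """Split into fixed-length segments, drop clipped ones, join the rest in time order."""
    seg_len = sample_rate * segment_seconds
    threshold = max_value * proportion
    kept = []
    for start in range(0, len(signal), seg_len):
        segment = signal[start:start + seg_len]
        if not has_clipping(segment, threshold, min_run):
            kept.extend(segment)
    return kept


# Hypothetical signal at a toy rate of 4 samples per second: the middle segment is clipped.
first = [0, 100, -200, 50, 32767, 32767, 32767, 10, 5, -32768, 7, 32767]
second = remove_clipped_segments(first, sample_rate=4)
print(second)  # [0, 100, -200, 50, 5, -32768, 7, 32767]
```

In this sketch a run of high samples that straddles a segment boundary is evaluated separately in each segment, so the choice of segment length affects what is detected.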

For step S104 of the above embodiment, reference may be made to fig. 5, a schematic diagram of the amplitude clipping detection processing of audio segments proposed by the present application, which includes, but is not limited to, steps S41-S44:

S41, detecting whether the audio segment has amplitude truncation or not for each audio segment in the at least two audio segments;

S42, if the audio segment has clipping, discarding the audio segment;

S43, obtaining the remaining audio segments of the at least two audio segments after the discarding;

and S44, obtaining a second audio signal according to the residual audio segment.

Optionally, after the second audio signal is obtained, to check whether the second audio signal obtained from the remaining audio segments can be used by subsequent systems, it may be detected whether the audio length of the second audio signal meets a certain condition. The determination may refer to fig. 6, a flowchart of a method for determining whether to discard the second audio signal, which includes but is not limited to steps S51-S54:

S51, obtaining a second audio signal;

S52, detecting whether the audio length of the second audio signal is greater than or equal to a first threshold; if the audio length of the second audio signal is smaller than the first threshold, going to step S53, and if the audio length of the second audio signal is greater than or equal to the first threshold, going to step S54;

S53, discarding the second audio signal;

S54, determining the second audio signal as a usable audio signal.

The first threshold mentioned above refers to the length of an audio signal that can be input into a subsequent system for processing. For example, in a text-independent voiceprint registration scene, the speech signal to be registered needs to reach a length of 20 s, so it can be determined whether the audio length of the second audio signal is greater than or equal to 20 s; if so, the second audio signal is retained and used to train a voiceprint recognition model.
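The length check of steps S52 to S54 can be sketched in Python as follows. The function name `is_usable` and the assumption that audio length is measured as sample count divided by an 8 kHz sampling frequency are illustrative; the 20 s default reuses the registration example above.

```python
def is_usable(second_signal, sample_rate=8000, min_seconds=20):
    """True if the second audio signal lasts at least min_seconds."""
    duration = len(second_signal) / sample_rate   # audio length in seconds
    return duration >= min_seconds


print(is_usable([0] * 160000))   # True: exactly 20 s at 8 kHz
print(is_usable([0] * 159999))   # False: one sample short of 20 s
```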

In another embodiment, before the first audio signal with the clipping amplitude is obtained in step S101, it may be determined whether the clipping amplitude exists in the first audio signal, and optionally, the detection manner for determining whether the clipping amplitude exists in the first audio signal includes, but is not limited to, the following two optional embodiments, the first optional embodiment is shown in fig. 7 and includes, but is not limited to, steps S201 to S202, and the second optional embodiment is shown in fig. 8 and includes, but is not limited to, steps S301 to S304, which are specifically set forth below:

The first alternative embodiment is:

S201, acquiring amplitude values of N sampling points included in the first audio signal;

In this embodiment, the first audio signal is obtained by sampling. As described in step S101, the original analog signal may be sampled and quantized by means of an amplitude value calculation function, so as to obtain N amplitude values corresponding to the N sampling points included in the first audio signal.

S202, if the amplitude value of the first audio signal meets a first condition, determining that amplitude truncation exists in the first audio signal.

Wherein the first condition comprises: the amplitude values of a first number of consecutive sampling points, or of more consecutive sampling points than the first number, are greater than a second threshold. If the amplitude values of the sampling points of the first audio signal satisfy this condition, it may be determined that amplitude truncation exists in the first audio signal. For details, please refer to step S104; they are not repeated herein.
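The first condition can be illustrated with a short sketch. The function name `has_clipping_run` and its parameters are hypothetical; the sketch follows the wording above literally and only tests for amplitude values greater than the second threshold.

```python
from typing import Sequence


def has_clipping_run(amplitudes: Sequence[float], first_number: int,
                     second_threshold: float) -> bool:
    """Return True if at least first_number consecutive samples exceed second_threshold."""
    if first_number <= 0:
        raise ValueError("first_number must be positive")
    run = 0  # length of the current run of samples above the threshold
    for value in amplitudes:
        if value > second_threshold:
            run += 1
            if run >= first_number:
                return True
        else:
            run = 0  # the run is broken by a sample at or below the threshold
    return False
```

In this sketch the second threshold is passed in directly; it could, for example, be computed as the maximum value of the target sampling range multiplied by a target proportion.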

The second alternative embodiment is:

S301, dividing a target sampling range into at least two sub-ranges;

S302, counting, among the amplitude values of the N sampling points, the number of sampling points belonging to each of the at least two sub-ranges;

S303, constructing a histogram;

The number of sampling points, among the N sampling points, whose amplitude values belong to each sub-range is counted, and a histogram is constructed. The horizontal axis of the histogram may be the sub-ranges, and the vertical axis may be the number of sampling points, among the N sampling points in the first audio signal, whose amplitude values belong to the respective sub-range.

S304, if the variation trend of the histogram meets a second condition, determining that the first audio signal has amplitude truncation;

As shown in fig. 9 and fig. 10, the target sampling range is divided equally into 22 sub-ranges in order of magnitude. As shown in fig. 9, if there is no amplitude truncation in the first audio signal, the number of occurrences of amplitude values gradually decreases as the values of the sub-range increase. As shown in fig. 10, if amplitude truncation exists in the first audio signal, the number of occurrences of amplitude values is highest in the sub-range with the highest values, so that the last column of the histogram is higher than all the preceding columns; that is, the frequency value of the last sub-range of the histogram is the highest. The frequency value represented by the last column is referred to as an abnormally rising portion, and the second condition is that such an abnormally rising portion exists in the histogram.

If the amplitude truncation does not exist in the first audio signal, the waveform of the audio signal is relatively gentle, and most amplitude values of the N sampling points are relatively small.
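The histogram-based detection of steps S301-S304 can be sketched as follows. The names `build_histogram` and `has_abnormal_rise` are hypothetical, and the sketch assumes that the histogram is built over sample magnitudes from 0 up to the maximum value of the target sampling range, which is one possible reading of fig. 9 and fig. 10.

```python
from typing import List, Sequence


def build_histogram(amplitudes: Sequence[float], max_amplitude: float,
                    num_subranges: int = 22) -> List[int]:
    """Count samples per equal-width sub-range of [0, max_amplitude] (S301-S303)."""
    if num_subranges < 2:
        raise ValueError("at least two sub-ranges are required")
    if max_amplitude <= 0:
        raise ValueError("max_amplitude must be positive")
    width = max_amplitude / num_subranges
    counts = [0] * num_subranges
    for value in amplitudes:
        # Assumption: the magnitude of the sample decides its sub-range.
        index = int(abs(value) / width)
        # Values at or beyond the top of the range fall into the last sub-range.
        counts[min(index, num_subranges - 1)] += 1
    return counts


def has_abnormal_rise(counts: Sequence[int]) -> bool:
    """Second condition (S304): the last column is higher than all preceding columns."""
    if len(counts) < 2:
        raise ValueError("at least two sub-ranges are required")
    return counts[-1] > max(counts[:-1])
```

With a mostly quiet signal the first columns dominate and `has_abnormal_rise` returns False; when many samples sit at the top of the range, the last column is the highest and it returns True.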

In this embodiment, before the first audio signal with amplitude truncation is obtained in step S101, it is determined whether amplitude truncation exists in each audio signal, so that at least one audio signal with amplitude truncation is obtained from multiple audio signals.

Fig. 11 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present invention. As shown in fig. 11, the audio signal processing apparatus according to an embodiment of the present invention may include:

a first obtaining unit 11, configured to obtain a first audio signal with an amplitude truncation, where the first audio signal includes N sampling points, where N is a positive integer;

Optionally, the first audio signal may be obtained by sampling and quantizing the original analog signal. The original analog signal is first sampled to obtain N sampling points; the sampling frequency may be 8 kHz, in which case there are 8000 sampling points in 1 s. Then the original amplitude value of each sampling point is quantized. As shown in fig. 2, if the original amplitude value of a sampling point exceeds the maximum value of the target sampling range, it is represented by the maximum value of the target sampling range, and if the original amplitude value of a sampling point is below the minimum value of the target sampling range, it is represented by the minimum value of the target sampling range. After quantization, the original amplitude values of the sampling points are limited to N amplitude values within the target sampling range, with one sampling point corresponding to one amplitude value.
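The limiting behaviour of the quantization described above can be sketched as follows. The function name `quantize` is hypothetical, and rounding to the nearest integer is an assumption made for illustration; the text only specifies that out-of-range values are represented by the boundary of the target sampling range.

```python
from typing import List, Sequence


def quantize(original_values: Sequence[float], range_min: int, range_max: int) -> List[int]:
    """Map each original amplitude to an integer inside [range_min, range_max]."""
    if range_min > range_max:
        raise ValueError("range_min must not exceed range_max")
    quantized = []
    for value in original_values:
        level = int(round(value))  # assumption: round to the nearest integer level
        # Values beyond the target sampling range are represented by its boundary,
        # which is where amplitude truncation comes from.
        quantized.append(min(max(level, range_min), range_max))
    return quantized
```

For example, with a hypothetical 16-bit target sampling range of -32768 to 32767, an original amplitude of 40000 would be stored as 32767 and an original amplitude of -50000 as -32768.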

A second obtaining unit 12 configured to obtain target data indicating an amplitude truncation ratio of the first audio signal, where the amplitude truncation ratio is used to indicate a ratio between the number of samples having amplitude truncation among the N samples and the N;

In yet another alternative embodiment, the second obtaining unit is specifically configured to perform the process shown in fig. 3 of obtaining target data representing the clipping ratio of the first audio signal, which includes but is not limited to steps S21-S23;

S21, determining a first sub-range from at least two sub-ranges;

A first dividing unit 13, configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;

S31, acquiring target data;

S34, determining the first audio signal as a usable audio signal;

S35, discarding the first audio signal.

A third obtaining unit 14, configured to perform amplitude-clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segment after the amplitude-clipping detection processing;

Optionally, the third obtaining unit is specifically configured to obtain the second audio signal according to the result of the amplitude clipping detection processing on the at least two audio segments: for example, audio segments with amplitude clipping among the at least two audio segments are discarded, audio segments without amplitude clipping are retained, and all audio segments without amplitude clipping are then combined into the second audio signal in time order.
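The discard-and-combine behaviour described above can be sketched as follows. The names `split_into_segments` and `rebuild_without_clipped` are hypothetical, fixed-length segmentation is an assumption, and the clipping detector is passed in as a callable because the sketch does not fix a particular detection rule.

```python
from typing import Callable, List, Sequence


def split_into_segments(samples: Sequence[float], segment_length: int) -> List[List[float]]:
    """Divide the signal into consecutive segments of segment_length samples."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [list(samples[i:i + segment_length])
            for i in range(0, len(samples), segment_length)]


def rebuild_without_clipped(samples: Sequence[float], segment_length: int,
                            is_clipped: Callable[[Sequence[float]], bool]) -> List[float]:
    """Drop segments flagged by is_clipped and join the rest in time order."""
    second_signal: List[float] = []
    for segment in split_into_segments(samples, segment_length):
        if not is_clipped(segment):
            # Segments are visited in their original order, so time order is kept.
            second_signal.extend(segment)
    return second_signal
```

Only the segments flagged by the detector are lost; the remaining samples keep their original order in the second audio signal.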

S42, if the audio segment has clipping, discarding the audio segment;

S51, obtaining a second audio signal;

S53, discarding the second audio signal;

S54, determining the second audio signal as a usable audio signal;

In an embodiment, the third obtaining unit is specifically configured to:

if the audio segment has amplitude truncation, discarding the audio segment;

and obtaining a second audio signal according to the residual audio segment.

Optionally, as shown in fig. 12, the apparatus further includes:

In one embodiment, each of the at least two audio segments includes at least one sampling point, and the third obtaining unit detects whether amplitude truncation exists in the audio segment by obtaining an amplitude value of each of the at least one sampling point included in the audio segment;

Optionally, as shown in fig. 12, the apparatus further includes:

In one embodiment, the amplitude value of each of the N sampling points belongs to a target sampling range;

Optionally, as shown in fig. 12, the apparatus further includes:

In an embodiment, the second obtaining unit is specifically configured to:

Optionally, the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the apparatus further includes a fourth determining unit;

In the embodiment of the invention, after the first audio signal with the amplitude clipping is obtained, the processing mode of the first audio signal is determined according to the target data for representing the amplitude clipping proportion of the first audio signal, and the second audio signal is obtained after the amplitude clipping detection processing. The embodiment of the application does not simply discard the audio signal with the amplitude clipping, but further processes the audio signal with the amplitude clipping, so that the effective audio signal can be kept as much as possible, and the availability of the audio signal is greatly improved.

Fig. 13 is a schematic structural diagram of another audio signal processing apparatus according to an embodiment of the present invention. As shown in fig. 13, the audio signal processing apparatus 1000 may include: at least one processor 1001, such as a CPU, at least one communication interface 1003, a memory 1004, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The communication interface 1003 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1004 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 1004 may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 13, the memory 1004, which is a type of computer storage medium, may include an operating system, a network communication module, and program instructions.

In the audio signal processing apparatus 1000 shown in fig. 13, the processor 1001 may be configured to load program instructions stored in the memory 1004 and specifically perform the following operations:

Optionally, before acquiring the first audio signal with the truncated amplitude, the method further includes:

Optionally, the obtaining target data representing the clipping ratio of the first audio signal includes:

Optionally, the performing amplitude-clipping detection processing on the at least two audio segments and obtaining a second audio signal according to the audio segment after amplitude-clipping detection processing includes:

if the audio segment has amplitude truncation, discarding the audio segment;

and obtaining a second audio signal according to the residual audio segment.

Optionally, each of the at least two audio segments includes at least one sampling point, and the determining whether there is a truncation of the audio segment includes:

Optionally, the amplitude value of each of the N sampling points belongs to a target sampling range, and the second threshold is a product of a maximum value of the target sampling range and a target proportion.

Optionally, the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the method further includes:

Optionally, the processor 1001 may also be configured to load program instructions stored in the memory 1004 for performing the following operations:

detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;

It should be noted that, for a specific implementation process, reference may be made to specific descriptions of the method embodiment shown in fig. 1, which are not described herein again.

For specific execution steps, reference may be made to the description of the foregoing embodiments, which are not described herein again.

An embodiment of the present invention further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the method steps in the embodiment shown in fig. 1, and a specific execution process may refer to a specific description of the embodiment shown in fig. 1, which is not described herein again.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and includes processes of the embodiments of the methods described above when the computer program is executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims

1. An audio signal processing method, comprising:

2. The method of claim 1, wherein said performing amplitude clipping detection processing on said at least two audio segments and obtaining a second audio signal based on the audio segments after the amplitude clipping detection processing comprises:

if the audio segment has amplitude truncation, discarding the audio segment;

and obtaining a second audio signal according to the residual audio segment.

3. The method of claim 2, wherein the method further comprises:

4. The method of claim 2, wherein each of the at least two audio segments includes at least one sample point, the detecting whether an amplitude truncation exists for the audio segment comprising:

5. The method of claim 1, wherein prior to obtaining the first audio signal in which the clipping is present, further comprising:

6. The method of claim 1, wherein prior to obtaining the first audio signal in which the clipping is present, further comprising:

7. The method of claim 6, wherein obtaining target data representing a truncation ratio of the first audio signal comprises:

8. An audio signal processing apparatus, comprising:

9. An audio signal processing apparatus comprising a processor, a memory and a communication interface, the processor, the memory and the communication interface being interconnected, wherein the communication interface is configured to receive and transmit data, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any of claims 1 to 7.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method of any one of claims 1 to 7.