WO2021082083A1

WO2021082083A1 - Audio signal processing method and device

Info

Publication number: WO2021082083A1
Application number: PCT/CN2019/118444
Authority: WO
Inventors: 张丝潆; 彭俊清; 王健宗
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-10-29
Filing date: 2019-11-14
Publication date: 2021-05-06
Also published as: CN110931021A; CN110931021B

Abstract

The present application discloses an audio signal processing method and device. The audio signal processing method comprises: acquiring a first audio signal having amplitude clipping; acquiring target data for representing the amplitude clipping ratio of the first audio signal; if the target data belongs to a target range, dividing the first audio signal into at least two audio segments; and performing amplitude clipping detection on the at least two audio segments and obtaining a second audio signal according to the audio segments on which the amplitude clipping detection has been performed. By means of the technical solution of the present application, as many valid audio signals as possible can be retained, so that the usability of the audio signals is greatly improved.

Description

Audio signal processing method and device

This application claims priority to Chinese patent application No. 201911034571.6, entitled "a method and device for audio signal processing", filed with the Chinese Patent Office on October 29, 2019, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of communication technology, and in particular to an audio signal processing method and device.

Background technique

In the process of voiceprint recognition, pre-processing of the audio signal is critical and has a great influence on the accuracy of subsequent recognition. The pre-processing includes clipping detection of the audio signal. Clipping occurs mainly because the amplitude of the audio signal is too high and exceeds the maximum value of the sampling value range, so the part of the waveform beyond that range is cut off.

Clipping damages the information in the voice signal. In the prior art, once clipping is detected in a voice signal, the voice signal is discarded. This approach causes the loss of many effective voice signals.

Summary of the invention

The embodiments of the present application provide an audio signal processing method and device, which can retain more effective audio signals, so that the usable rate of the audio signals is greatly improved.

In the first aspect, an embodiment of the present application provides an audio signal processing method, including:

Acquiring a first audio signal with clipping, where the first audio signal includes N sampling points, where N is a positive integer;

Acquiring target data used to represent a clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of sampling points with clipping among the N sampling points and N;

If the target data belongs to the target range, dividing the first audio signal into at least two audio segments;

Performing clipping detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segments after the clipping detection processing.

In a second aspect, an embodiment of the present application provides an audio signal processing device, including:

The first acquiring unit is configured to acquire a first audio signal with clipping, where the first audio signal includes N sampling points, where N is a positive integer;

The second acquiring unit is configured to acquire target data used to represent the clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of sampling points with clipping among the N sampling points and N;

A first dividing unit, configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;

The third acquiring unit is configured to perform clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segments after the clipping detection processing.

In a third aspect, an embodiment of the present application provides an audio signal processing device. The audio signal processing device includes a processor, a memory, and a communication interface, which are connected to each other. The communication interface is used for receiving and sending data, the memory is used to store program code, and the processor is used to call the program code to execute the method described in the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the above-mentioned method.

In the embodiments of the present application, after the first audio signal with clipping is obtained, the processing method for the first audio signal is determined according to the target data used to indicate the clipping ratio of the first audio signal. If the target data belongs to the target range, the first audio signal is divided into at least two audio segments, clipping detection processing is performed on the at least two audio segments, and the second audio signal is obtained according to the audio segments after the clipping detection processing. The embodiments of this application do not simply discard an audio signal with clipping, but process it further so as to retain as many valid audio signals as possible, so that the usable rate of audio signals is greatly improved.

Description of the drawings

In order to illustrate the technical solutions in the embodiments of the present application or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art.

FIG. 1 is a flowchart of an audio signal processing method provided by an embodiment of this application;

Fig. 2 is a waveform diagram of an audio signal with clipping provided by an embodiment of the application;

Fig. 3 is a flowchart of another audio signal processing method provided by an embodiment of the present application;

FIG. 4 is a flowchart of a method for obtaining target data representing the clipping ratio of a first audio signal according to an embodiment of the application;

FIG. 5 is a flowchart of determining whether the target data belongs to the target range, provided by an embodiment of the application;

FIG. 6 is a flowchart of a method for performing clipping detection processing on an audio segment according to an embodiment of the application;

FIG. 7 is a flowchart of a method for determining whether to discard a second audio signal according to an embodiment of the application;

FIG. 8 is a flowchart of yet another audio signal processing method provided by an embodiment of this application;

FIG. 9 is a histogram without clipping provided by an embodiment of the application;

FIG. 10 is a histogram with clipping provided by an embodiment of the application;

FIG. 11 is a schematic structural diagram of an audio signal processing device provided by an embodiment of the application;

FIG. 12 is a schematic structural diagram of another audio signal processing device provided by an embodiment of the application;

FIG. 13 is a schematic structural diagram of another audio signal processing device provided by an embodiment of the application.

Detailed description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application.

Hereinafter, an audio signal processing method provided by an embodiment of the present application will be introduced in detail with reference to FIG. 1 to FIG. 10.

Please refer to FIG. 1, which provides a schematic flowchart of an audio signal processing method according to an embodiment of this application. As shown in FIG. 1, the audio signal processing method of the embodiment of the present application may include the following steps S101 to S104.

S101: Acquire a first audio signal with clipping, where the first audio signal includes N sampling points, where N is a positive integer;

In this embodiment, the first audio signal may include a voice data signal in the instant messaging process, or may be a music data signal recorded on site, etc., which is not limited in the embodiment of the present application.

The first audio signal in this embodiment may be acquired by performing clipping detection processing on multiple audio signals, determining whether each audio signal has clipping, and thereby obtaining at least one audio signal with clipping. The first audio signal in this embodiment of the application may be any one of the at least one audio signal.

The first audio signal includes N sampling points, and the amplitude value of each sampling point belongs to a preset target sampling range. The target sampling range is determined by the number of bits used to store the amplitude value. For example, if 16 bits are used to store the amplitude value, the target sampling range is -2^15 to 2^15 - 1, that is, -32768 to 32767.
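The relationship between bit depth and target sampling range can be illustrated with a minimal sketch. The function name `target_sampling_range` is hypothetical, and signed two's-complement storage is an assumption consistent with the 16-bit example above.

```python
from typing import Tuple


def target_sampling_range(bits: int) -> Tuple[int, int]:
    """Return (minimum, maximum) amplitude for signed samples stored in `bits` bits.

    Assumes signed two's-complement storage, which matches the 16-bit example
    in the text (-32768 to 32767).
    """
    if bits < 2:
        raise ValueError("bits must be at least 2")
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


print(target_sampling_range(16))  # (-32768, 32767)
```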

Optionally, the first audio signal may be obtained by sampling and quantizing the original analog signal. The original analog signal is first sampled to obtain N sampling points; the sampling frequency may be 8 kHz, that is, 8000 sampling points per second. The original amplitude value of each sampling point is then quantized. As shown in FIG. 2, if the original amplitude value of a sampling point exceeds the maximum value of the target sampling range, it is represented by the maximum value of the target sampling range; if the original amplitude value of a sampling point is below the minimum value of the target sampling range, it is represented by the minimum value of the target sampling range. After quantization, the original amplitude values of the sampling points are limited to N amplitude values within the target sampling range, with one amplitude value per sampling point. As shown in FIG. 2, after the sampling and quantization steps, the first audio signal has clipping.
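As a rough illustration of how sampling and quantization produce clipping, the sketch below samples a callable standing in for the analog signal and saturates out-of-range values. The function `sample_and_quantize`, the `full_scale` parameter, and the 440 Hz test tone are all assumptions made for the example and are not part of the application.

```python
import math
from typing import Callable, List


def sample_and_quantize(signal: Callable[[float], float],
                        duration_s: float,
                        sample_rate: int = 8000,
                        bits: int = 16,
                        full_scale: float = 1.0) -> List[int]:
    """Sample `signal(t)` and quantize it to signed integers of `bits` bits.

    Values outside the target sampling range are replaced by the range
    minimum or maximum, which is what produces clipping.
    """
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    n = int(duration_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        value = round(signal(t) / full_scale * hi)
        samples.append(max(lo, min(hi, value)))  # saturate out-of-range values
    return samples


def loud_tone(t: float) -> float:
    """Hypothetical analog input: a 440 Hz tone peaking at 1.5x full scale."""
    return 1.5 * math.sin(2 * math.pi * 440 * t)


first_audio = sample_and_quantize(loud_tone, duration_s=1.0)
print(len(first_audio), min(first_audio), max(first_audio))
```

With an 8 kHz sampling frequency and a one-second input, N is 8000, and the peaks of the tone are flattened at the limits of the 16-bit range.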

It should be noted that the above sampling frequency can also be other frequencies, which can be customized according to the needs of the user. In addition, the number of bits used to store the amplitude value can also be other bits, which can be set according to the sampling range required by the user.

Optionally, the original analog signal can be sampled and quantized by an amplitude value calculation function to obtain the N amplitude values corresponding to the N sampling points contained in the first audio signal. For example, the sampling frequency of the amplitude value calculation function and the number of bits used to store the amplitude value are set, and the original analog signal is input into the function to obtain the first audio signal.

S102: Acquire target data used to represent a clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of sampling points with clipping among the N sampling points and N;

In this embodiment, the target data representing the clipping ratio of the first audio signal is obtained by analyzing the N amplitude values corresponding to the N sampling points of the first audio signal. The target data may be the clipping ratio itself, or other data that reflects the size of the clipping ratio. For example, the target data may be within a preset range of the clipping ratio.

In an optional implementation manner, the target data used to indicate the clipping ratio of the first audio signal may be obtained as follows: first, analyze the amplitude value of each of the N sampling points included in the first audio signal to determine the sampling points with clipping; then calculate the ratio between the number of sampling points with clipping and the total number of sampling points N. This ratio is the target data; in this implementation, the target data is the clipping ratio itself. Optionally, the sampling points with clipping may be determined by checking whether there are a first number or more of consecutive sampling points whose amplitude values are all greater than a second threshold. The first number may be 3, and the second threshold may be 90% of the maximum value of the target sampling range. For example, if the amplitude values of 5 consecutive sampling points are all greater than the second threshold, those 5 sampling points are regarded as sampling points with clipping.
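The run-based rule described above can be sketched as follows. The names `clipped_flags` and `clipping_ratio` are hypothetical, and comparing the absolute value of each amplitude (so that negative peaks are also counted) is an assumption of this sketch.

```python
from typing import List, Sequence


def clipped_flags(samples: Sequence[int],
                  max_value: int = 32767,
                  first_number: int = 3,
                  target_ratio: float = 0.9) -> List[bool]:
    """Mark sampling points that belong to a run of at least `first_number`
    consecutive points whose absolute amplitude exceeds the second threshold.
    """
    second_threshold = max_value * target_ratio
    flags = [False] * len(samples)
    run_length = 0
    for i, s in enumerate(samples):
        run_length = run_length + 1 if abs(s) > second_threshold else 0
        if run_length == first_number:
            # The run just reached the minimum length: mark all of its points.
            for j in range(i - first_number + 1, i + 1):
                flags[j] = True
        elif run_length > first_number:
            flags[i] = True
    return flags


def clipping_ratio(samples: Sequence[int], **kwargs) -> float:
    """Ratio between the number of clipped sampling points and N."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(clipped_flags(samples, **kwargs)) / len(samples)


example = [0, 30000, 30000, 0, 31000, 32000, 32767, 32767, 32767, 0]
print(clipping_ratio(example))  # 0.5
```

In the example, the two-point run is too short to count, while the five-point run is marked in full, giving 5 clipped points out of 10.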

In another optional implementation manner, the target data may also be obtained by counting the number of sampling points in the first audio signal whose amplitude values exceed a first threshold, and then calculating the ratio between this number and the total number of sampling points N; this ratio is the target data. Sampling points with clipping have relatively large amplitude values that exceed the first threshold, so if the clipping ratio of the first audio signal is relatively large, the calculated ratio will also be relatively large. The calculated ratio therefore reflects the size of the clipping ratio indirectly, but it is not the clipping ratio itself. The value of the first threshold can be set, then checked and updated continuously to arrive at a more reasonable first threshold.
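A minimal sketch of this indirect measure is given below. The function name `exceed_ratio` is hypothetical, the threshold value in the example is arbitrary, and the use of absolute values is an assumption.

```python
from typing import Sequence


def exceed_ratio(samples: Sequence[int], first_threshold: float) -> float:
    """Ratio between the number of sampling points whose absolute amplitude
    exceeds `first_threshold` and the total number of sampling points N.

    This reflects the clipping ratio indirectly; it is not the clipping
    ratio itself, because isolated loud points are counted too.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    count = sum(1 for s in samples if abs(s) > first_threshold)
    return count / len(samples)


print(exceed_ratio([0, 100, -30000, 32767], first_threshold=29000))  # 0.5
```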

In yet another optional implementation manner, the target data may be obtained as shown in FIG. 3, which schematically illustrates the flow of obtaining the target data representing the clipping ratio of the first audio signal, including but not limited to steps S21-S23:

S21: Determine a first sub-range from at least two sub-ranges;

Optionally, the target sampling range to which the N amplitude values corresponding to the N sampling points of the first audio signal belong is divided into at least two sub-ranges that do not overlap with each other. The division method is not limited in this application; the range may be divided evenly or unevenly. The number of sub-ranges may be 22, 24, 30, or another value.

The first sub-range is determined from the at least two non-overlapping sub-ranges; the first sub-range is the sub-range covering the largest amplitude values among the at least two sub-ranges. For example, if the at least two sub-ranges are [0,7], [8,15], and [16,23], the first sub-range is [16,23].

S22: Acquire the number of sampling points whose amplitude values belong to the first sub-range among the N sampling points as the first number;

S23: Calculate the ratio between the first number and the N, and use the ratio as target data for representing the clipping ratio of the first audio signal.

Most of the sampling points in the first audio signal whose amplitude values belong to the first sub-range are clipped, so the ratio between the first number and the total number of sampling points N, used as the target data, reflects the size of the clipping ratio of the first audio signal.
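Steps S21 to S23 can be sketched as below, using the toy sub-ranges from the example above. The function name `top_subrange_ratio` and the sample amplitudes are hypothetical; for signed audio, passing absolute amplitude values would be one possible choice and is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple


def top_subrange_ratio(amplitudes: Sequence[int],
                       subranges: List[Tuple[int, int]]) -> float:
    """Return the share of sampling points whose amplitude lies in the
    sub-range covering the largest amplitude values (the first sub-range).
    """
    if not amplitudes:
        raise ValueError("amplitudes must not be empty")
    # S21: pick the sub-range with the largest upper bound.
    first_lo, first_hi = max(subranges, key=lambda r: r[1])
    # S22: count the sampling points whose amplitude falls in that sub-range.
    count_in_first = sum(1 for a in amplitudes if first_lo <= a <= first_hi)
    # S23: the ratio to N is used as the target data.
    return count_in_first / len(amplitudes)


subranges = [(0, 7), (8, 15), (16, 23)]
amplitudes = [1, 5, 9, 16, 23, 23, 20, 3]
print(top_subrange_ratio(amplitudes, subranges))  # 0.5
```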

S103: If the target data belongs to a target range, divide the first audio signal into at least two audio segments;

In this embodiment, after the target data is acquired, it is determined whether the target data belongs to the target range. If it does, the first audio signal is divided into at least two audio segments. The signal may be divided in units of 1 second, or in other time units, such as 5 seconds.

Wherein, the target range may be 60% to 80%. It is understood that the target range may also be other ranges, which are not limited in the embodiments of the present application.

Please refer to FIG. 4, which is a flowchart of processing target data belonging to different ranges according to an embodiment of this application. As shown in the figure, it includes steps S31-S35:

S31: Obtain target data;

S32: Determine whether the target data belongs to the target range. If the target data belongs to the target range, perform step S33. If the target data does not belong to the target range, perform step S34 or S35: if the target data is greater than the third threshold, perform step S35; if the target data is less than the fourth threshold, perform step S34. The third threshold is the maximum value of the target range, and the fourth threshold is the minimum value of the target range.

S33: Divide the first audio signal into at least two audio segments, and perform clipping detection processing on the at least two audio segments;

S34: Determine the first audio signal as an available audio signal;

S35. Discard the first audio signal.

Determining whether the target data belongs to the target range has two possible outcomes. In the first, the target data belongs to the target range; for details, refer to step S104, which will not be repeated here. In the second, the target data does not belong to the target range. If the target data is greater than the third threshold, the ratio between the number of sampling points with clipping and the total number of sampling points in the first audio signal is too high, that is, there are too many sampling points with clipping. Using the first audio signal to train the voiceprint recognition model would reduce the verification rate of voiceprint recognition, so the first audio signal is discarded. If the target data is less than the fourth threshold, the ratio between the number of sampling points with clipping and the total number of sampling points in the first audio signal is relatively small; the clipping is not enough to damage the information in the first audio signal and has almost no effect on subsequent processing. In that case it is not necessary to divide the signal into audio segments or perform clipping detection processing on the segments, and the first audio signal enters the system directly for subsequent processing. For example, the first audio signal can be used directly to train the voiceprint recognition model.
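The three-way decision of steps S32 to S35 can be sketched as follows, using the 60% to 80% target range given as an example in this embodiment. The function name `choose_action` and the returned labels are hypothetical.

```python
from typing import Tuple


def choose_action(target_data: float,
                  target_range: Tuple[float, float] = (0.6, 0.8)) -> str:
    """Decide how to handle the first audio signal from its target data.

    `target_range` is (fourth threshold, third threshold), i.e. the minimum
    and maximum of the target range.
    """
    fourth_threshold, third_threshold = target_range
    if target_data > third_threshold:
        return "discard"             # S35: too many clipped sampling points
    if target_data < fourth_threshold:
        return "use_directly"        # S34: clipping is negligible
    return "segment_and_detect"      # S33: split and check each segment


for value in (0.9, 0.7, 0.1):
    print(value, choose_action(value))
```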

S104: Perform clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segment after the clipping detection processing.

In this embodiment, if the target data belongs to the target range, the first audio signal can be divided into at least two audio segments. The division may be equal, that is, the duration of each audio segment is the target duration, which may be 1 s, 5 s, etc., so that each audio segment can be checked for clipping.

Whether each audio segment has clipping may be detected by checking whether the audio segment contains a first number or more of consecutive sampling points whose absolute amplitude values are all greater than the second threshold. The first number may be 3, and the second threshold may be the product of the maximum value in the sampling value range and a target ratio. The target ratio may be 90%, giving 32768*0.9≈29491. If an audio segment contains three or more consecutive sampling points whose absolute amplitude values all exceed 90% of the maximum value in the sampling value range, it is determined that the audio segment has clipping, and the audio segment is discarded. It should be noted that the 90% ratio may also be another ratio around 90%, such as 91%, 89%, 85%, or 95%. The first number may be 3 or another value, and there may be a mutual constraint relationship among the first number, the target ratio, and the sampling frequency.

If the audio segment does not contain three consecutive sampling points whose absolute amplitude values all exceed 90% of the maximum value in the sampling value range, the audio segment does not have clipping; the audio segment is determined to be an available audio segment and is retained. Detecting clipping segment by segment in this way can avoid discontinuity in the remaining audio segments.

Optionally, the second audio signal can be obtained according to the results of the clipping detection processing on the at least two audio segments. For example, the audio segments with clipping among the at least two audio segments are discarded, the audio segments without clipping are retained, and all audio segments without clipping are then combined into the second audio signal in chronological order.

For step S104 mentioned in the above embodiment, refer to FIG. 5, which is a schematic diagram of performing clipping detection processing on an audio segment, including but not limited to steps S41-S44:

S41: For each audio segment of the at least two audio segments, detect whether the audio segment has clipping;

S42: If the audio segment has clipping, discard the audio segment;

S43: Obtain the audio segments remaining among the at least two audio segments after the discarding;

S44: Obtain a second audio signal according to the remaining audio segment.
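Steps S41 to S44 can be sketched as below. The function names are hypothetical, keeping a shorter final segment when the signal length is not a multiple of the segment size is an assumption, and the default threshold follows the 32768*0.9 example given for step S104.

```python
from typing import List, Sequence


def split_into_segments(samples: Sequence[int],
                        sample_rate: int = 8000,
                        segment_s: float = 1.0) -> List[Sequence[int]]:
    """Split the signal into equal-duration segments (the last may be shorter)."""
    size = int(sample_rate * segment_s)
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def segment_has_clipping(segment: Sequence[int],
                         max_value: int = 32768,
                         first_number: int = 3,
                         target_ratio: float = 0.9) -> bool:
    """S41: True if `first_number` or more consecutive points exceed the threshold."""
    second_threshold = max_value * target_ratio
    run_length = 0
    for s in segment:
        run_length = run_length + 1 if abs(s) > second_threshold else 0
        if run_length >= first_number:
            return True
    return False


def build_second_signal(samples: Sequence[int],
                        sample_rate: int = 8000,
                        segment_s: float = 1.0) -> List[int]:
    """S42-S44: drop clipped segments and join the rest in chronological order."""
    kept = [seg for seg in split_into_segments(samples, sample_rate, segment_s)
            if not segment_has_clipping(seg)]
    return [s for seg in kept for s in seg]


# Toy input: 4 samples per "second" so the three segments are easy to see.
toy = [0, 1, 2, 3, 32767, 32767, 32767, 0, 5, 6, 7, 8]
print(build_second_signal(toy, sample_rate=4, segment_s=1.0))
# [0, 1, 2, 3, 5, 6, 7, 8]
```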

Optionally, after the second audio signal is obtained, to determine whether the second audio signal obtained from the remaining audio segments can continue to be used in subsequent systems, it can first be detected whether the audio length of the second audio signal meets a certain condition. The determination method is shown in FIG. 6, a flowchart of the method for determining whether to discard the second audio signal, including but not limited to steps S51-S54:

S51: Obtain a second audio signal;

S52: Detect whether the audio length of the second audio signal is greater than or equal to the first threshold; if the audio length of the second audio signal is less than the first threshold, perform step S53; if the audio length of the second audio signal is greater than or equal to the first threshold, perform step S54;

S53: Discard the second audio signal;

S54: Determine that the second audio signal is an available audio signal;

The first threshold mentioned above refers to the length of audio signal that can be input to the subsequent system for processing. For example, in a text-independent voiceprint registration scenario, the registered voice signal needs to be 20 s long, so it can be determined whether the audio length of the second audio signal is greater than or equal to 20 s. If so, the second audio signal is retained and used to train the voiceprint recognition model.
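The length check of steps S51 to S54 amounts to comparing a duration with the first threshold. In the sketch below, the function name `is_long_enough` is hypothetical; the 8 kHz sampling frequency and the 20 s minimum come from the examples in this description.

```python
from typing import Sequence


def is_long_enough(second_signal: Sequence[int],
                   sample_rate: int = 8000,
                   min_duration_s: float = 20.0) -> bool:
    """S52: True if the second audio signal lasts at least `min_duration_s`.

    True corresponds to S54 (available audio signal); False corresponds to
    S53 (discard the second audio signal).
    """
    return len(second_signal) / sample_rate >= min_duration_s


print(is_long_enough([0] * 160000))  # True: exactly 20 s at 8 kHz
print(is_long_enough([0] * 159999))  # False: one sample short of 20 s
```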

In another embodiment, before the first audio signal with clipping is acquired in step S101, it may be determined whether clipping exists in the first audio signal. Optionally, the detection method for determining whether clipping exists in the first audio signal includes but is not limited to the following two optional implementation manners. The first optional implementation manner is shown in FIG. 7 and includes but is not limited to steps S201-S202; the second optional implementation manner is shown in FIG. 8 and includes but is not limited to steps S301-S304. The two optional implementation manners are described below:

The first optional implementation is:

S201: Acquire amplitude values of N sampling points included in the first audio signal;

In this embodiment, the first audio signal is sampled; for details, refer to step S101. The original analog signal can be sampled and quantized by an amplitude value calculation function to obtain the N amplitude values corresponding to the N sampling points contained in the first audio signal.

S202: If the amplitude values of the first audio signal satisfy a first condition, it is determined that the first audio signal has clipping.

The first condition is that the amplitude values of a first number or more of consecutive sampling points are all greater than the second threshold. If the amplitude values of the sampling points of the first audio signal satisfy this condition, it can be determined that clipping exists in the first audio signal. For details, refer to step S104, which will not be repeated here.

The second optional implementation is:

S301: Divide the target sampling range into at least two sub-ranges;

S302: Count, among the N sampling points, the number of sampling points whose amplitude values belong to each of the at least two sub-ranges;

S303: Construct a histogram;

Count the number of sampling points whose amplitude values belong to each sub-range among the N sampling points, and construct a histogram. The horizontal axis of the histogram may be the sub-range, and the vertical axis may be the number of sampling points whose amplitude values belong to each sub-range among the N sampling points in the first audio signal.

S304: If the change trend of the histogram satisfies a second condition, determine that the first audio signal has clipping;

As shown in FIG. 9 and FIG. 10, the target sampling range is equally divided into 22 sub-ranges in order of magnitude. As shown in FIG. 9, if there is no clipping in the first audio signal, the number of occurrences of amplitude values gradually decreases as the values of the sub-range increase. As shown in FIG. 10, if there is clipping in the first audio signal, the number of occurrences of amplitude values is highest in the sub-range with the highest values: the last column of the histogram is higher than all the previous columns, that is, the frequency value of the last sub-range of the histogram is the highest. The frequency value represented by the last column is called the abnormally elevated part, and the second condition is that the histogram contains an abnormally elevated part.

If there is no clipping in the first audio signal, the waveform of the audio signal is relatively smooth, and most of the amplitude values of the N sampling points are relatively small. If there is clipping in the first audio signal, the amplitude of the waveform is larger and the amplitude values of the N sampling points are relatively large, so that the amplitude values of more sampling points fall into the sub-ranges with larger amplitude values in the histogram.
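The histogram check of steps S301 to S304 can be sketched as follows. The function names are hypothetical, and binning the absolute amplitude values into 22 equal sub-ranges is an assumption of this sketch; the second condition is implemented as "the last column is higher than all the previous columns", as described above.

```python
from typing import List, Sequence


def amplitude_histogram(samples: Sequence[int],
                        num_bins: int = 22,
                        max_value: int = 32768) -> List[int]:
    """S301-S303: count sampling points per equal-width sub-range of |amplitude|."""
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    counts = [0] * num_bins
    bin_width = max_value / num_bins
    for s in samples:
        # Clamp so that the largest magnitude falls into the last sub-range.
        index = min(int(abs(s) / bin_width), num_bins - 1)
        counts[index] += 1
    return counts


def histogram_suggests_clipping(counts: Sequence[int]) -> bool:
    """S304: the second condition, an abnormally elevated last column."""
    return counts[-1] > max(counts[:-1])


clean = [100, -200, 300, 5000, -100, 50]
clipped = [32767] * 10 + [100, -200, 300]
print(histogram_suggests_clipping(amplitude_histogram(clean)))    # False
print(histogram_suggests_clipping(amplitude_histogram(clipped)))  # True
```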

In this embodiment, before the first audio signal with clipping is acquired in step S101, it is first determined whether clipping exists in each of a plurality of audio signals, and at least one audio signal with clipping is obtained from the plurality of audio signals. The first audio signal of this embodiment may be any one of the at least one audio signal. The first audio signal with clipping is thus obtained and then further processed; for details, refer to the previous embodiment. In this way, as many effective audio signals as possible are retained, so that the usable rate of audio signals is greatly improved.

Refer to FIG. 11, which provides a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present application. As shown in FIG. 11, the audio signal processing apparatus of the embodiment of the present application may include:

The first acquiring unit 11 is configured to acquire a first audio signal with clipping, where the first audio signal includes N sampling points, where N is a positive integer;

Optionally, the first audio signal may be obtained by sampling and quantizing the original analog signal. The original analog signal is first sampled to obtain N sampling points; the sampling frequency may be 8 kHz, that is, 8000 sampling points per second. The original amplitude value of each sampling point is then quantized. As shown in FIG. 2, if the original amplitude value of a sampling point exceeds the maximum value of the target sampling range, it is represented by the maximum value of the target sampling range; if the original amplitude value of a sampling point is below the minimum value of the target sampling range, it is represented by the minimum value of the target sampling range. After quantization, the original amplitude values of the sampling points are limited to N amplitude values within the target sampling range, with one amplitude value per sampling point.

The second acquiring unit 12 is configured to acquire target data used to represent a clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of sampling points with clipping among the N sampling points and N;

In an optional implementation manner, the target data used to indicate the clipping ratio of the first audio signal may be obtained as follows: first, analyze the amplitude value of each of the N sampling points included in the first audio signal to determine the sampling points with clipping; then calculate the ratio between the number of sampling points with clipping and the total number of sampling points N. This ratio is the target data; in this implementation, the target data is the clipping ratio itself. Optionally, the sampling points with clipping may be determined by checking whether there are a first number or more of consecutive sampling points whose amplitude values are all greater than a second threshold. The first number may be 3, and the second threshold may be 90% of the maximum value of the target sampling range. For example, if the amplitude values of 5 consecutive sampling points are all greater than the second threshold, those 5 sampling points are regarded as sampling points with clipping.

In yet another optional implementation manner, the second acquiring unit is specifically configured to perform the process, schematically illustrated in FIG. 3, of acquiring target data used to represent the clipping ratio of the first audio signal, including but not limited to steps S21-S23:

S21: Determine a first sub-range from at least two sub-ranges;

The first dividing unit 13 is configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;

S31: Obtain target data;

S34: Determine the first audio signal as an available audio signal;

S35. Discard the first audio signal.

The third acquiring unit 14 is configured to perform clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segment after the clipping detection processing;

Optionally, the third acquiring unit is specifically configured to obtain the second audio signal according to the clipping detection results of the at least two audio segments. For example, among the at least two audio segments, the audio segments with clipping are discarded, the audio segments without clipping are retained, and all audio segments without clipping are combined in chronological order to form the second audio signal.

S42: If there is a clip in the audio segment, discard the audio segment;

S44: Obtain a second audio signal according to the remaining audio segment.

S51: Obtain a second audio signal;

S53: Discard the second audio signal;

S54: Determine that the second audio signal is an available audio signal;

In an embodiment, the third acquiring unit is specifically configured to:

For each audio segment of the at least two audio segments, detecting whether the audio segment has clipping;

If there is clipping in the audio segment, discard the audio segment;

Acquiring the audio segments that remain among the at least two audio segments after the discarding;

According to the remaining audio segment, a second audio signal is obtained.
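The per-segment procedure listed above can be illustrated with a short sketch. The names `has_clipping`, `split_segments`, and `second_audio_signal` are hypothetical, and dividing the signal into fixed-length segments is an assumption made here for illustration only.

```python
from typing import List, Sequence


def has_clipping(segment: Sequence[float], threshold: float,
                 first_number: int = 3) -> bool:
    """Return True if the segment contains at least `first_number`
    consecutive samples whose amplitude exceeds `threshold`."""
    run = 0
    for s in segment:
        run = run + 1 if abs(s) > threshold else 0
        if run >= first_number:
            return True
    return False


def split_segments(samples: Sequence[float], segment_len: int) -> List[List[float]]:
    """Assumed segmentation rule: consecutive fixed-length chunks."""
    return [list(samples[i:i + segment_len])
            for i in range(0, len(samples), segment_len)]


def second_audio_signal(samples: Sequence[float], segment_len: int,
                        threshold: float, first_number: int = 3) -> List[float]:
    """Discard segments with clipping and join the rest in chronological order."""
    kept = [seg for seg in split_segments(samples, segment_len)
            if not has_clipping(seg, threshold, first_number)]
    return [s for seg in kept for s in seg]
```

One consequence of fixed-length segmentation in this sketch is that a clipped run straddling a segment boundary may be split into two shorter runs and go undetected; a real implementation would need to decide how to handle that case.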

Optionally, as shown in FIG. 12, the device further includes:

A detecting unit, configured to detect whether the audio length of the second audio signal is greater than or equal to a first threshold;

A first determining unit, configured to determine that the second audio signal is an available audio signal if the audio length of the second audio signal is greater than or equal to the first threshold;

If the audio length of the second audio signal is less than the first threshold, the second audio signal is discarded.
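As a minimal sketch of this length check, the audio length can be taken as the number of sampling points divided by the sampling rate. The function name `is_available`, the unit of seconds, and the example values are assumptions, since the text does not specify how the audio length is measured.

```python
from typing import Sequence


def is_available(second_signal: Sequence[float], sample_rate_hz: int,
                 first_threshold_s: float) -> bool:
    """Return True if the second audio signal is long enough to keep,
    False if it should be discarded."""
    duration_s = len(second_signal) / sample_rate_hz
    return duration_s >= first_threshold_s
```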

In an embodiment, each audio segment of the at least two audio segments includes at least one sampling point, and the third acquiring unit detects whether the audio segment has clipping by acquiring the amplitude value of each sampling point of the at least one sampling point included in the audio segment;

If the amplitude values of the at least one sampling point satisfy the first condition, it is determined that the audio segment has clipping, where the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.

Optionally, as shown in FIG. 12, the device further includes:

A fourth acquiring unit, configured to acquire the amplitude value of each of the N sampling points included in the first audio signal;

The second determining unit is configured to determine that the first audio signal has clipping if the amplitude values of the N sampling points meet a first condition, where the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.

In an embodiment, the amplitude value of each sampling point in the N sampling points belongs to the target sampling range;

The second threshold is the product of the maximum value of the target sampling range and the target ratio.

Optionally, as shown in FIG. 12, the device further includes:

The second dividing unit is configured to divide the target sampling range into at least two sub-ranges, where the at least two sub-ranges do not overlap each other, and the target sampling range is the range in which the amplitude values of the N sampling points included in the first audio signal are located;

A statistical unit, configured to count the number of sampling points belonging to each of the at least two sub-ranges among the amplitude values of the N sampling points;

A construction unit, configured to construct a histogram, the horizontal axis of the histogram includes the at least two sub-ranges, and the vertical axis of the histogram includes the number of sampling points belonging to the sub-ranges;

The third determining unit is configured to determine that the first audio signal has a clipping if the change trend of the histogram meets the second condition.
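The histogram-based detection performed by these units might be sketched as below. The second condition is not defined in this passage, so the check used here (the count in the highest sub-range exceeds the count in the sub-range just below it) is purely an illustrative assumption, as are the names `amplitude_histogram` and `tail_rises`, the use of equal-width sub-ranges, and the use of absolute amplitudes over the range 0 to `range_max`.

```python
from typing import List, Sequence


def amplitude_histogram(samples: Sequence[float], range_max: float,
                        num_bins: int) -> List[int]:
    """Count sampling points per equal-width, non-overlapping sub-range
    of [0, range_max], using absolute amplitude values (assumption)."""
    counts = [0] * num_bins
    width = range_max / num_bins
    for s in samples:
        # Values equal to range_max fall into the last sub-range.
        idx = min(int(abs(s) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def tail_rises(counts: Sequence[int]) -> bool:
    """Hypothetical stand-in for the second condition: the histogram turns
    upward in the highest sub-range instead of continuing to decay."""
    return len(counts) >= 2 and counts[-1] > counts[-2]
```

The intuition behind this stand-in is that unclipped speech tends to have fewer samples at higher amplitudes, whereas clipping piles samples up near the maximum of the range.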

In an embodiment, the second acquiring unit is specifically configured to:

Determine a first sub-range from the at least two sub-ranges, where the first sub-range is the sub-range with the largest amplitude values among the at least two sub-ranges;

Acquiring, among the N sampling points, the number of sampling points whose amplitude values belong to the first sub-range as the first number;

The ratio between the first number and the N is calculated, and the ratio is used as target data for representing the clipping ratio of the first audio signal.
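A minimal sketch of this sub-range-based estimate follows, assuming equal-width sub-ranges over absolute amplitudes from 0 to `range_max`. The function name `top_subrange_ratio` and the parameter names are hypothetical.

```python
from typing import Sequence


def top_subrange_ratio(samples: Sequence[float], range_max: float,
                       num_bins: int) -> float:
    """Target data: share of the N sampling points whose amplitude falls
    in the highest of `num_bins` equal-width sub-ranges (assumed layout)."""
    if len(samples) == 0:
        raise ValueError("the signal must contain at least one sampling point")
    lower_edge = range_max * (num_bins - 1) / num_bins  # start of the first sub-range
    top_count = sum(1 for s in samples if abs(s) >= lower_edge)
    return top_count / len(samples)
```

Compared with scanning for runs of consecutive samples, this estimate only needs per-sub-range counts, so it can reuse the counts already gathered for the histogram.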

Optionally, the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the device further includes a fourth determining unit;

The fourth determining unit is specifically configured to discard the first audio signal if the target data is greater than the third threshold;

If the target data is less than the fourth threshold, the first audio signal is determined as an available audio signal.
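The three-way decision implied by the third and fourth thresholds can be sketched as follows. The function name `route_first_signal`, the returned labels, and the example threshold values are hypothetical; treating the boundaries of the target range as inclusive is also an assumption.

```python
def route_first_signal(target_data: float, fourth_threshold: float,
                       third_threshold: float) -> str:
    """Decide how to handle the first audio signal from its clipping ratio.

    fourth_threshold is the minimum of the target range and
    third_threshold is its maximum.
    """
    if target_data > third_threshold:
        return "discard"    # too much clipping to salvage
    if target_data < fourth_threshold:
        return "available"  # clipping is negligible, use the signal as is
    return "segment"        # within the target range: split and filter segments


# Example with illustrative thresholds of 1% and 30%.
decision = route_first_signal(0.10, fourth_threshold=0.01, third_threshold=0.30)
```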

In the embodiment of the present application, after the first audio signal with clipping is obtained, the way the first audio signal is processed is determined according to the target data used to represent the clipping ratio of the first audio signal, and the second audio signal is obtained from the first audio signal after clipping detection processing. The embodiments of this application do not simply discard an audio signal with clipping but process it further, so as to retain as much of the valid audio signal as possible and thereby greatly improve the usability of the audio signal.

Please refer to FIG. 13, which is a schematic structural diagram of another audio signal processing device provided by an embodiment of this application. As shown in FIG. 13, the audio signal processing device 1000 may include: at least one processor 1001 (such as a CPU), at least one communication interface 1003, a memory 1004, and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The communication interface 1003 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1004 may be a high-speed RAM or a non-volatile memory, such as at least one disk memory. Optionally, the memory 1004 may also be at least one storage device located remotely from the foregoing processor 1001. As shown in FIG. 13, the memory 1004, which is a computer storage medium, may include an operating system, a network communication module, and program instructions.

In the audio signal processing device 1000 shown in FIG. 13, the processor 1001 may be used to load program instructions stored in the memory 1004, and specifically perform the following operations:

Optionally, before acquiring the first audio signal with clipped amplitude, the method further includes:

Acquiring the amplitude value of each of the N sampling points included in the first audio signal;

If the amplitude values of the N sampling points meet the first condition, it is determined that the first audio signal has clipping, where the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.

Dividing the target sampling range into at least two sub-ranges, the at least two sub-ranges do not overlap each other, and the target sampling range is a range in which amplitude values of N sampling points included in the first audio signal are located;

Counting the number of sampling points belonging to each of the at least two sub-ranges among the amplitude values of the N sampling points;

Constructing a histogram, the horizontal axis of the histogram includes the at least two sub-ranges, and the vertical axis of the histogram includes the number of sampling points belonging to the sub-ranges;

If the change trend of the histogram satisfies the second condition, it is determined that the first audio signal has a clipping.

Optionally, the acquiring target data used to represent the clipping ratio of the first audio signal includes:

Determining a first sub-range from the at least two sub-ranges, where the first sub-range is the sub-range with the largest amplitude values among the at least two sub-ranges;

Optionally, the performing clip detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segment after the clip detection processing includes:

If there is clipping in the audio segment, discard the audio segment;

According to the remaining audio segment, a second audio signal is obtained.

Optionally, each of the at least two audio segments includes at least one sampling point, and the determining whether the audio segment has clipping includes:

Acquiring the amplitude value of each sampling point in at least one sampling point included in the audio segment;

Optionally, the amplitude value of each sampling point in the N sampling points belongs to a target sampling range, and the second threshold is the product of the maximum value of the target sampling range and the target ratio.

Optionally, the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the method further includes:

If the target data is greater than the third threshold, discard the first audio signal;

Optionally, the processor 1001 may also be used to load program instructions stored in the memory 1004 to perform the following operations:

Detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;

If the audio length of the second audio signal is greater than or equal to the first threshold, determining that the second audio signal is an available audio signal;

It should be noted that, for the specific execution process, reference may be made to the specific description of the method embodiment shown in FIG. 1, which will not be repeated here.

For specific execution steps, please refer to the description of the foregoing embodiment, which will not be repeated here.

The embodiment of the present application also provides a computer storage medium. The computer storage medium may store a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the method steps of the foregoing method embodiment. For the specific execution process, reference may be made to the specific description of the embodiment shown in FIG. 1, which is not repeated here.

A person of ordinary skill in the art can understand that all or part of the processes in the above-mentioned method embodiments can be implemented by a computer program instructing relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above-mentioned method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Claims

An audio signal processing method, characterized in that it comprises:

Acquiring a first audio signal with clipping, where the first audio signal includes N sampling points, where N is a positive integer;

Acquiring target data used to represent a clipping ratio of the first audio signal, where the clipping ratio is used to represent a ratio between the number of sample points with clipping in the N sampling points and the N;

If the target data belongs to the target range, dividing the first audio signal into at least two audio segments;

Clipping detection processing is performed on the at least two audio segments, and a second audio signal is obtained according to the audio segment after the clipping detection processing.
The method according to claim 1, wherein the performing clipping detection processing on the at least two audio segments, and obtaining the second audio signal according to the audio segment after the clipping detection processing, comprises:

For each audio segment of the at least two audio segments, detecting whether the audio segment has clipping;

If there is clipping in the audio segment, discard the audio segment;

Acquiring the audio segments that remain among the at least two audio segments after the discarding;

According to the remaining audio segment, a second audio signal is obtained.
The method according to claim 2, wherein the method further comprises:

Detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;

If the audio length of the second audio signal is greater than or equal to the first threshold, determining that the second audio signal is an available audio signal;

If the audio length of the second audio signal is less than the first threshold, the second audio signal is discarded.
The method according to claim 2, wherein each of the at least two audio segments includes at least one sampling point, and the detecting whether the audio segment has clipping includes:

Acquiring the amplitude value of each sampling point in at least one sampling point included in the audio segment;

If the amplitude values of the at least one sampling point satisfy the first condition, it is determined that the audio segment has clipping, where the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.
The method according to claim 1, characterized in that before said acquiring the first audio signal with clipped amplitude, the method further comprises:

Acquiring the amplitude value of each of the N sampling points included in the first audio signal;

If the amplitude values of the N sampling points meet a first condition, it is determined that the first audio signal has clipping, where the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.
The method according to claim 5, wherein the amplitude value of each sampling point in the N sampling points belongs to the target sampling range;

The second threshold is the product of the maximum value of the target sampling range and the target ratio.
The method according to claim 1, characterized in that before said acquiring the first audio signal with clipped amplitude, the method further comprises:

Dividing the target sampling range into at least two sub-ranges, the at least two sub-ranges do not overlap each other, and the target sampling range is a range in which amplitude values of N sampling points included in the first audio signal are located;

Counting the number of sampling points belonging to each of the at least two sub-ranges among the amplitude values of the N sampling points;

Constructing a histogram, the horizontal axis of the histogram includes the at least two sub-ranges, and the vertical axis of the histogram includes the number of sampling points belonging to the sub-ranges;

If the changing trend of the histogram satisfies the second condition, it is determined that the first audio signal has a clipping.
8. The method according to claim 7, wherein the acquiring target data used to represent the clipping ratio of the first audio signal comprises:

Determining a first sub-range from the at least two sub-ranges, where the first sub-range is the sub-range with the largest amplitude values among the at least two sub-ranges;

Acquiring, among the N sampling points, the number of sampling points whose amplitude values belong to the first sub-range as the first number;

The ratio between the first number and the N is calculated, and the ratio is used as target data for representing the clipping ratio of the first audio signal.
The method according to claim 1, wherein the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the method further comprises:

If the target data is greater than the third threshold, discard the first audio signal;

If the target data is less than the fourth threshold, the first audio signal is determined as an available audio signal.
An audio signal processing device, characterized in that it comprises:

The first acquiring unit is configured to acquire a first audio signal with clipping, where the first audio signal includes N sampling points, where N is a positive integer;

The second acquiring unit is configured to acquire target data used to represent the clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of sampling points with clipping among the N sampling points and N;

A first dividing unit, configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;

The third acquisition unit is configured to perform clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segment after the clipping detection processing.
The device according to claim 10, wherein the third acquiring unit is specifically configured to:

For each audio segment of the at least two audio segments, detecting whether the audio segment has clipping;

If there is clipping in the audio segment, discard the audio segment;

Acquiring the audio segments that remain among the at least two audio segments after the discarding;

According to the remaining audio segment, a second audio signal is obtained.
The device of claim 11, wherein the device further comprises:

A detecting unit, configured to detect whether the audio length of the second audio signal is greater than or equal to a first threshold;

A first determining unit, configured to determine that the second audio signal is an available audio signal if the audio length of the second audio signal is greater than or equal to the first threshold;

If the audio length of the second audio signal is less than the first threshold, the second audio signal is discarded.
The device according to claim 11, wherein each audio segment of the at least two audio segments includes at least one sampling point, and the third acquiring unit detects whether the audio segment has clipping by acquiring the amplitude value of each sampling point of the at least one sampling point included in the audio segment;

If the amplitude values of the at least one sampling point satisfy the first condition, it is determined that the audio segment has clipping, where the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.
The device of claim 10, wherein the device further comprises:

A fourth acquiring unit, configured to acquire the amplitude value of each of the N sampling points included in the first audio signal;

The second determining unit is configured to determine that the first audio signal has clipping if the amplitude values of the N sampling points meet a first condition, where the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.
The device according to claim 14, wherein the amplitude value of each sampling point in the N sampling points belongs to the target sampling range;

The second threshold is the product of the maximum value of the target sampling range and the target ratio.
The device of claim 10, wherein the device further comprises:

The second dividing unit is configured to divide the target sampling range into at least two sub-ranges, where the at least two sub-ranges do not overlap each other, and the target sampling range is the range in which the amplitude values of the N sampling points included in the first audio signal are located;

A statistical unit, configured to count the number of sampling points belonging to each of the at least two sub-ranges among the amplitude values of the N sampling points;

A construction unit, configured to construct a histogram, the horizontal axis of the histogram includes the at least two sub-ranges, and the vertical axis of the histogram includes the number of sampling points belonging to the sub-ranges;

The third determining unit is configured to determine that the first audio signal has a clipping if the change trend of the histogram meets the second condition.
The device according to claim 16, wherein the second acquiring unit is specifically configured to:

Determine a first sub-range from the at least two sub-ranges, where the first sub-range is the sub-range with the largest amplitude values among the at least two sub-ranges;

Acquiring, among the N sampling points, the number of sampling points whose amplitude values belong to the first sub-range as the first number;

The ratio between the first number and the N is calculated, and the ratio is used as target data for representing the clipping ratio of the first audio signal.
18. The device of claim 10, wherein the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the device further comprises a fourth determining unit;

The fourth determining unit is specifically configured to discard the first audio signal if the target data is greater than the third threshold;

If the target data is less than the fourth threshold, the first audio signal is determined as an available audio signal.
An audio signal processing device, which is characterized by comprising a processor, a memory, and a communication interface, the processor, the memory, and the communication interface are connected to each other, wherein the communication interface is used for receiving and sending data, and the memory is used for A program code is stored, and the processor is used to call the program code to execute the method according to any one of claims 1 to 9.
A non-volatile computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method according to any one of claims 1 to 9.