WO2021082083A1 - Audio signal processing method and device - Google Patents

Audio signal processing method and device

Info

Publication number
WO2021082083A1
WO2021082083A1 PCT/CN2019/118444 CN2019118444W WO2021082083A1 WO 2021082083 A1 WO2021082083 A1 WO 2021082083A1 CN 2019118444 W CN2019118444 W CN 2019118444W WO 2021082083 A1 WO2021082083 A1 WO 2021082083A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
audio
clipping
range
sampling
Prior art date
Application number
PCT/CN2019/118444
Other languages
French (fr)
Chinese (zh)
Inventor
张丝潆
彭俊清
王健宗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021082083A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the field of communication technology, and in particular to an audio signal processing method and device.
  • Pre-processing of the audio signal is critical because it strongly affects the accuracy of subsequent recognition.
  • The pre-processing includes clipping detection of the audio signal.
  • Clipping occurs mainly when the amplitude of the audio signal is so high that it exceeds the maximum value of the sampling value range.
  • Clipping can damage the information in the voice signal.
  • One approach is to discard a voice signal in which clipping is detected; this method causes the loss of many effective voice signals.
  • The embodiments of the present application provide an audio signal processing method and device that can retain more effective audio signals, so that the usable rate of the audio signals is greatly improved.
  • An audio signal processing method, including:
  • acquiring a first audio signal with clipping, where the first audio signal includes N sampling points and N is a positive integer;
  • acquiring target data used to represent a clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of sampling points with clipping among the N sampling points and N;
  • if the target data belongs to the target range, dividing the first audio signal into at least two audio segments;
  • performing clipping detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segments after the clipping detection processing.
  • An audio signal processing device, including:
  • a first acquiring unit configured to acquire a first audio signal with clipping, where the first audio signal includes N sampling points and N is a positive integer;
  • a second acquiring unit configured to acquire target data used to represent the clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of sampling points with clipping among the N sampling points and N;
  • a first dividing unit configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;
  • a third acquiring unit configured to perform clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segments after the clipping detection processing.
  • an embodiment of the present application provides an audio signal processing device.
  • the audio signal processing device includes a processor, a memory, and a communication interface.
  • the processor, the memory, and the communication interface are connected to each other.
  • the memory is used to store program code
  • the processor is used to call the program code to execute the method described in the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the above-mentioned method.
  • In the embodiments of this application, the processing method for the first audio signal is determined according to the target data used to indicate the clipping ratio of the first audio signal. If the target data belongs to the target range, the first audio signal is divided into at least two audio segments, clipping detection processing is performed on the at least two audio segments, and the second audio signal is obtained according to the audio segments after the clipping detection processing.
  • The embodiments of this application do not simply discard the audio signal with clipping, but further process it so as to retain as many valid audio signals as possible, so that the usable rate of the audio signal is greatly improved.
  • FIG. 1 is a flowchart of an audio signal processing method provided by an embodiment of this application
  • Fig. 2 is a waveform diagram of an audio signal with clipping provided by an embodiment of the application
  • Fig. 3 is a flowchart of another audio signal processing method provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a method for obtaining target data representing the clipping ratio of a first audio signal according to an embodiment of the application
  • FIG. 5 is a flowchart of determining whether the target data belongs to the target range according to an embodiment of the application
  • FIG. 6 is a flowchart of a method for performing clipping detection processing on an audio segment according to an embodiment of the application
  • FIG. 7 is a flowchart of a method for determining whether to discard a second audio signal according to an embodiment of the application
  • FIG. 8 is a flowchart of yet another audio signal processing method provided by an embodiment of this application.
  • FIG. 9 is a histogram without clipping provided by an embodiment of the application.
  • FIG. 10 is a histogram with clipping provided by an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of an audio signal processing device provided by an embodiment of the application.
  • FIG. 12 is a schematic structural diagram of another audio signal processing device provided by an embodiment of the application.
  • FIG. 13 is a schematic structural diagram of another audio signal processing device provided by an embodiment of the application.
  • FIG. 1 provides a schematic flowchart of an audio signal processing method according to an embodiment of this application.
  • the audio signal processing method of the embodiment of the present application may include the following steps S101 to S104.
  • S101 Acquire a first audio signal with clipping, where the first audio signal includes N sampling points, where N is a positive integer;
  • the first audio signal may include a voice data signal in the instant messaging process, or may be a music data signal recorded on site, etc., which is not limited in the embodiment of the present application.
  • The first audio signal in this embodiment may be acquired by performing clipping detection processing on multiple audio signals, determining whether each audio signal has clipping, and thereby obtaining at least one audio signal with clipping.
  • The first audio signal in this embodiment may be any one of the at least one audio signal.
  • The first audio signal includes N sampling points, and the amplitude value of each sampling point belongs to a preset target sampling range. The target sampling range is determined by the number of bits used to store the amplitude value. For example, if 16 bits are used to store the amplitude value, the target sampling range is $-2^{15}$ to $2^{15}-1$, that is, -32768 to 32767.
  • The first audio signal may be obtained by sampling and quantizing the original analog signal: the original analog signal is sampled to obtain N sampling points.
  • The sampling frequency can be 8 kHz, that is, there are 8000 sampling points within 1 s.
  • If the original amplitude value of a sampling point exceeds the maximum value of the target sampling range, it is represented by the maximum value of the target sampling range; if the original amplitude value of a sampling point is below the minimum value of the target sampling range, it is represented by the minimum value of the target sampling range.
  • In this way, the original amplitude values of the sampling points are limited to N amplitude values within the target sampling range, and one sampling point corresponds to one amplitude value.
  • A first audio signal whose amplitude values have been limited in this way has clipping.
  • The sampling frequency can also be another frequency, customized according to the needs of the user.
  • The number of bits used to store the amplitude value can also be another number, set according to the sampling range required by the user.
  • The original analog signal can be sampled and quantized by an amplitude-value calculation function to obtain the N amplitude values corresponding to the N sampling points contained in the first audio signal. For example, the sampling frequency of the function and the number of bits used to store the amplitude value are set, the original analog signal is input into the function, and the first audio signal is obtained.
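The quantization and limiting described above can be sketched as follows. This is a minimal illustration, not the application's amplitude-value calculation function: the name `quantize`, the `bits` parameter, and the assumption that the analog signal has already been sampled into floats where ±1.0 is full scale are all hypothetical.

```python
def quantize(samples, bits=16):
    """Map floats (assumed full scale = +/-1.0) to signed integers,
    limiting out-of-range values to the target sampling range."""
    lo = -(2 ** (bits - 1))       # -32768 when bits == 16
    hi = 2 ** (bits - 1) - 1      # 32767 when bits == 16
    out = []
    for x in samples:
        v = int(round(x * hi))
        # Values above the maximum or below the minimum are represented
        # by the maximum or minimum of the range, which is where clipping arises.
        out.append(max(lo, min(hi, v)))
    return out


quantized = quantize([0.0, 1.2, -1.5])
print(quantized)  # [0, 32767, -32768]
```

The second and third inputs lie outside full scale, so they are stored as the extreme values of the 16-bit range.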
  • The target data representing the clipping ratio of the first audio signal is obtained, where the target data may be the clipping ratio itself or other data that can reflect the size of the clipping ratio.
  • The target data may be within a preset range of the clipping ratio.
  • The target data may be obtained as follows: first, analyze the amplitude value of each of the N sampling points included in the first audio signal to determine the sampling points with clipping; then calculate the ratio between the number of sampling points with clipping and the total number of sampling points N. This ratio is the target data.
  • In this case, the target data is the clipping ratio itself.
  • The sampling points with clipping may be determined by checking whether there are a first number or more of consecutive sampling points whose amplitude values are greater than a second threshold. The first number can be 3, and the second threshold can be 90% of the maximum value of the target sampling range. For example, if the amplitude values of 5 consecutive sampling points are all greater than the second threshold, those 5 sampling points are regarded as sampling points with clipping.
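A minimal sketch of this run-based analysis is given below, assuming the signal is a list of integer amplitude values. The function names and the toy signal are hypothetical; the defaults (first number 3, second threshold at 90% of the maximum) follow the example values in the text.

```python
def clipped_points(amplitudes, second_threshold, first_number=3):
    """Return indices of sampling points that lie in a run of at least
    first_number consecutive amplitude values above second_threshold."""
    clipped, run = [], []
    for i, a in enumerate(amplitudes):
        if a > second_threshold:
            run.append(i)
        else:
            if len(run) >= first_number:
                clipped.extend(run)
            run = []
    if len(run) >= first_number:  # a run may end at the last sample
        clipped.extend(run)
    return clipped


def clipping_ratio(amplitudes, max_value=32767, target_ratio=0.9, first_number=3):
    """Number of sampling points with clipping divided by N."""
    threshold = max_value * target_ratio
    return len(clipped_points(amplitudes, threshold, first_number)) / len(amplitudes)


signal = [100, 32767, 32767, 32767, 32767, 32767, 200, 32000, 150, 50]
print(clipping_ratio(signal))  # 0.5
```

The isolated value 32000 exceeds the threshold but is not part of a run of three, so only the five consecutive samples are counted.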
  • Alternatively, the target data may be obtained by counting the number of sampling points in the first audio signal whose amplitude values exceed a first threshold, and then calculating the ratio between this number and the total number of sampling points N; this ratio is the target data. Because sampling points with clipping have relatively large amplitude values that exceed the first threshold, a larger clipping ratio of the first audio signal yields a larger calculated ratio. The calculated ratio therefore indirectly reflects the size of the clipping ratio, but it is not the clipping ratio itself. The value of the first threshold can be preset and then checked and updated continuously to obtain a more reasonable first threshold.
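The threshold-count variant can be sketched in a few lines. The function name and the threshold value of 30000 are assumptions chosen for illustration; the text does not specify a value for the first threshold.

```python
def over_threshold_ratio(amplitudes, first_threshold):
    """Share of sampling points whose amplitude value exceeds first_threshold.
    This is a proxy for the clipping ratio, not the clipping ratio itself."""
    count = sum(1 for a in amplitudes if a > first_threshold)
    return count / len(amplitudes)


signal = [100, 32767, 32767, 32767, 200, 32000, 150, 50]
target_data = over_threshold_ratio(signal, first_threshold=30000)
print(target_data)  # 0.5
```

Unlike a run-based check, this count includes isolated loud samples such as 32000, which is why the result only indirectly reflects the clipping ratio.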
  • The target data representing the clipping ratio of the first audio signal may also be obtained as shown in FIG. 3, which schematically illustrates the flow of obtaining this target data, including but not limited to steps S21-S23;
  • the target sampling range to which the N sampling values corresponding to the N sampling points of the first audio signal belong is divided into at least two subranges, and the at least two subranges do not overlap with each other.
  • the division method is not limited in this application, and it can be divided equally or unevenly.
  • The number of sub-ranges can be 22, 24, 30, or another value.
  • The first sub-range is determined from the at least two non-overlapping sub-ranges; it is the sub-range with the largest amplitude values among the at least two sub-ranges.
  • For example, if the at least two sub-ranges are [0,7], [8,15], and [16,23], the first sub-range is [16,23].
  • S23 Calculate the ratio between the first number and the N, and use the ratio as target data for representing the clipping ratio of the first audio signal.
  • the target data can reflect the size of the clipping ratio of the first audio signal.
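The sub-range approach can be sketched as follows, under two assumptions that the text does not state explicitly: the sub-ranges are of equal width, and the "first number" in S23 is the count of sampling points whose amplitude values fall in the first (highest) sub-range. The function name and sample values are hypothetical; the range 0 to 23 with three sub-ranges reproduces the example [0,7], [8,15], [16,23].

```python
def top_subrange_ratio(amplitudes, range_min, range_max, num_subranges):
    """Ratio between the number of sampling points in the highest
    sub-range (assumed equal-width division) and N."""
    width = (range_max - range_min + 1) / num_subranges
    first_subrange_start = range_min + width * (num_subranges - 1)
    # Assumed meaning of the "first number": samples inside the first sub-range.
    first_number = sum(1 for a in amplitudes if a >= first_subrange_start)
    return first_number / len(amplitudes)


samples = [1, 5, 9, 16, 20, 23, 23, 12]
print(top_subrange_ratio(samples, 0, 23, 3))  # 0.5
```

Four of the eight samples (16, 20, 23, 23) fall in [16,23], giving target data of 0.5.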
  • After the target data is acquired, it is determined whether the target data belongs to the target range. If it does, the first audio signal is divided into at least two audio segments.
  • The first audio signal can be divided in units of 1 second or in other time units, such as 5 seconds.
  • the target range may be 60% to 80%. It is understood that the target range may also be other ranges, which are not limited in the embodiments of the present application.
  • FIG. 4 is a flowchart of processing target data belonging to different ranges according to an embodiment of this application. As shown in the figure, it includes steps S31-S35:
  • S32 Determine whether the target data belongs to the target range. If the target data belongs to the target range, perform step S33. If the target data does not belong to the target range, perform step S34 or S35: if the target data is greater than the third threshold, perform step S35; if the target data is less than the fourth threshold, perform step S34. The third threshold is the maximum value of the target range, and the fourth threshold is the minimum value of the target range.
  • S33 Divide the first audio signal into at least two audio segments, and perform clipping detection processing on the at least two audio segments;
  • Determining whether the target data belongs to the target range has two possible outcomes. One is that the target data belongs to the target range; for details, refer to step S104, which is not repeated here. The other is that the target data does not belong to the target range. If the target data is greater than the third threshold, the ratio of the number of sampling points with clipping to the total number of sampling points in the first audio signal is too high, that is, there are too many sampling points with clipping. Using such a first audio signal to train the voiceprint recognition model would reduce the verification rate of voiceprint recognition, so the first audio signal is discarded.
  • If the target data is less than the fourth threshold, the ratio between the number of sampling points with clipping in the first audio signal and the total number of sampling points is relatively small. The few sampling points with clipping are not enough to damage the information of the first audio signal and have almost no effect on subsequent processing. Therefore, it is not necessary to divide the signal into audio segments or perform clipping detection processing on audio segments; the signal directly enters the system for subsequent processing. For example, the first audio signal can be used directly to train the voiceprint recognition model.
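The three-way decision can be sketched as below. The `Action` names are hypothetical, the default thresholds use the example target range of 60% to 80%, and treating the boundary values as inside the target range is an assumption. The mapping of the step labels to outcomes (S34 for direct use, S35 for discarding) is inferred from the description rather than stated in one place.

```python
from enum import Enum


class Action(Enum):
    PASS_THROUGH = "use the signal directly"        # below the target range (inferred S34)
    SEGMENT = "divide into segments and detect"     # inside the target range (S33)
    DISCARD = "discard the signal"                  # above the target range (inferred S35)


def choose_action(target_data, fourth_threshold=0.60, third_threshold=0.80):
    """Pick the processing path from the target data.
    fourth_threshold / third_threshold are the minimum / maximum of the target range."""
    if target_data > third_threshold:
        return Action.DISCARD
    if target_data < fourth_threshold:
        return Action.PASS_THROUGH
    return Action.SEGMENT


print(choose_action(0.7))  # Action.SEGMENT
```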
  • S104 Perform clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segment after the clipping detection processing.
  • the first audio signal can be divided into at least two audio segments.
  • The division can be equal, that is, the duration of each audio segment is the target duration, which can be 1 s, 5 s, etc., so that each audio segment can be checked for clipping.
  • Whether an audio segment has clipping may be detected by checking whether the segment contains a first number or more of consecutive sampling points whose amplitude values have absolute values greater than the second threshold.
  • The first number can be 3.
  • The second threshold can be the product of the maximum value in the sampling value range and the target ratio.
  • The target ratio can be 90%, that is, the second threshold is 32768 × 0.9 ≈ 29491. If the absolute values of the amplitude values of three or more consecutive sampling points in an audio segment exceed 90% of the maximum value in the sampling value range, the audio segment is determined to have clipping and is discarded.
  • The 90% ratio can also be another ratio around 90%, such as 91%, 89%, 85%, or 95%.
  • The first number can be 3 or another value, and there can be a mutual constraint relationship between the first number, the target ratio, and the sampling frequency.
  • If no such run of sampling points is found, the audio segment is determined to be an available audio segment and is retained.
  • Detecting clipping segment by segment in this way can avoid discontinuity in the remaining audio segments.
  • The second audio signal is obtained from the results of the clipping detection processing on the at least two audio segments. For example, the audio segments with clipping are discarded and the audio segments without clipping are retained; then all audio segments without clipping are combined into the second audio signal in chronological order.
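Step S104 can be sketched as follows, assuming the first audio signal is a list of integer amplitude values. The function names are hypothetical; the defaults (8 kHz, 1 s segments, first number 3, threshold 32768 × 0.9) follow the example values in the text, and the toy call uses 4 samples per segment only to keep the example readable.

```python
def has_clipping(segment, second_threshold, first_number=3):
    """True if the segment contains first_number or more consecutive samples
    whose absolute amplitude exceeds second_threshold."""
    run = 0
    for a in segment:
        run = run + 1 if abs(a) > second_threshold else 0
        if run >= first_number:
            return True
    return False


def remove_clipped_segments(signal, sample_rate=8000, segment_seconds=1,
                            max_value=32768, target_ratio=0.9, first_number=3):
    """Split the signal into equal-duration segments, discard segments with
    clipping, and join the remaining segments in chronological order."""
    seg_len = sample_rate * segment_seconds
    threshold = max_value * target_ratio
    second_signal = []
    for start in range(0, len(signal), seg_len):
        segment = signal[start:start + seg_len]
        if not has_clipping(segment, threshold, first_number):
            second_signal.extend(segment)
    return second_signal


first = [10, 20, 30, 40,  32767, 32767, 32767, 5,  -7, 8, -9, 10]
print(remove_clipped_segments(first, sample_rate=4))
# [10, 20, 30, 40, -7, 8, -9, 10]
```

The middle segment contains three consecutive samples above the threshold, so it is dropped and the two remaining segments form the second audio signal.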
  • For a specific implementation of step S104, refer to FIG. 5.
  • FIG. 5 is a schematic diagram of performing clipping detection processing on an audio segment, including but not limited to steps S41-S44;
  • For the method of determining whether to discard the second audio signal, refer to FIG. 6.
  • As shown in the figure, the flowchart of the method for determining whether to discard the second audio signal includes but is not limited to steps S51-S54;
  • S52 Detect whether the audio length of the second audio signal is greater than or equal to the first threshold. If the audio length of the second audio signal is less than the first threshold, perform step S53; if it is greater than or equal to the first threshold, perform step S54;
  • The first threshold here refers to the length of audio signal that can be input to the subsequent system for processing.
  • For example, if the registered voice signal needs to be 20 s in length, it can be determined whether the audio length of the second audio signal is greater than or equal to 20 s. If so, the second audio signal is retained and used to train the voiceprint recognition model.
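The length check can be illustrated with a short sketch. The function name is hypothetical, and the defaults assume the example values of an 8 kHz sampling frequency and a 20 s required length.

```python
def keep_second_signal(num_samples, sample_rate=8000, min_seconds=20):
    """True if the second audio signal is long enough to be retained,
    i.e. its duration is greater than or equal to the required length."""
    return num_samples / sample_rate >= min_seconds


print(keep_second_signal(160000))  # True: exactly 20 s at 8 kHz
print(keep_second_signal(159999))  # False: just under 20 s
```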
  • In the embodiments of this application, the processing method for the first audio signal is determined according to the target data used to indicate the clipping ratio of the first audio signal. If the target data belongs to the target range, the first audio signal is divided into at least two audio segments, clipping detection processing is performed on the at least two audio segments, and the second audio signal is obtained according to the audio segments after the clipping detection processing.
  • The embodiments of this application do not simply discard the audio signal with clipping, but further process it so as to retain as many valid audio signals as possible, so that the usable rate of the audio signal is greatly improved.
  • For the first optional implementation manner, refer to FIG. 7, including but not limited to steps S201-S202;
  • For the second optional implementation manner, refer to FIG. 8;
  • The two optional implementation manners are described below:
  • the first optional implementation is:
  • the first audio signal is sampled.
  • The original analog signal can be sampled and quantized by an amplitude-value calculation function to obtain the N amplitude values corresponding to the N sampling points contained in the first audio signal.
  • The first condition is that the amplitude values of a first number or more of consecutive sampling points are greater than the second threshold. If the amplitude values of the sampling points of the first audio signal satisfy this condition, it can be determined that the first audio signal has clipping. For details, refer to step S104, which is not repeated here.
  • the second optional implementation is:
  • S301 Divide the target sampling range into at least two sub-ranges
  • S302 Count the number of sampling points belonging to each of the at least two sub-ranges among the amplitude values of the N sampling points;
  • the target sampling range to which the N sampling values corresponding to the N sampling points of the first audio signal belong is divided into at least two subranges, and the at least two subranges do not overlap with each other.
  • the division method is not limited in this application, and it can be divided equally or unevenly.
  • The number of sub-ranges can be 22, 24, 30, or another value.
  • Count the number of sampling points whose amplitude values belong to each sub-range among the N sampling points and construct a histogram.
  • the horizontal axis of the histogram may be the sub-range, and the vertical axis may be the number of sampling points whose amplitude values belong to each sub-range among the N sampling points in the first audio signal.
  • For example, the target sampling range is equally divided into 22 sub-ranges in order of magnitude.
  • As shown in FIG. 9, if there is no clipping in the first audio signal, the number of occurrences of amplitude values gradually decreases as the values of the sub-range increase. As shown in FIG. 10, if there is clipping in the first audio signal, the number of occurrences of amplitude values is highest in the sub-range with the highest values: the last column of the histogram is higher than all the previous columns, that is, the frequency value of the last sub-range of the histogram is the highest.
  • The frequency value represented by the last column is called the abnormally elevated part, and the second condition is that the histogram has an abnormally elevated part.
  • If there is no clipping in the first audio signal, the waveform of the audio signal is relatively smooth, and most of the amplitude values of the N sampling points are relatively small. If there is clipping in the first audio signal, the amplitude values of the waveform are larger, so the amplitude values of many of the N sampling points are relatively large, and more sampling points fall in the sub-ranges with larger amplitude values in the histogram.
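The histogram check for the second condition can be sketched as follows. Two points are assumptions rather than statements from the text: the histogram is built over absolute amplitude values from 0 to 32768, and "abnormally elevated" is interpreted strictly as the last column being higher than every other column. The function names and test signals are hypothetical.

```python
def amplitude_histogram(amplitudes, max_abs=32768, num_subranges=22):
    """Count sampling points per equal-width sub-range of absolute amplitude."""
    width = (max_abs + 1) / num_subranges
    counts = [0] * num_subranges
    for a in amplitudes:
        idx = min(int(abs(a) / width), num_subranges - 1)
        counts[idx] += 1
    return counts


def has_abnormal_last_column(counts):
    """Second condition: the last column is higher than all previous columns."""
    return all(counts[-1] > c for c in counts[:-1])


quiet = [100] * 50 + [5000] * 10 + [32000] * 2
loud = [100] * 5 + [32767] * 20
print(has_abnormal_last_column(amplitude_histogram(quiet)))  # False
print(has_abnormal_last_column(amplitude_histogram(loud)))   # True
```

In the first signal most samples sit in the lowest sub-range, so the counts fall off toward the top; in the second, the highest sub-range dominates, which matches the pattern described for a signal with clipping.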
  • the first audio signal of the embodiment may be any one of the at least one audio signal, so as to obtain the first audio signal with clipped amplitude, and further process the audio signal with clipped amplitude.
  • the audio signal processing apparatus of the embodiment of the present application may include:
  • the first acquiring unit 11 is configured to acquire a first audio signal with clipping, where the first audio signal includes N sampling points, where N is a positive integer;
  • the first audio signal may include a voice data signal in the instant messaging process, or may be a music data signal recorded on site, etc., which is not limited in the embodiment of the present application.
  • The first audio signal in this embodiment may be acquired by performing clipping detection processing on multiple audio signals, determining whether each audio signal has clipping, and thereby obtaining at least one audio signal with clipping.
  • The first audio signal in this embodiment may be any one of the at least one audio signal.
  • The first audio signal includes N sampling points, and the amplitude value of each sampling point belongs to a preset target sampling range. The target sampling range is determined by the number of bits used to store the amplitude value. For example, if 16 bits are used to store the amplitude value, the target sampling range is $-2^{15}$ to $2^{15}-1$, that is, -32768 to 32767.
  • The first audio signal may be obtained by sampling and quantizing the original analog signal: the original analog signal is sampled to obtain N sampling points.
  • The sampling frequency can be 8 kHz, that is, there are 8000 sampling points within 1 s.
  • If the original amplitude value of a sampling point exceeds the maximum value of the target sampling range, it is represented by the maximum value of the target sampling range; if the original amplitude value of a sampling point is below the minimum value of the target sampling range, it is represented by the minimum value of the target sampling range.
  • In this way, the original amplitude values of the sampling points are limited to N amplitude values within the target sampling range, and one sampling point corresponds to one amplitude value.
  • The sampling frequency can also be another frequency, customized according to the needs of the user.
  • The number of bits used to store the amplitude value can also be another number, set according to the sampling range required by the user.
  • The original analog signal can be sampled and quantized by an amplitude-value calculation function to obtain the N amplitude values corresponding to the N sampling points contained in the first audio signal. For example, the sampling frequency of the function and the number of bits used to store the amplitude value are set, the original analog signal is input into the function, and the first audio signal is obtained.
  • the second acquiring unit 12 is configured to acquire target data used to represent a clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of sampling points with clipping among the N sampling points and N;
  • The target data representing the clipping ratio of the first audio signal is obtained, where the target data may be the clipping ratio itself or other data that can reflect the size of the clipping ratio.
  • The target data may be within a preset range of the clipping ratio.
  • The target data may be obtained as follows: first, analyze the amplitude value of each of the N sampling points included in the first audio signal to determine the sampling points with clipping; then calculate the ratio between the number of sampling points with clipping and the total number of sampling points N. This ratio is the target data.
  • In this case, the target data is the clipping ratio itself.
  • The sampling points with clipping may be determined by checking whether there are a first number or more of consecutive sampling points whose amplitude values are greater than a second threshold. The first number can be 3, and the second threshold can be 90% of the maximum value of the target sampling range. For example, if the amplitude values of 5 consecutive sampling points are all greater than the second threshold, those 5 sampling points are regarded as sampling points with clipping.
  • Alternatively, the target data may be obtained by counting the number of sampling points in the first audio signal whose amplitude values exceed a first threshold, and then calculating the ratio between this number and the total number of sampling points N; this ratio is the target data. Because sampling points with clipping have relatively large amplitude values that exceed the first threshold, a larger clipping ratio of the first audio signal yields a larger calculated ratio. The calculated ratio therefore indirectly reflects the size of the clipping ratio, but it is not the clipping ratio itself. The value of the first threshold can be preset and then checked and updated continuously to obtain a more reasonable first threshold.
  • The second acquiring unit may specifically obtain the target data as shown in FIG. 3, which schematically illustrates the flow of acquiring target data used to represent the clipping ratio of the first audio signal, including but not limited to steps S21-S23;
  • the target sampling range to which the N sampling values corresponding to the N sampling points of the first audio signal belong is divided into at least two subranges, and the at least two subranges do not overlap with each other.
  • the division method is not limited in this application, and it can be divided equally or unevenly.
  • The number of sub-ranges can be 22, 24, 30, or another value.
  • The first sub-range is determined from the at least two non-overlapping sub-ranges; it is the sub-range with the largest amplitude values among the at least two sub-ranges.
  • For example, if the at least two sub-ranges are [0,7], [8,15], and [16,23], the first sub-range is [16,23].
  • S23 Calculate the ratio between the first number and the N, and use the ratio as target data for representing the clipping ratio of the first audio signal.
  • the target data can reflect the size of the clipping ratio of the first audio signal.
  • the first dividing unit 13 is configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;
  • After the target data is acquired, it is determined whether the target data belongs to the target range. If it does, the first audio signal is divided into at least two audio segments.
  • The first audio signal can be divided in units of 1 second or in other time units, such as 5 seconds.
  • the target range may be 60% to 80%. It is understood that the target range may also be other ranges, which are not limited in the embodiments of the present application.
  • FIG. 4 is a flowchart of processing target data belonging to different ranges according to an embodiment of this application. As shown in the figure, it includes steps S31-S35:
  • Step S32: Determine whether the target data belongs to the target range. If it does, perform step S33. If it does not, perform step S34 or S35: if the target data is greater than the third threshold, perform step S35; if the target data is less than the fourth threshold, perform step S34. The third threshold is the maximum value of the target range, and the fourth threshold is the minimum value of the target range.
  • S33 Divide the first audio signal into at least two audio segments, and perform clipping detection processing on the at least two audio segments;
  • Determining whether the target data belongs to the target range has two possible outcomes. One is that the target data belongs to the target range; for details, refer to step S104, which is not repeated here. The other is that the target data does not belong to the target range. If the target data is greater than the third threshold, the ratio of the number of clipped sampling points to the total number of sampling points in the first audio signal is too high, that is, there are too many clipped sampling points. Using the first audio signal to train the voiceprint recognition model would reduce the verification rate of voiceprint recognition, so the first audio signal is discarded.
  • If the target data is less than the fourth threshold, the ratio between the number of clipped sampling points in the first audio signal and the total number of sampling points is small: there are too few clipped sampling points to damage the information in the first audio signal, and they have almost no effect on subsequent processing. It is therefore unnecessary to divide the signal into audio segments or to perform clipping detection on the segments; the signal can enter the system directly for subsequent processing. For example, the first audio signal can be used directly to train the voiceprint recognition model.
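
As a rough illustration of the decision in steps S32-S35, the routing could be sketched in Python as follows. The function name `route_by_target_data`, the returned labels, and the default bounds 0.6 and 0.8 (taken from the example target range of 60% to 80%) are illustrative assumptions, not part of the application.

```python
def route_by_target_data(target_data, fourth_threshold=0.6, third_threshold=0.8):
    """Choose how to handle the first audio signal from its target data.

    fourth_threshold is the minimum and third_threshold the maximum of the
    target range; both defaults are assumed example values.
    """
    if target_data > third_threshold:
        # Step S35: too many clipped sampling points, discard the signal.
        return "discard"
    if target_data < fourth_threshold:
        # Step S34: clipping is negligible, use the signal directly.
        return "use_directly"
    # Step S33: inside the target range, divide into segments and run
    # clipping detection on each segment.
    return "segment_and_detect"


if __name__ == "__main__":
    for value in (0.9, 0.7, 0.1):
        print(value, route_by_target_data(value))
```

In this sketch the boundary values themselves count as belonging to the target range, which is one possible reading of "greater than" and "less than" in step S32.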
  • the third acquiring unit 14 is configured to perform clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segment after the clipping detection processing;
  • the first audio signal can be divided into at least two audio segments.
  • The division can be equal, that is, each audio segment has the target duration, which can be 1 s, 5 s, or the like, so that each audio segment can be checked for clipping.
  • Clipping in an audio segment may be detected by checking whether the segment contains a first number, or more than the first number, of consecutive sampling points whose absolute amplitude values are greater than the second threshold.
  • The first number can be 3, and the second threshold can be the product of the maximum value of the sampling value range and the target ratio. The target ratio can be 90%, giving 32768*0.9 ≈ 29491. If an audio segment contains three or more consecutive sampling points whose absolute amplitude values exceed 90% of the maximum value of the sampling value range, the audio segment is determined to have clipping and is discarded.
  • The 90% ratio can also be another ratio around 90%, such as 85%, 89%, 91%, or 95%. The first number can be 3 or another value, and the first number, the target ratio, and the sampling frequency may constrain one another.
  • Otherwise, the audio segment is determined to be an available audio segment and is retained.
  • Detecting clipping segment by segment avoids discontinuities within the retained speech segments.
  • The third acquiring unit is specifically configured to obtain the second audio signal according to the clipping detection results for the at least two audio segments; for example, among the at least two audio segments, the clipped audio segments are discarded, the audio segments without clipping are retained, and all audio segments without clipping are combined in chronological order to form the second audio signal.
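
The segment-level processing described above (equal division, run-length clipping detection, and chronological recombination) might be sketched in Python as below. The names `has_clipping` and `build_second_signal` are hypothetical; the defaults (3 consecutive sampling points, 90% of 32768, 8 kHz, 1-second segments) are the example values mentioned in the text, not fixed requirements.

```python
def has_clipping(segment, min_run=3, target_ratio=0.9, full_scale=32768):
    """Return True if the segment contains at least min_run consecutive
    sampling points whose absolute amplitude exceeds the second threshold."""
    second_threshold = full_scale * target_ratio  # 32768 * 0.9 is about 29491
    run = 0
    for amplitude in segment:
        run = run + 1 if abs(amplitude) > second_threshold else 0
        if run >= min_run:
            return True
    return False


def build_second_signal(signal, sample_rate=8000, segment_seconds=1.0):
    """Split the signal into equal-length segments, drop the clipped ones,
    and join the remaining segments in chronological order."""
    segment_length = max(1, int(sample_rate * segment_seconds))
    kept = []
    for start in range(0, len(signal), segment_length):
        segment = signal[start:start + segment_length]
        if not has_clipping(segment):
            kept.extend(segment)
    return kept


if __name__ == "__main__":
    # Tiny demo with 4 samples per "second" so the segments are easy to read.
    demo = [0, 1, 2, 3, 30000, 30000, 30000, 0, 5, 6, 7, 8]
    print(build_second_signal(demo, sample_rate=4))
```

The last segment may be shorter than the others when the signal length is not a multiple of the segment length; the sketch keeps it as is, which is an assumption rather than something the text specifies.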
  • For step S104, refer to FIG. 5, a schematic diagram of performing clipping detection processing on an audio segment, which includes but is not limited to steps S41-S44;
  • For the determination method, refer to FIG. 6, a flowchart of the method for determining whether to discard the second audio signal, which includes but is not limited to steps S51-S54;
  • Step S52: Detect whether the audio length of the second audio signal is greater than or equal to the first threshold; if the audio length of the second audio signal is less than the first threshold, perform step S53; if it is greater than or equal to the first threshold, perform step S54;
  • The first threshold refers to the length an audio signal must have to be input to the subsequent system for processing.
  • For example, if the registered voice signal needs to be 20 s long, it can be determined whether the audio length of the second audio signal is greater than or equal to 20 s; if so, the second audio signal is retained and used to train the voiceprint recognition model.
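
A minimal sketch of the length check, assuming the second audio signal is held as a list of samples and that its duration is the sample count divided by the sampling frequency. The name `is_usable` and the 8 kHz default are assumptions; the 20 s default is the example value from the text.

```python
def is_usable(second_signal, sample_rate=8000, min_seconds=20.0):
    """Return True if the second audio signal is at least min_seconds long."""
    duration_seconds = len(second_signal) / sample_rate
    return duration_seconds >= min_seconds


if __name__ == "__main__":
    print(is_usable([0] * 160000))  # exactly 20 s at 8 kHz
    print(is_usable([0] * 80000))   # 10 s at 8 kHz
```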
  • the third acquiring unit is specifically configured to:
  • the device further includes:
  • a detecting unit configured to detect whether the audio length of the second audio signal is greater than or equal to a first threshold
  • a first determining unit configured to determine that the second audio signal is an available audio signal if the audio length of the second audio signal is greater than or equal to the first threshold
  • the second voice signal is discarded.
  • each audio segment of the at least two audio segments includes at least one sampling point
  • the third acquiring unit detects whether there is clipping in the audio segment by acquiring the amplitude value of each sampling point in the at least one sampling point included in the audio segment;
  • if the amplitude value of the at least one sampling point satisfies the first condition, it is determined that the audio segment has clipping, and the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.
  • the device further includes:
  • a fourth acquiring unit configured to acquire the amplitude value of each of the N sampling points included in the first audio signal
  • the second determining unit is configured to determine that the first audio signal has clipping if the amplitude values of the N sampling points meet a first condition, where the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.
  • the amplitude value of each sampling point in the N sampling points belongs to the target sampling range
  • the second threshold is the product of the maximum value of the target sampling range and the target ratio.
  • the device further includes:
  • the second dividing unit is configured to divide the target sampling range into at least two sub-ranges, the at least two sub-ranges do not overlap each other, and the target sampling range is the amplitude of the N sampling points included in the first audio signal The range of the value;
  • a statistical unit configured to count the number of sampling points belonging to each of the at least two sub-ranges among the amplitude values of the N sampling points;
  • a construction unit configured to construct a histogram, the horizontal axis of the histogram includes the at least two sub-ranges, and the vertical axis of the histogram includes the number of sampling points belonging to the sub-ranges;
  • the third determining unit is configured to determine that the first audio signal has a clipping if the change trend of the histogram meets the second condition.
  • the second acquiring unit is specifically configured to:
  • the ratio between the first number and the N is calculated, and the ratio is used as target data for representing the clipping ratio of the first audio signal.
  • the maximum value of the target range is a third threshold
  • the minimum value of the target range is a fourth threshold
  • the device further includes a fourth determining unit
  • the fourth determining unit is specifically configured to discard the first audio signal if the target data is greater than the third threshold
  • the first audio signal is determined as an available audio signal.
  • The processing method for the first audio signal is determined according to the target data used to indicate the clipping ratio of the first audio signal, and the second audio signal is obtained according to the audio segments after clipping detection processing.
  • The embodiments of this application do not simply discard an audio signal with clipping, but further process it, so as to retain as many valid audio signals as possible and greatly improve the usability of the audio signals.
  • the audio signal processing device 1000 may include: at least one processor 1001, such as a CPU, at least one communication interface 1003, a memory 1004, and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the communication interface 1003 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1004 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the memory 1004 may also be at least one storage device located remotely from the foregoing processor 1001.
  • the memory 1004, which is a computer storage medium, may include an operating system, a network communication module, and program instructions.
  • the processor 1001 may be used to load program instructions stored in the memory 1004, and specifically perform the following operations:
  • Acquiring target data used to represent a clipping ratio of the first audio signal where the clipping ratio is used to represent a ratio between the number of sample points with clipping in the N sampling points and the N;
  • the target data belongs to the target range, dividing the first audio signal into at least two audio segments;
  • Clipping detection processing is performed on the at least two audio segments, and a second audio signal is obtained according to the audio segment after the clipping detection processing.
  • the method before acquiring the first audio signal with clipped amplitude, the method further includes:
  • the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.
  • the method before acquiring the first audio signal with clipped amplitude, the method further includes:
  • the target sampling range is a range in which amplitude values of N sampling points included in the first audio signal are located;
  • the horizontal axis of the histogram includes the at least two sub-ranges, and the vertical axis of the histogram includes the number of sampling points belonging to the sub-ranges;
  • the acquiring target data used to represent the clipping ratio of the first audio signal includes:
  • the ratio between the first number and the N is calculated, and the ratio is used as target data for representing the clipping ratio of the first audio signal.
  • the performing clip detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segment after the clip detection processing includes:
  • each of the at least two audio segments includes at least one sampling point, and the determining whether the audio segment has clipping includes:
  • if the amplitude value of the at least one sampling point satisfies the first condition, it is determined that the audio segment has clipping, and the first condition includes: the amplitude values of a first number, or more than the first number, of consecutive sampling points are greater than the second threshold.
  • the amplitude value of each sampling point in the N sampling points belongs to a target sampling range
  • the second threshold is the product of the maximum value of the target sampling range and the target ratio.
  • the maximum value of the target range is a third threshold
  • the minimum value of the target range is a fourth threshold
  • the method further includes:
  • the first audio signal is determined as an available audio signal.
  • processor 1001 may also be used to load program instructions stored in the memory 1004 to perform the following operations:
  • the audio length of the second audio signal is greater than or equal to the first threshold, determining that the second audio signal is an available audio signal
  • the second voice signal is discarded.
  • the embodiment of the present application also provides a computer storage medium.
  • the computer storage medium may store a plurality of instructions, and the instructions are suitable for being loaded by a processor to execute the method steps of the embodiment shown in FIG. 1. For the specific execution process, reference may be made to the description of that embodiment, which is not repeated here.
  • the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the above-mentioned method embodiments.
  • the storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), etc.

Abstract

The present application discloses an audio signal processing method and device. The audio signal processing method comprises: acquiring a first audio signal having amplitude clipping; acquiring target data for representing the amplitude clipping ratio of the first audio signal; if the target data belongs to a target range, dividing the first audio signal into at least two audio segments; performing amplitude clipping detection on the at least two audio segments, and obtaining a second audio signal according to the audio segments on which the amplitude clipping detection has been performed. By means of the technical solution of the present application, as many valid audio signals as possible can be retained, so that the usability of the audio signals is greatly improved.

Description

Audio signal processing method and device
This application claims priority to Chinese patent application No. 201911034571.6, filed with the Chinese Patent Office on October 29, 2019 and entitled "An audio signal processing method and device", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of communication technology, and in particular to an audio signal processing method and device.
Background
In voiceprint recognition, preprocessing of the audio signal is critical and strongly affects the accuracy of subsequent recognition. This preprocessing includes clipping detection on the audio signal. Clipping occurs mainly because the amplitude of the audio signal is too high and exceeds the maximum value of the sampling value range; it is also known as top truncation.
Clipping damages the information in a speech signal. In the prior art, once a segment of speech signal is detected to have clipping, that segment is discarded, which causes the loss of many valid speech signals.
Summary of the invention
The embodiments of the present application provide an audio signal processing method and device that can retain more valid audio signals, so that the usability of the audio signals is greatly improved.
In a first aspect, an embodiment of the present application provides an audio signal processing method, including:
acquiring a first audio signal with clipping, where the first audio signal includes N sampling points and N is a positive integer;
acquiring target data used to represent a clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of clipped sampling points among the N sampling points and N;
if the target data belongs to a target range, dividing the first audio signal into at least two audio segments;
performing clipping detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segments after the clipping detection processing.
In a second aspect, an embodiment of the present application provides an audio signal processing device, including:
a first acquiring unit, configured to acquire a first audio signal with clipping, where the first audio signal includes N sampling points and N is a positive integer;
a second acquiring unit, configured to acquire target data used to represent a clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of clipped sampling points among the N sampling points and N;
a first dividing unit, configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;
a third acquiring unit, configured to perform clipping detection processing on the at least two audio segments, and obtain a second audio signal according to the audio segments after the clipping detection processing.
In a third aspect, an embodiment of the present application provides an audio signal processing device including a processor, a memory, and a communication interface that are connected to each other, where the communication interface is used to receive and send data, the memory is used to store program code, and the processor is used to call the program code to execute the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium that stores a computer program, and the computer program is executed by a processor to implement the method described above.
In the embodiments of the present application, after a first audio signal with clipping is acquired, the processing method for the first audio signal is determined according to the target data used to indicate the clipping ratio of the first audio signal. If the target data belongs to the target range, the first audio signal is divided into at least two audio segments, clipping detection processing is performed on the at least two audio segments, and a second audio signal is obtained according to the audio segments after the clipping detection processing. The embodiments of this application do not simply discard an audio signal with clipping, but further process it, so as to retain as many valid audio signals as possible and greatly improve the usability of the audio signals.
Description of the drawings
To illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below.
FIG. 1 is a flowchart of an audio signal processing method provided by an embodiment of this application;
FIG. 2 is a waveform diagram of an audio signal with clipping provided by an embodiment of this application;
FIG. 3 is a flowchart of another audio signal processing method provided by an embodiment of this application;
FIG. 4 is a flowchart of a method for obtaining target data used to represent the clipping ratio of a first audio signal provided by an embodiment of this application;
FIG. 5 is a flowchart for determining whether target data belongs to a target range provided by an embodiment of this application;
FIG. 6 is a flowchart of a method for performing clipping detection processing on an audio segment provided by an embodiment of this application;
FIG. 7 is a flowchart of a method for determining whether to discard a second audio signal provided by an embodiment of this application;
FIG. 8 is a flowchart of yet another audio signal processing method provided by an embodiment of this application;
FIG. 9 is a histogram without clipping provided by an embodiment of this application;
FIG. 10 is a histogram with clipping provided by an embodiment of this application;
FIG. 11 is a schematic structural diagram of an audio signal processing device provided by an embodiment of this application;
FIG. 12 is a schematic structural diagram of another audio signal processing device provided by an embodiment of this application;
FIG. 13 is a schematic structural diagram of yet another audio signal processing device provided by an embodiment of this application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
An audio signal processing method provided by an embodiment of the present application is described in detail below with reference to FIG. 1 to FIG. 10.
FIG. 1 is a schematic flowchart of an audio signal processing method provided by an embodiment of this application. As shown in FIG. 1, the audio signal processing method of this embodiment may include the following steps S101 to S104.
S101: Acquire a first audio signal with clipping, where the first audio signal includes N sampling points and N is a positive integer;
In this embodiment, the first audio signal may be a voice data signal from an instant messaging session, a music data signal recorded on site, or the like, which is not limited in the embodiments of the present application.
In this embodiment, the first audio signal may be acquired by performing clipping detection processing on multiple audio signals to determine whether each of them has clipping, and then acquiring at least one audio signal that has clipping; the first audio signal may be any one of the at least one audio signal.
The first audio signal includes N sampling points, and the amplitude value of each sampling point belongs to a preset target sampling range that is determined by the number of bits used to store the amplitude value. For example, if 16 bits are used to store the amplitude value, the target sampling range is -2^15 to 2^15-1, that is, -32768 to 32767.
Optionally, the first audio signal may be obtained by sampling and quantizing an original analog signal. The original analog signal is sampled to obtain N sampling points; the sampling frequency can be 8 kHz, that is, 8000 sampling points per second. The original amplitude value of each sampling point is then quantized. As shown in FIG. 2, if the original amplitude value of a sampling point exceeds the maximum value of the target sampling range, it is represented by that maximum value; if it falls below the minimum value of the target sampling range, it is represented by that minimum value. After quantization, the original amplitude values of the sampling points are limited to N amplitude values within the target sampling range, one per sampling point. As shown in FIG. 2, the first audio signal has clipping after the sampling and quantization steps.
It should be noted that the sampling frequency can also be another frequency chosen according to the user's needs, and the number of bits used to store the amplitude value can also be another number of bits, set according to the sampling range the user requires.
Optionally, the N amplitude values corresponding to the N sampling points of the first audio signal may be obtained by sampling and quantizing the original analog signal with an amplitude calculation function: for example, the sampling frequency and the number of bits used to store amplitude values are set for the function, and the original analog signal is input to it to obtain the first audio signal.
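
The saturation behaviour described above can be illustrated with a short Python sketch. It assumes the already-sampled analog values are floats with a nominal full scale of ±1.0; the function name `quantize` and this scaling convention are assumptions made for illustration and are not defined in the application.

```python
def quantize(samples, bits=16):
    """Quantize float samples (nominal full scale of +/-1.0) to signed
    integers, saturating at the limits of the target sampling range."""
    lowest = -(2 ** (bits - 1))      # -32768 for 16 bits
    highest = 2 ** (bits - 1) - 1    # 32767 for 16 bits
    scale = 2 ** (bits - 1)
    quantized = []
    for x in samples:
        value = int(round(x * scale))
        # Values outside the range are replaced by the nearest limit,
        # which is what produces clipping in the stored signal.
        quantized.append(max(lowest, min(highest, value)))
    return quantized


if __name__ == "__main__":
    print(quantize([0.0, 0.5, 1.5, -2.0]))
```

Inputs beyond the nominal full scale (1.5 and -2.0 in the demo) end up at the maximum and minimum of the range, which corresponds to the flattened peaks that FIG. 2 is described as showing.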
S102: Acquire target data used to represent a clipping ratio of the first audio signal, where the clipping ratio represents the ratio between the number of clipped sampling points among the N sampling points and N;
In this embodiment, the target data used to represent the clipping ratio of the first audio signal is obtained by analyzing the N amplitude values corresponding to the N sampling points of the first audio signal. The target data may be the clipping ratio itself, or other data that reflects the size of the clipping ratio; for example, the target data may be within a preset range of the clipping ratio.
In an optional implementation, the target data may be obtained as follows: first, the amplitude value of each of the N sampling points included in the first audio signal is analyzed to determine the clipped sampling points; then the ratio between the number of clipped sampling points and the total number N of sampling points is calculated. This ratio is the target data; in this implementation, the target data is the clipping ratio itself. Optionally, the clipped sampling points may be determined by checking whether there is a first number, or more than the first number, of consecutive sampling points whose amplitude values are greater than a second threshold. The first number can be 3, and the second threshold can be 90% of the maximum value of the target sampling range. For example, if the amplitude values of 5 consecutive sampling points are all greater than the second threshold, those 5 sampling points are treated as clipped sampling points.
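
A minimal Python sketch of this implementation is given below. The name `clipped_sample_ratio` is hypothetical, the defaults follow the example values in the text (a run of 3, 90% of the 16-bit maximum 32767), and comparing the absolute value of the amplitude is an assumption borrowed from the segment-level detection described elsewhere in the application.

```python
def clipped_sample_ratio(amplitudes, first_number=3, target_ratio=0.9,
                         max_value=32767):
    """Return the share of sampling points that lie in runs of at least
    first_number consecutive points above the second threshold."""
    n = len(amplitudes)
    if n == 0:
        return 0.0
    second_threshold = max_value * target_ratio
    clipped = 0
    run = 0
    for amplitude in amplitudes:
        if abs(amplitude) > second_threshold:
            run += 1
        else:
            # A run just ended; count it only if it was long enough.
            if run >= first_number:
                clipped += run
            run = 0
    if run >= first_number:  # handle a run that reaches the end
        clipped += run
    return clipped / n


if __name__ == "__main__":
    signal = [0, 30000, 30000, 30000, 30000, 30000, 0, 30000, 0, 0]
    print(clipped_sample_ratio(signal))
```

In the demo, the run of five loud sampling points is counted as clipped, while the isolated loud sampling point is not.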
In another optional implementation, the target data may be obtained by counting, among the sampling points of the first audio signal, the number of sampling points whose amplitude value exceeds a first threshold, and then calculating the ratio between this number and the total number N of sampling points; this ratio is the target data. Because clipped sampling points have relatively large amplitude values that exceed the first threshold, a larger clipping ratio in the first audio signal yields a larger calculated ratio. The ratio therefore indirectly reflects the size of the clipping ratio, although it is not the clipping ratio itself. The value of the first threshold can be set, then continuously checked and updated to obtain a more reasonable first threshold.
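
This simpler proxy can be sketched in a few lines of Python. The name `exceedance_ratio` and the use of the absolute amplitude are assumptions; the first threshold is left as a required argument because the text does not give a value for it.

```python
def exceedance_ratio(amplitudes, first_threshold):
    """Return the share of sampling points whose absolute amplitude
    exceeds first_threshold (a proxy for the clipping ratio)."""
    if not amplitudes:
        return 0.0
    count = sum(1 for a in amplitudes if abs(a) > first_threshold)
    return count / len(amplitudes)


if __name__ == "__main__":
    print(exceedance_ratio([100, -32000, 31000, 5, 0], first_threshold=30000))
```

Unlike the run-based approach, this count also includes isolated loud sampling points, which is why the text describes the result as reflecting the clipping ratio only indirectly.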
In yet another optional implementation, the target data may be obtained through the flow shown schematically in FIG. 3, which includes but is not limited to steps S21-S23:
S21: Determine a first sub-range from at least two sub-ranges.
Specifically and optionally, the target sampling range to which the N sample values of the N sampling points of the first audio signal belong is divided into at least two sub-ranges that do not overlap one another. This application does not limit how the range is divided; the sub-ranges may be equal or unequal in width. The number of sub-ranges may be 22, 24, 30, or another value.
The first sub-range is then determined from the at least two non-overlapping sub-ranges. The first sub-range is the sub-range covering the largest amplitude values. For example, if the sub-ranges are [0,7], [8,15], and [16,23], the first sub-range is [16,23].
S22: Obtain, as a first number, the number of the N sampling points whose amplitude values fall within the first sub-range.
S23: Calculate the ratio of the first number to N, and use this ratio as the target data representing the clipping ratio of the first audio signal.
As described above, the vast majority of the sampling points whose amplitude values fall within the first sub-range are clipped. The ratio of the first number to the total number of sampling points N therefore serves as target data that reflects the clipping ratio of the first audio signal.
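Steps S21-S23 can be sketched as below, assuming equal-width sub-ranges over absolute amplitude values from 0 up to a full-scale value. The function name and the parameters `num_subranges` and `full_scale` are hypothetical.

```python
def top_subrange_ratio(samples, num_subranges=22, full_scale=32768):
    """Ratio of samples falling in the highest-amplitude sub-range (S21-S23)."""
    if not samples:
        return 0.0
    width = full_scale / num_subranges
    lower_edge = full_scale - width  # S21: lower bound of the first sub-range
    first_number = sum(1 for value in samples if abs(value) >= lower_edge)  # S22
    return first_number / len(samples)  # S23


# With sub-ranges [0,7], [8,15], [16,23] the first sub-range is [16,23].
target_data = top_subrange_ratio([3, 9, 16, 23, 20, 1], num_subranges=3, full_scale=24)
```

In the usage line, three of the six samples (16, 23, 20) fall in [16,23], so the target data is 0.5.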
S103: If the target data falls within a target range, divide the first audio signal into at least two audio segments.
In this embodiment, after the target data is obtained, it is determined whether the target data falls within the target range. If it does, the first audio signal is divided into at least two audio segments. The signal may be divided in units of 1 second or in another time unit, such as 5 seconds.
The target range may be 60% to 80%. It is understood that the target range may also be another range, which is not limited in the embodiments of this application.
Refer to FIG. 4, which is a flowchart, provided by an embodiment of this application, of the processing applied when the target data falls in different ranges. As shown in the figure, the flow includes steps S31-S35:
S31: Obtain the target data.
S32: Determine whether the target data falls within the target range. If it does, perform step S33. If it does not, perform step S35 when the target data is greater than a third threshold, or step S34 when the target data is less than a fourth threshold. The third threshold is the maximum value of the target range, and the fourth threshold is the minimum value of the target range.
S33: Divide the first audio signal into at least two audio segments, and perform clipping detection processing on the at least two audio segments.
S34: Determine the first audio signal to be a usable audio signal.
S35: Discard the first audio signal.
Determining whether the target data falls within the target range has two possible outcomes. In the first, the target data falls within the target range; see step S104 for details, which are not repeated here. In the second, the target data falls outside the target range. If the target data is greater than the third threshold, the ratio of clipped sampling points to total sampling points in the first audio signal is too high. Training a voiceprint recognition model on such a signal would lower the verification rate of voiceprint recognition, so the first audio signal is discarded. If the target data is less than the fourth threshold, the ratio of clipped sampling points to total sampling points is small: the clipped sampling points are too few to damage the information carried by the first audio signal and have almost no effect on subsequent processing. In that case, audio segmentation and per-segment clipping detection can be skipped and the signal passed directly to the system for subsequent processing; for example, the first audio signal can be used directly to train the voiceprint recognition model.
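The three-way decision of steps S31-S35 can be sketched as follows. The enum and function names are hypothetical, and the default bounds simply reuse the 60% to 80% example range given above.

```python
from enum import Enum


class Action(Enum):
    SEGMENT_AND_CHECK = "segment"  # S33: split into segments and test each one
    USE_AS_IS = "use"              # S34: too little clipping to matter
    DISCARD = "discard"            # S35: too much clipping to salvage


def choose_action(target_data, fourth_threshold=0.60, third_threshold=0.80):
    """S32: route the first audio signal according to its target data."""
    if target_data > third_threshold:
        return Action.DISCARD
    if target_data < fourth_threshold:
        return Action.USE_AS_IS
    return Action.SEGMENT_AND_CHECK


decisions = [choose_action(x) for x in (0.05, 0.70, 0.95)]
```

Treating both bounds as inclusive members of the target range is an assumption; the source does not state how values exactly equal to a bound are handled.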
S104: Perform clipping detection processing on the at least two audio segments, and obtain a second audio signal from the audio segments after the clipping detection processing.
In this embodiment, if the target data falls within the target range, the first audio signal may be divided into at least two audio segments. The division may be uniform, so that each audio segment has a target duration such as 1 s or 5 s. Each audio segment is then checked in turn for clipping.
Whether an audio segment is clipped may be detected by checking whether the segment contains at least a first number of consecutive sampling points whose amplitude values all have absolute values greater than the second threshold. The first number may be 3. The second threshold may be the product of the maximum value of the sample value range and a target ratio; with a target ratio of 90%, the second threshold is 32768*0.9≈29491. If an audio segment contains three or more consecutive sampling points whose absolute amplitude values all exceed 90% of the maximum value of the sample value range, the audio segment is determined to be clipped and is discarded. Note that the 90% ratio may be replaced by another ratio close to 90%, such as 85%, 89%, 91%, or 95%, and the first number may be 3 or another value. The first number, the target ratio, and the sampling frequency may constrain one another.
If the audio segment does not contain three consecutive sampling points whose absolute amplitude values all exceed 90% of the maximum value of the sample value range, the audio segment is not clipped; it is determined to be a usable audio segment and is retained. Detecting clipping segment by segment in this way avoids discontinuities within the remaining audio.
Optionally, the second audio signal is obtained from the results of the clipping detection processing on the at least two audio segments. For example, the clipped audio segments are discarded, the unclipped audio segments are retained, and all unclipped audio segments are then combined in chronological order to form the second audio signal.
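The segment-level processing above can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, segments have a fixed duration, a shorter final segment is kept as its own segment, and 32768 is used as the full-scale value as in the 32768*0.9 example.

```python
def has_clipping(segment, min_run=3, ratio=0.9, full_scale=32768):
    """True if `segment` holds at least `min_run` consecutive samples whose
    absolute amplitude exceeds `ratio * full_scale`."""
    threshold = ratio * full_scale
    run = 0
    for value in segment:
        run = run + 1 if abs(value) > threshold else 0
        if run >= min_run:
            return True
    return False


def build_second_signal(samples, sample_rate=8000, segment_seconds=1.0, **kwargs):
    """Split into fixed-length segments, drop clipped ones, and join the rest
    in chronological order."""
    segment_length = int(sample_rate * segment_seconds)
    if segment_length <= 0:
        raise ValueError("segment length must be at least one sample")
    kept = []
    for start in range(0, len(samples), segment_length):
        segment = samples[start:start + segment_length]
        if not has_clipping(segment, **kwargs):
            kept.extend(segment)
    return kept


# Toy signal with 4 samples per segment: the middle segment is clipped.
signal = [1, 2, 3, 4, 32767, 32767, 32767, 5, 6, 7, 8, 9]
second_signal = build_second_signal(signal, sample_rate=4)
```

One consequence of checking each segment independently is that a clipped run straddling a segment boundary may go undetected if fewer than `min_run` of its samples fall on either side; the source does not say how such runs are treated.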
For step S104 of the above embodiment, refer to FIG. 5, which is a schematic diagram, proposed by this application, of performing clipping detection processing on audio segments. The processing includes but is not limited to steps S41-S44:
S41: For each of the at least two audio segments, detect whether the audio segment is clipped.
S42: If the audio segment is clipped, discard the audio segment.
S43: Obtain the audio segments of the at least two audio segments that remain after the discarding.
S44: Obtain the second audio signal from the remaining audio segments.
Optionally, after the second audio signal is obtained, its audio length may be checked against a condition to confirm that the second audio signal, obtained from the remaining audio segments, can still be used by the subsequent system. The check is shown in FIG. 6, a flowchart of a method for determining whether to discard the second audio signal, which includes but is not limited to steps S51-S54:
S51: Obtain the second audio signal.
S52: Detect whether the audio length of the second audio signal is greater than or equal to a first threshold. If the audio length is less than the first threshold, perform step S53; if it is greater than or equal to the first threshold, perform step S54.
S53: Discard the second audio signal.
S54: Determine the second audio signal to be a usable audio signal.
The first threshold mentioned here is a length threshold: the length an audio signal must have to be input to the subsequent system for processing. For example, in a text-independent voiceprint registration scenario, the registration speech must reach a length of 20 s. It can therefore be determined whether the audio length of the second audio signal is greater than or equal to 20 s; if so, the second audio signal is retained and used to train the voiceprint recognition model.
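The length check of steps S51-S54 reduces to a single comparison, sketched below. The function name is hypothetical, and the defaults reuse the 8 kHz sampling frequency and 20 s registration length mentioned in this document.

```python
def is_usable(second_signal, sample_rate=8000, min_seconds=20.0):
    """S52: keep the second audio signal only if it is at least `min_seconds` long."""
    duration_seconds = len(second_signal) / sample_rate
    return duration_seconds >= min_seconds


long_enough = is_usable([0] * 160000)  # exactly 20 s at 8 kHz
too_short = is_usable([0] * 159999)    # one sample short of 20 s
```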
In the embodiments of this application, after a clipped first audio signal is obtained, the way the first audio signal is processed is determined from the target data representing its clipping ratio. If the target data falls within the target range, the first audio signal is divided into at least two audio segments, clipping detection processing is performed on the at least two audio segments, and a second audio signal is obtained from the audio segments after the clipping detection processing. Rather than simply discarding a clipped audio signal, the embodiments of this application process it further, retaining as much valid audio as possible and substantially improving the usable rate of audio signals.
In another embodiment, before the clipped first audio signal is obtained in step S101, it may first be determined whether the first audio signal is clipped. Optionally, the ways of detecting whether the first audio signal is clipped include but are not limited to the following two optional implementations. The first, shown in FIG. 7, includes but is not limited to steps S201-S202; the second, shown in FIG. 8, includes but is not limited to steps S301-S304. The two optional implementations are described in detail below.
The first optional implementation is as follows:
S201: Obtain the amplitude values of the N sampling points included in the first audio signal.
In this embodiment, the first audio signal is sampled; see step S101 for details. An amplitude calculation function may be used to sample and quantize the original analog signal, yielding the N amplitude values corresponding to the N sampling points contained in the first audio signal.
S202: If the amplitude values of the first audio signal satisfy a first condition, determine that the first audio signal is clipped.
The first condition is that at least a first number of consecutive sampling points have amplitude values greater than a second threshold. If the amplitude values of the sampling points of the first audio signal satisfy this condition, it can be determined that the first audio signal is clipped. See step S104 for details, which are not repeated here.
The second optional implementation is as follows:
S301: Divide the target sampling range into at least two sub-ranges.
S302: Count, for each of the at least two sub-ranges, the number of the N sampling points whose amplitude values fall within that sub-range.
S303: Construct a histogram.
Specifically and optionally, the target sampling range to which the N sample values of the N sampling points of the first audio signal belong is divided into at least two sub-ranges that do not overlap one another. This application does not limit how the range is divided; the sub-ranges may be equal or unequal in width. The number of sub-ranges may be 22, 24, 30, or another value.
The number of the N sampling points whose amplitude values fall within each sub-range is counted, and a histogram is constructed. The horizontal axis of the histogram may be the sub-ranges, and the vertical axis may be the number of the N sampling points of the first audio signal whose amplitude values fall within each sub-range.
S304: If the trend of the histogram satisfies a second condition, determine that the first audio signal is clipped.
As shown in FIG. 9 and FIG. 10, the target sampling range is divided evenly, in order of magnitude, into 22 sub-ranges. As shown in FIG. 9, if the first audio signal is not clipped, the number of occurrences of amplitude values gradually decreases as the sub-range values increase. As shown in FIG. 10, if the first audio signal is clipped, the number of occurrences is highest in the sub-range with the highest values, so that the last bar of the histogram is higher than all preceding bars; that is, the count of the last sub-range is the highest. The count represented by the last bar is called the abnormally elevated portion, and the second condition is that the histogram contains an abnormally elevated portion.
If the first audio signal is not clipped, its waveform is relatively gentle and most of the N amplitude values are small. If the first audio signal is clipped, the amplitude of its waveform is large and the amplitude values of its N sampling points are correspondingly large, so that more sampling points fall in the high-amplitude sub-ranges of the histogram.
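Steps S301-S304 can be sketched as below. The helper names are hypothetical, the sub-ranges are assumed to be equal-width bins over absolute amplitude values, and the second condition is implemented as a literal reading of the description of FIG. 10 (the last bar is higher than every preceding bar).

```python
def amplitude_histogram(samples, num_subranges=22, full_scale=32768):
    """S301-S303: count samples per equal-width sub-range of absolute amplitude."""
    counts = [0] * num_subranges
    width = full_scale / num_subranges
    for value in samples:
        # Values at full scale are folded into the last sub-range.
        index = min(int(abs(value) / width), num_subranges - 1)
        counts[index] += 1
    return counts


def histogram_indicates_clipping(counts):
    """S304: the last bar is higher than all preceding bars."""
    if len(counts) < 2:
        return False
    return counts[-1] > max(counts[:-1])


counts = amplitude_histogram([0, 5, 9, 16, 23, 23, 23], num_subranges=3, full_scale=24)
clipped = histogram_indicates_clipping(counts)
```

A looser test, such as comparing the last bar only with its immediate neighbor, might flag lightly clipped signals as well, but that variant is not described in the source.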
In this embodiment, before the clipped first audio signal is obtained in step S101, it is first determined whether each audio signal is clipped, and at least one clipped audio signal is obtained from a plurality of audio signals. The first audio signal of the embodiments of this application may be any one of the at least one clipped audio signal. The clipped first audio signal thus obtained is processed further, as described in the previous embodiment, which retains as much valid audio as possible and substantially improves the usable rate of audio signals.
Refer to FIG. 11, which is a schematic structural diagram of an audio signal processing apparatus provided by an embodiment of this application. As shown in FIG. 11, the audio signal processing apparatus of the embodiment of this application may include:
A first acquiring unit 11, configured to acquire a clipped first audio signal, where the first audio signal includes N sampling points and N is a positive integer.
In this embodiment, the first audio signal may be a voice data signal from an instant messaging session, a music data signal recorded live, or the like, which is not limited in the embodiments of this application.
In this embodiment, the first audio signal may be acquired by performing clipping detection processing on a plurality of audio signals to determine whether each is clipped and then obtaining at least one clipped audio signal. The first audio signal of the embodiments of this application may be any one of the at least one clipped audio signal.
The first audio signal includes N sampling points, and the amplitude value of each sampling point belongs to a preset target sampling range determined by the number of bits used to store an amplitude value. For example, if 16 bits are used to store each amplitude value, the target sampling range is -2^15 to 2^15-1, that is, -32768 to 32767.
Optionally, the first audio signal may be obtained from the original analog signal by sampling and quantization. The original analog signal is sampled to obtain N sampling points; the sampling frequency may be 8 kHz, that is, 8000 sampling points per second. The original amplitude value of each sampling point is then quantized. As shown in FIG. 2, if the original amplitude value of a sampling point is greater than the maximum value of the target sampling range, it is represented by that maximum value; if it is less than the minimum value of the target sampling range, it is represented by that minimum value. After quantization, the original amplitude values of the sampling points are confined to N amplitude values within the target sampling range, one amplitude value per sampling point.
Note that the sampling frequency may be another frequency chosen according to the user's needs, and the number of bits used to store an amplitude value may be another number set according to the sampling range the user requires.
Optionally, an amplitude calculation function may be used to sample and quantize the original analog signal to obtain the N amplitude values corresponding to the N sampling points contained in the first audio signal. For example, the sampling frequency and the number of bits used to store an amplitude value are set for the function, and the original analog signal is input to the function to obtain the first audio signal.
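The saturating quantization described above, which is what produces clipping in the first place, can be sketched as follows. The function name `quantize` is hypothetical and stands in for the unnamed amplitude calculation function; the sketch covers only the quantization step and assumes the signal has already been sampled.

```python
def quantize(raw_values, bits=16):
    """Round each raw amplitude to an integer and saturate it to the signed
    range that `bits` can store (for 16 bits: -32768 to 32767)."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return [max(lowest, min(highest, int(round(value)))) for value in raw_values]


# Out-of-range raw amplitudes are pinned to the range limits.
quantized = quantize([40000.0, -50000.0, 123.4])
```

Consecutive raw values beyond the range all map to the same limit, which is why runs of near-full-scale samples are used elsewhere in this document as the signature of clipping.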
A second acquiring unit 12, configured to acquire target data representing a clipping ratio of the first audio signal, where the clipping ratio is the ratio of the number of clipped sampling points among the N sampling points to N.
In this embodiment, the N amplitude values corresponding to the N sampling points of the first audio signal are analyzed to obtain the target data representing the clipping ratio of the first audio signal. The target data may be the clipping ratio itself or other data that reflects the clipping ratio; for example, the target data may lie within a preset range of the clipping ratio.
In one optional implementation, the target data representing the clipping ratio of the first audio signal may be obtained as follows. First, the amplitude value of each of the N sampling points in the first audio signal is analyzed to determine which sampling points are clipped. The ratio of the number of clipped sampling points to the total number of sampling points N is then calculated, and this ratio is the target data; in this implementation the target data is the clipping ratio itself. Optionally, clipped sampling points may be identified by determining whether there are at least a first number of consecutive sampling points whose amplitude values are all greater than a second threshold. The first number may be 3, and the second threshold may be 90% of the maximum value of the target sampling range. For example, if the amplitude values of 5 consecutive sampling points are all greater than the second threshold, those 5 sampling points are treated as clipped sampling points.
In another optional implementation, the target data may be obtained by counting the sampling points of the first audio signal whose amplitude values exceed a first threshold and calculating the ratio of that count to the total number of sampling points N; this ratio is the target data. Because clipped sampling points have large amplitude values that exceed the first threshold, a larger clipping ratio in the first audio signal yields a larger calculated ratio. The calculated ratio therefore reflects the clipping ratio indirectly, although it is not the clipping ratio itself. The first threshold is configurable and may be verified and updated over time to arrive at a more reasonable value.
In yet another optional implementation, the second acquiring unit is specifically configured to perform the flow shown schematically in FIG. 3 for acquiring the target data representing the clipping ratio of the first audio signal, which includes but is not limited to steps S21-S23:
S21: Determine a first sub-range from at least two sub-ranges.
Specifically and optionally, the target sampling range to which the N sample values of the N sampling points of the first audio signal belong is divided into at least two sub-ranges that do not overlap one another. This application does not limit how the range is divided; the sub-ranges may be equal or unequal in width. The number of sub-ranges may be 22, 24, 30, or another value.
The first sub-range is then determined from the at least two non-overlapping sub-ranges. The first sub-range is the sub-range covering the largest amplitude values. For example, if the sub-ranges are [0,7], [8,15], and [16,23], the first sub-range is [16,23].
S22: Obtain, as a first number, the number of the N sampling points whose amplitude values fall within the first sub-range.
S23: Calculate the ratio of the first number to N, and use this ratio as the target data representing the clipping ratio of the first audio signal.
As described above, the vast majority of the sampling points whose amplitude values fall within the first sub-range are clipped. The ratio of the first number to the total number of sampling points N therefore serves as target data that reflects the clipping ratio of the first audio signal.
A first dividing unit 13, configured to divide the first audio signal into at least two audio segments if the target data falls within a target range.
In this embodiment, after the target data is obtained, it is determined whether the target data falls within the target range. If it does, the first audio signal is divided into at least two audio segments. The signal may be divided in units of 1 second or in another time unit, such as 5 seconds.
The target range may be 60% to 80%. It is understood that the target range may also be another range, which is not limited in the embodiments of this application.
Refer to FIG. 4, which is a flowchart, provided by an embodiment of this application, of the processing applied when the target data falls in different ranges. As shown in the figure, the flow includes steps S31-S35:
S31: Obtain the target data.
S32: Determine whether the target data falls within the target range. If it does, perform step S33. If it does not, perform step S35 when the target data is greater than a third threshold, or step S34 when the target data is less than a fourth threshold. The third threshold is the maximum value of the target range, and the fourth threshold is the minimum value of the target range.
S33: Divide the first audio signal into at least two audio segments, and perform clipping detection processing on the at least two audio segments.
S34: Determine the first audio signal to be a usable audio signal.
S35: Discard the first audio signal.
Determining whether the target data falls within the target range has two possible outcomes. In the first, the target data falls within the target range; see step S104 for details, which are not repeated here. In the second, the target data falls outside the target range. If the target data is greater than the third threshold, the ratio of clipped sampling points to total sampling points in the first audio signal is too high. Training a voiceprint recognition model on such a signal would lower the verification rate of voiceprint recognition, so the first audio signal is discarded. If the target data is less than the fourth threshold, the ratio of clipped sampling points to total sampling points is small: the clipped sampling points are too few to damage the information carried by the first audio signal and have almost no effect on subsequent processing. In that case, audio segmentation and per-segment clipping detection can be skipped and the signal passed directly to the system for subsequent processing; for example, the first audio signal can be used directly to train the voiceprint recognition model.
The third acquiring unit 14 is configured to perform clipping detection processing on the at least two audio segments, and to obtain a second audio signal according to the audio segments after the clipping detection processing;
In this embodiment, if the target data belongs to the target range, the first audio signal can be divided into at least two audio segments. The division can be an equal division, in which every audio segment has the same target duration, for example 1 s or 5 s. Each audio segment is then checked in turn for clipping.
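A minimal sketch of the equal division follows. The function name `split_into_segments` and its parameters are hypothetical. The text does not say how a final segment shorter than the target duration is handled; this sketch simply keeps it, which is an assumption.

```python
from typing import List, Sequence


def split_into_segments(samples: Sequence[int],
                        sample_rate: int,
                        target_seconds: float = 1.0) -> List[Sequence[int]]:
    """Split a signal into consecutive segments of target_seconds each.

    The last segment may be shorter than the others (assumed behaviour).
    """
    seg_len = int(sample_rate * target_seconds)
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    # Step through the signal in strides of seg_len samples.
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
```

For instance, a 10 s recording sampled at 16000 Hz with a target duration of 1 s yields ten segments of 16000 sampling points each.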
To detect whether an audio segment has clipping, it can be checked whether the segment contains a first number, or more than the first number, of consecutive sampling points whose amplitude values all have an absolute value greater than the second threshold. The first number can be 3, and the second threshold can be the product of the maximum value of the sampling value range and a target ratio. With a target ratio of 90%, the second threshold is 32768*0.9≈29491. If an audio segment contains three or more consecutive sampling points whose amplitude values all exceed, in absolute value, 90% of the maximum value of the sampling value range, the audio segment is determined to have clipping and is discarded. It should be noted that the target ratio need not be exactly 90%; other values around 90%, such as 85%, 89%, 91% or 95%, can also be used. Likewise, the first number can be 3 or another value. The first number, the target ratio and the sampling frequency may constrain one another.
If no three consecutive sampling points have amplitude values that all exceed, in absolute value, 90% of the maximum value of the sampling value range, the audio segment has no clipping; it is determined to be a usable audio segment and is retained. Detecting clipping segment by segment in this way avoids leaving the remaining audio discontinuous.
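The per-segment check described above can be sketched as follows, using the example values from the text (a first number of 3, a target ratio of 90%, and a maximum sampling value of 32768). The function name `has_clipping` and its parameter names are hypothetical.

```python
from typing import Iterable


def has_clipping(segment: Iterable[int],
                 max_value: int = 32768,
                 target_ratio: float = 0.9,
                 first_number: int = 3) -> bool:
    """Return True if the segment contains at least first_number consecutive
    sampling points whose absolute amplitude exceeds the second threshold."""
    second_threshold = max_value * target_ratio  # 32768 * 0.9 = 29491.2
    run = 0  # length of the current run of over-threshold sampling points
    for sample in segment:
        if abs(sample) > second_threshold:
            run += 1
            if run >= first_number:
                return True
        else:
            run = 0  # the run is broken, start counting again
    return False
```

Because the check uses the absolute value, a run that alternates between large positive and large negative amplitudes also counts as clipping.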
Optionally, the third acquiring unit is specifically configured to obtain the second audio signal according to the result of the clipping detection processing on the at least two audio segments. For example, among the at least two audio segments, the segments that have clipping are discarded and the segments that have no clipping are retained; all retained segments are then combined in chronological order to form the second audio signal.
For step S104 mentioned in the above embodiment, refer to FIG. 5, which is a schematic diagram of clipping detection processing on audio segments proposed by this application, including but not limited to steps S41-S44:
S41: For each audio segment of the at least two audio segments, detect whether the audio segment has clipping;
S42: If the audio segment has clipping, discard the audio segment;
S43: Obtain the audio segments that remain among the at least two audio segments after the discarding;
S44: Obtain a second audio signal according to the remaining audio segments.
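Steps S41 to S44 can be sketched as a filter followed by a concatenation. The function name `build_second_signal` is hypothetical, and the clipping detector is passed in as a callable so that the sketch does not assume any particular detection rule. Concatenating the remaining segments in their original order is one reading of "obtain a second audio signal according to the remaining audio segments".

```python
from typing import Callable, List, Sequence


def build_second_signal(segments: Sequence[Sequence[int]],
                        is_clipped: Callable[[Sequence[int]], bool]) -> List[int]:
    """Drop clipped segments (S41, S42) and join the rest in time order (S43, S44)."""
    remaining = [seg for seg in segments if not is_clipped(seg)]
    second_signal: List[int] = []
    for seg in remaining:
        second_signal.extend(seg)  # segments are already in chronological order
    return second_signal
```

If every segment is clipped, the result is an empty signal.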
Optionally, after the second audio signal is obtained, and in order to determine whether the second audio signal obtained from the remaining audio segments can still be used by subsequent systems, it can first be checked whether the audio length of the second audio signal meets a certain condition. For the method, refer to FIG. 6, a flowchart of a method for determining whether to discard the second audio signal, including but not limited to steps S51-S54:
S51: Obtain the second audio signal;
S52: Detect whether the audio length of the second audio signal is greater than or equal to the first threshold; if the audio length of the second audio signal is less than the first threshold, perform step S53; if the audio length of the second audio signal is greater than or equal to the first threshold, perform step S54;
S53: Discard the second audio signal;
S54: Determine the second audio signal to be a usable audio signal;
The first threshold mentioned above is the length an audio signal must have in order to be input to the subsequent system for processing. For example, in a text-independent voiceprint registration scenario, the registration speech must be at least 20 s long. It can therefore be checked whether the audio length of the second audio signal is greater than or equal to 20 s; if so, the second audio signal is retained and used to train the voiceprint recognition model.
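The length check of steps S52 to S54 reduces to one comparison. In the sketch below the audio length is derived from the number of sampling points and the sampling rate, which is an assumption about how the length is measured; the function name `is_second_signal_usable` is hypothetical, and the 20 s default is the example value from the text.

```python
def is_second_signal_usable(num_samples: int,
                            sample_rate: int,
                            first_threshold_seconds: float = 20.0) -> bool:
    """Return True (S54) if the signal lasts at least first_threshold_seconds,
    otherwise False (S53, the signal would be discarded)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration_seconds = num_samples / sample_rate
    return duration_seconds >= first_threshold_seconds
```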
In an embodiment, the third acquiring unit is specifically configured to:
for each audio segment of the at least two audio segments, detect whether the audio segment has clipping;
if the audio segment has clipping, discard the audio segment;
obtain the audio segments that remain among the at least two audio segments after the discarding;
obtain a second audio signal according to the remaining audio segments.
Optionally, as shown in FIG. 12, the device further includes:
a detecting unit, configured to detect whether the audio length of the second audio signal is greater than or equal to a first threshold;
a first determining unit, configured to determine the second audio signal to be a usable audio signal if the audio length of the second audio signal is greater than or equal to the first threshold;
and to discard the second audio signal if the audio length of the second audio signal is less than the first threshold.
In an embodiment, each audio segment of the at least two audio segments includes at least one sampling point, and the third acquiring unit detects whether the audio segment has clipping by acquiring the amplitude value of each sampling point of the at least one sampling point included in the audio segment;
if the amplitude values of the at least one sampling point satisfy a first condition, the audio segment is determined to have clipping, the first condition including: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are greater than a second threshold.
Optionally, as shown in FIG. 12, the device further includes:
a fourth acquiring unit, configured to acquire the amplitude value of each of the N sampling points included in the first audio signal;
a second determining unit, configured to determine that the first audio signal has clipping if the amplitude values of the N sampling points satisfy a first condition, the first condition including: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are greater than a second threshold.
In an embodiment, the amplitude value of each of the N sampling points belongs to a target sampling range;
the second threshold is the product of the maximum value of the target sampling range and a target ratio.
Optionally, as shown in FIG. 12, the device further includes:
a second dividing unit, configured to divide a target sampling range into at least two sub-ranges that do not overlap one another, the target sampling range being the range in which the amplitude values of the N sampling points included in the first audio signal lie;
a statistical unit, configured to count, for each of the at least two sub-ranges, the number of the N sampling points whose amplitude values belong to that sub-range;
a construction unit, configured to construct a histogram, the horizontal axis of the histogram including the at least two sub-ranges, and the vertical axis of the histogram including the number of sampling points belonging to each sub-range;
a third determining unit, configured to determine that the first audio signal has clipping if the change trend of the histogram satisfies a second condition.
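The counting performed by the dividing, statistical and construction units can be sketched as below. Several choices here are assumptions not stated in this passage: the sub-ranges are taken to be equal-width bins over the absolute amplitude from 0 to 32768, and the function name `amplitude_histogram` is hypothetical. The second condition on the histogram's change trend is not defined in this passage, so the sketch stops at producing the counts.

```python
from typing import Iterable, List


def amplitude_histogram(samples: Iterable[int],
                        max_value: int = 32768,
                        num_subranges: int = 8) -> List[int]:
    """Count sampling points per non-overlapping amplitude sub-range.

    Index 0 is the lowest sub-range, index num_subranges - 1 the highest.
    """
    if num_subranges < 2:
        raise ValueError("at least two sub-ranges are required")
    width = max_value / num_subranges
    counts = [0] * num_subranges
    for sample in samples:
        # Clamp so that an amplitude equal to max_value falls in the top bin.
        index = min(int(abs(sample) / width), num_subranges - 1)
        counts[index] += 1
    return counts
```

A clipped recording would typically show an unusually large count in the highest sub-range compared with its neighbours, which is the kind of trend a second condition could test for.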
In an embodiment, the second acquiring unit is specifically configured to:
determine a first sub-range from the at least two sub-ranges, the first sub-range being the sub-range with the largest amplitude values among the at least two sub-ranges;
acquire, as a first number, the number of sampling points among the N sampling points whose amplitude values belong to the first sub-range;
calculate the ratio of the first number to N, and use the ratio as the target data representing the clipping ratio of the first audio signal.
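The computation of the target data can be sketched as follows: count the sampling points that fall into the highest-amplitude sub-range and divide by N. The equal-width split of the absolute amplitude range into `num_subranges` bins, the value 32768 for the range maximum, and the function name `clipping_ratio` are all assumptions made for illustration.

```python
from typing import Sequence


def clipping_ratio(samples: Sequence[int],
                   max_value: int = 32768,
                   num_subranges: int = 10) -> float:
    """Return the share of sampling points in the highest amplitude sub-range."""
    n = len(samples)
    if n == 0:
        raise ValueError("the signal must contain at least one sampling point")
    # Lower bound of the first sub-range (the one with the largest amplitudes).
    lower_bound = max_value * (num_subranges - 1) / num_subranges
    first_number = sum(1 for s in samples if abs(s) >= lower_bound)
    return first_number / n
```

With ten sub-ranges the first sub-range starts at 29491.2, which coincides with the 90% example used for the second threshold elsewhere in the description.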
Optionally, the maximum value of the target range is a third threshold, the minimum value of the target range is a fourth threshold, and the device further includes a fourth determining unit;
the fourth determining unit is specifically configured to discard the first audio signal if the target data is greater than the third threshold;
and to determine the first audio signal to be a usable audio signal if the target data is less than the fourth threshold.
In the embodiments of this application, after a first audio signal with clipping is acquired, the way the first audio signal is processed is determined according to target data representing its clipping ratio, and a second audio signal is obtained after clipping detection processing. Rather than simply discarding an audio signal that has clipping, the embodiments process it further, which retains as much valid audio as possible and substantially increases the proportion of audio signals that can be used.
Refer to FIG. 13, which is a schematic structural diagram of another audio signal processing device provided by an embodiment of this application. As shown in FIG. 13, the audio signal processing device 1000 may include at least one processor 1001 (for example, a CPU), at least one communication interface 1003, a memory 1004, and at least one communication bus 1002. The communication bus 1002 implements connection and communication among these components. The communication interface 1003 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1004 may be a high-speed RAM memory or a non-volatile memory, for example at least one disk memory. Optionally, the memory 1004 may also be at least one storage device located remotely from the processor 1001. As shown in FIG. 13, the memory 1004, as a computer storage medium, may include an operating system, a network communication module, and program instructions.
In the audio signal processing device 1000 shown in FIG. 13, the processor 1001 may be used to load the program instructions stored in the memory 1004 and specifically perform the following operations:
acquiring a first audio signal with clipping, the first audio signal including N sampling points, N being a positive integer;
acquiring target data representing a clipping ratio of the first audio signal, the clipping ratio representing the ratio of the number of clipped sampling points among the N sampling points to N;
if the target data belongs to a target range, dividing the first audio signal into at least two audio segments;
performing clipping detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segments after the clipping detection processing.
Optionally, before the acquiring of the first audio signal with clipping, the operations further include:
acquiring the amplitude value of each of the N sampling points included in the first audio signal;
if the amplitude values of the N sampling points satisfy a first condition, determining that the first audio signal has clipping, the first condition including: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are greater than a second threshold.
Optionally, before the acquiring of the first audio signal with clipping, the operations further include:
dividing a target sampling range into at least two sub-ranges that do not overlap one another, the target sampling range being the range in which the amplitude values of the N sampling points included in the first audio signal lie;
counting, for each of the at least two sub-ranges, the number of the N sampling points whose amplitude values belong to that sub-range;
constructing a histogram, the horizontal axis of the histogram including the at least two sub-ranges, and the vertical axis of the histogram including the number of sampling points belonging to each sub-range;
if the change trend of the histogram satisfies a second condition, determining that the first audio signal has clipping.
Optionally, the acquiring of target data representing the clipping ratio of the first audio signal includes:
determining a first sub-range from the at least two sub-ranges, the first sub-range being the sub-range with the largest amplitude values among the at least two sub-ranges;
acquiring, as a first number, the number of sampling points among the N sampling points whose amplitude values belong to the first sub-range;
calculating the ratio of the first number to N, and using the ratio as the target data representing the clipping ratio of the first audio signal.
Optionally, the performing of clipping detection processing on the at least two audio segments, and the obtaining of a second audio signal according to the audio segments after the clipping detection processing, include:
for each audio segment of the at least two audio segments, detecting whether the audio segment has clipping;
if the audio segment has clipping, discarding the audio segment;
obtaining the audio segments that remain among the at least two audio segments after the discarding;
obtaining a second audio signal according to the remaining audio segments.
Optionally, each audio segment of the at least two audio segments includes at least one sampling point, and the detecting of whether the audio segment has clipping includes:
acquiring the amplitude value of each sampling point of the at least one sampling point included in the audio segment;
if the amplitude values of the at least one sampling point satisfy a first condition, determining that the audio segment has clipping, the first condition including: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are greater than a second threshold.
Optionally, the amplitude value of each of the N sampling points belongs to a target sampling range, and the second threshold is the product of the maximum value of the target sampling range and a target ratio.
Optionally, the maximum value of the target range is a third threshold, the minimum value of the target range is a fourth threshold, and the method further includes:
if the target data is greater than the third threshold, discarding the first audio signal;
if the target data is less than the fourth threshold, determining the first audio signal to be a usable audio signal.
Optionally, the processor 1001 may also be used to load the program instructions stored in the memory 1004 to perform the following operations:
detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;
if the audio length of the second audio signal is greater than or equal to the first threshold, determining the second audio signal to be a usable audio signal;
if the audio length of the second audio signal is less than the first threshold, discarding the second audio signal.
It should be noted that, for the specific execution process, reference may be made to the description of the method embodiment shown in FIG. 1, which is not repeated here.
For the specific execution steps, refer to the description of the foregoing embodiments, which is not repeated here.
An embodiment of this application further provides a computer storage medium. The computer storage medium may store a plurality of instructions suitable for being loaded by a processor to execute the method steps of the embodiment shown in FIG. 1. For the specific execution process, refer to the description of the embodiment shown in FIG. 1, which is not repeated here.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Claims (20)

  1. An audio signal processing method, characterized by comprising:
    acquiring a first audio signal with clipping, the first audio signal comprising N sampling points, N being a positive integer;
    acquiring target data representing a clipping ratio of the first audio signal, the clipping ratio representing the ratio of the number of clipped sampling points among the N sampling points to N;
    if the target data belongs to a target range, dividing the first audio signal into at least two audio segments;
    performing clipping detection processing on the at least two audio segments, and obtaining a second audio signal according to the audio segments after the clipping detection processing.
  2. The method according to claim 1, characterized in that the performing of clipping detection processing on the at least two audio segments, and the obtaining of a second audio signal according to the audio segments after the clipping detection processing, comprise:
    for each audio segment of the at least two audio segments, detecting whether the audio segment has clipping;
    if the audio segment has clipping, discarding the audio segment;
    obtaining the audio segments that remain among the at least two audio segments after the discarding;
    obtaining a second audio signal according to the remaining audio segments.
  3. The method according to claim 2, characterized in that the method further comprises:
    detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;
    if the audio length of the second audio signal is greater than or equal to the first threshold, determining the second audio signal to be a usable audio signal;
    if the audio length of the second audio signal is less than the first threshold, discarding the second audio signal.
  4. The method according to claim 2, characterized in that each audio segment of the at least two audio segments comprises at least one sampling point, and the detecting of whether the audio segment has clipping comprises:
    acquiring the amplitude value of each sampling point of the at least one sampling point comprised in the audio segment;
    if the amplitude values of the at least one sampling point satisfy a first condition, determining that the audio segment has clipping, the first condition comprising: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are greater than a second threshold.
  5. The method according to claim 1, characterized in that, before the acquiring of the first audio signal with clipping, the method further comprises:
    acquiring the amplitude value of each of the N sampling points comprised in the first audio signal;
    if the amplitude values of the N sampling points satisfy a first condition, determining that the first audio signal has clipping, the first condition comprising: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are greater than a second threshold.
  6. The method according to claim 5, characterized in that the amplitude value of each of the N sampling points belongs to a target sampling range;
    the second threshold is the product of the maximum value of the target sampling range and a target ratio.
  7. The method according to claim 1, characterized in that, before the acquiring of the first audio signal with clipping, the method further comprises:
    dividing a target sampling range into at least two sub-ranges that do not overlap one another, the target sampling range being the range in which the amplitude values of the N sampling points comprised in the first audio signal lie;
    counting, for each of the at least two sub-ranges, the number of the N sampling points whose amplitude values belong to that sub-range;
    constructing a histogram, the horizontal axis of the histogram comprising the at least two sub-ranges, and the vertical axis of the histogram comprising the number of sampling points belonging to each sub-range;
    if the change trend of the histogram satisfies a second condition, determining that the first audio signal has clipping.
  8. The method according to claim 7, characterized in that the acquiring of target data representing the clipping ratio of the first audio signal comprises:
    determining a first sub-range from the at least two sub-ranges, the first sub-range being the sub-range with the largest amplitude values among the at least two sub-ranges;
    acquiring, as a first number, the number of sampling points among the N sampling points whose amplitude values belong to the first sub-range;
    calculating the ratio of the first number to N, and using the ratio as the target data representing the clipping ratio of the first audio signal.
  9. The method according to claim 1, characterized in that the maximum value of the target range is a third threshold, the minimum value of the target range is a fourth threshold, and the method further comprises:
    if the target data is greater than the third threshold, discarding the first audio signal;
    if the target data is less than the fourth threshold, determining the first audio signal to be a usable audio signal.
  10. An audio signal processing device, characterized by comprising:
    a first acquiring unit, configured to acquire a first audio signal with clipping, the first audio signal comprising N sampling points, N being a positive integer;
    a second acquiring unit, configured to acquire target data representing a clipping ratio of the first audio signal, the clipping ratio representing the ratio of the number of clipped sampling points among the N sampling points to N;
    a first dividing unit, configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;
    a third acquiring unit, configured to perform clipping detection processing on the at least two audio segments, and to obtain a second audio signal according to the audio segments after the clipping detection processing.
  11. The device according to claim 10, characterized in that the third acquiring unit is specifically configured to:
    for each audio segment of the at least two audio segments, detect whether the audio segment has clipping;
    if the audio segment has clipping, discard the audio segment;
    obtain the audio segments that remain among the at least two audio segments after the discarding;
    obtain a second audio signal according to the remaining audio segments.
  12. The device according to claim 11, characterized in that the device further comprises:
    a detecting unit, configured to detect whether the audio length of the second audio signal is greater than or equal to a first threshold;
    a first determining unit, configured to determine the second audio signal to be a usable audio signal if the audio length of the second audio signal is greater than or equal to the first threshold;
    and to discard the second audio signal if the audio length of the second audio signal is less than the first threshold.
  13. The device according to claim 11, characterized in that each audio segment of the at least two audio segments comprises at least one sampling point, and the third acquiring unit detects whether the audio segment has clipping by acquiring the amplitude value of each sampling point of the at least one sampling point comprised in the audio segment;
    if the amplitude values of the at least one sampling point satisfy a first condition, the audio segment is determined to have clipping, the first condition comprising: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are greater than a second threshold.
  14. The device according to claim 10, characterized in that the device further comprises:
    a fourth acquiring unit, configured to acquire the amplitude value of each of the N sampling points comprised in the first audio signal;
    a second determining unit, configured to determine that the first audio signal has clipping if the amplitude values of the N sampling points satisfy a first condition, the first condition comprising: the amplitude values of a first number of consecutive sampling points, or of more than the first number of consecutive sampling points, are greater than a second threshold.
  15. The device according to claim 14, wherein the amplitude value of each of the N sampling points falls within a target sampling range;
    the second threshold is the product of the maximum value of the target sampling range and a target ratio.
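As a worked example of this product, the sketch below assumes 16-bit PCM samples (maximum value 32767) and a target ratio of 0.98. Both values are illustrative assumptions and are not taken from the claims.

```python
def second_threshold(range_max: float, target_ratio: float) -> float:
    """Second threshold = maximum of the target sampling range x target ratio."""
    if not 0.0 < target_ratio <= 1.0:
        raise ValueError("target_ratio is assumed to lie in (0, 1]")
    return range_max * target_ratio


INT16_MAX = 32767  # assumed sampling range maximum (16-bit PCM)
threshold = second_threshold(INT16_MAX, 0.98)  # about 32111.66
```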
  16. The device according to claim 10, wherein the device further comprises:
    a second dividing unit, configured to divide a target sampling range into at least two sub-ranges that do not overlap each other, the target sampling range being the range within which the amplitude values of the N sampling points included in the first audio signal fall;
    a counting unit, configured to count, for each of the at least two sub-ranges, the number of sampling points among the N sampling points whose amplitude values belong to that sub-range;
    a construction unit, configured to construct a histogram, the horizontal axis of the histogram including the at least two sub-ranges, and the vertical axis of the histogram including the number of sampling points belonging to each sub-range;
    a third determining unit, configured to determine that clipping is present in the first audio signal if the change trend of the histogram satisfies a second condition.
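The histogram construction can be sketched as follows. The claim text here does not spell out the second condition, so the `tail_rises` test (the highest sub-range holds more samples than its neighbour) is purely a hypothetical stand-in. Equal-width sub-ranges over absolute amplitudes in `[0, max_value]` are likewise assumptions.

```python
from typing import List, Sequence


def amplitude_histogram(
    samples: Sequence[float], max_value: float, num_bins: int
) -> List[int]:
    """Count samples per non-overlapping, equal-width amplitude sub-range."""
    counts = [0] * num_bins
    width = max_value / num_bins
    for s in samples:
        # Samples at the very top of the range go into the last sub-range.
        idx = min(int(abs(s) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def tail_rises(counts: Sequence[int]) -> bool:
    """Hypothetical second condition: the top sub-range outnumbers its neighbour."""
    return len(counts) >= 2 and counts[-1] > counts[-2]


counts = amplitude_histogram([0.05, 0.1, 0.3, 0.5, 0.7, 1.0, 1.0, 1.0], 1.0, 4)
```

The intuition behind such a test is that unclipped speech tends to have fewer samples at higher amplitudes, whereas clipping piles samples up in the highest sub-range.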
  17. The device according to claim 16, wherein the second acquiring unit is specifically configured to:
    determine a first sub-range from the at least two sub-ranges, the first sub-range being the sub-range with the largest amplitude values among the at least two sub-ranges;
    obtain, as a first number, the number of sampling points among the N sampling points whose amplitude values belong to the first sub-range;
    calculate the ratio between the first number and N, and use the ratio as target data representing the clipping ratio of the first audio signal.
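The clipping ratio described here is a simple count divided by N. In the sketch below, the bounds of the first sub-range are passed in explicitly and amplitudes are taken as absolute values; both choices, and all names, are assumptions for illustration.

```python
from typing import Sequence


def clipping_ratio(
    samples: Sequence[float], sub_range_low: float, sub_range_high: float
) -> float:
    """Fraction of the N samples whose amplitude lies in the highest sub-range."""
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sampling point is required")
    first_number = sum(
        1 for s in samples if sub_range_low <= abs(s) <= sub_range_high
    )
    return first_number / n


ratio = clipping_ratio([0.1, 0.2, 1.0, 0.95, 0.5, 0.3, 0.2, 0.1], 0.9, 1.0)
```

Here two of the eight samples fall in the assumed top sub-range `[0.9, 1.0]`, giving a ratio of 0.25.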
  18. The device according to claim 10, wherein the maximum value of the target range is a third threshold, the minimum value of the target range is a fourth threshold, and the device further comprises a fourth determining unit;
    the fourth determining unit is specifically configured to discard the first audio signal if the target data is greater than the third threshold;
    if the target data is less than the fourth threshold, the first audio signal is determined to be a usable audio signal.
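The two-threshold decision can be summarised in a short sketch. The return labels and the example threshold values are hypothetical. The middle case, where the target data falls within the target range, is only labelled here, since its handling is defined elsewhere in the claims.

```python
def classify_signal(
    target_data: float, third_threshold: float, fourth_threshold: float
) -> str:
    """Map the clipping ratio to one of three outcomes."""
    if target_data > third_threshold:
        return "discard"  # too much clipping
    if target_data < fourth_threshold:
        return "usable"  # negligible clipping
    return "in_target_range"  # left to the rest of the method
```

With assumed thresholds of 0.10 and 0.01, a clipping ratio of 0.25 yields `"discard"`, 0.001 yields `"usable"`, and 0.05 yields `"in_target_range"`.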
  19. An audio signal processing device, comprising a processor, a memory, and a communication interface that are connected to one another, wherein the communication interface is configured to receive and send data, the memory is configured to store program code, and the processor is configured to call the program code to execute the method according to any one of claims 1 to 9.
  20. A non-volatile computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method according to any one of claims 1 to 9.
PCT/CN2019/118444 2019-10-29 2019-11-14 Audio signal processing method and device WO2021082083A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911034571.6A CN110931021B (en) 2019-10-29 2019-10-29 Audio signal processing method and device
CN201911034571.6 2019-10-29

Publications (1)

Publication Number Publication Date
WO2021082083A1 (en) 2021-05-06

Family

ID=69849667

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118444 WO2021082083A1 (en) 2019-10-29 2019-11-14 Audio signal processing method and device

Country Status (2)

Country Link
CN (1) CN110931021B (en)
WO (1) WO2021082083A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113852893A (en) * 2020-06-28 2021-12-28 北京小米移动软件有限公司 Data processing method and device, terminal and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103117063A (en) * 2012-12-27 2013-05-22 安徽科大讯飞信息科技股份有限公司 Music content cut-frame detection method based on software implementation
CN105989853A (en) * 2015-02-28 2016-10-05 科大讯飞股份有限公司 Audio quality evaluation method and system
CN108091352A (en) * 2017-12-27 2018-05-29 腾讯音乐娱乐科技(深圳)有限公司 A kind of audio file processing method, device and storage medium
CN109859745A (en) * 2019-03-27 2019-06-07 北京爱数智慧科技有限公司 A kind of audio-frequency processing method, equipment and computer-readable medium
US20190259367A1 (en) * 2018-02-22 2019-08-22 Cirrus Logic International Semiconductor Ltd. Methods and apparatus for processing stereophonic audio content

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN106782613B (en) * 2016-12-22 2020-01-21 广州酷狗计算机科技有限公司 Signal detection method and device
CN108804072A (en) * 2018-06-13 2018-11-13 广州酷狗计算机科技有限公司 Audio-frequency processing method, device, storage medium and terminal

Also Published As

Publication number Publication date
CN110931021A (en) 2020-03-27
CN110931021B (en) 2023-10-13

Similar Documents

Publication Publication Date Title
CN109087632B (en) Speech processing method, device, computer equipment and storage medium
US9940934B2 (en) Adaptive voice authentication system and method
WO2019227547A1 (en) Voice segmenting method and apparatus, and computer device and storage medium
US20090271197A1 (en) Identifying features in a portion of a signal representing speech
US10269371B2 (en) Techniques for decreasing echo and transmission periods for audio communication sessions
US9466310B2 (en) Compensating for identifiable background content in a speech recognition device
CN108039181B (en) Method and device for analyzing emotion information of sound signal
US11282514B2 (en) Method and apparatus for recognizing voice
WO2021082083A1 (en) Audio signal processing method and device
US9916846B2 (en) Method and system for speech detection
CN113782036A (en) Audio quality evaluation method and device, electronic equipment and storage medium
CN110060667B (en) Batch processing method and device for voice information, computer equipment and storage medium
US11551707B2 (en) Speech processing method, information device, and computer program product
WO2023098103A9 (en) Audio processing method and audio processing apparatus
Craciun et al. Correlation coefficient-based voice activity detector algorithm
WO2021179470A1 (en) Method, device and system for recognizing sampling rate of pure voice data
JP2004310047A (en) Device and method for voice activity detection
US10600432B1 (en) Methods for voice enhancement
CN110059059B (en) Batch screening method and device for voice information, computer equipment and storage medium
CN111148005A (en) Method and device for detecting mic sequence
US20230253010A1 (en) Voice activity detection (vad) based on multiple indicia
JP6320962B2 (en) Speech recognition system, speech recognition method, program
US11790931B2 (en) Voice activity detection using zero crossing detection
US20220130405A1 (en) Low Complexity Voice Activity Detection Algorithm
CN116910582A (en) Data monitoring method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 19950773; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 EP: PCT application non-entry in European phase
    Ref document number: 19950773; Country of ref document: EP; Kind code of ref document: A1