CN110931021A - Audio signal processing method and device - Google Patents


Info

Publication number
CN110931021A
Authority
CN
China
Prior art keywords
audio signal
amplitude
audio
range
sub
Legal status
Granted
Application number
CN201911034571.6A
Other languages
Chinese (zh)
Other versions
CN110931021B (en)
Inventor
张丝潆
彭俊清
王健宗
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201911034571.6A (patent CN110931021B/en)
Priority to PCT/CN2019/118444 (patent WO2021082083A1/en)
Publication of CN110931021A
Application granted
Publication of CN110931021B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Tone Control, Compression And Expansion, Limiting Amplitude (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The application discloses an audio signal processing method and device. The method comprises: acquiring a first audio signal in which amplitude clipping (truncation) exists; acquiring target data representing the clipping ratio of the first audio signal; if the target data falls within a target range, dividing the first audio signal into at least two audio segments; and performing clipping detection on the at least two audio segments and obtaining a second audio signal from the audio segments after clipping detection. This scheme retains as much valid audio as possible, which greatly improves the usability of audio signals.

Description

Audio signal processing method and device
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an audio signal processing method and apparatus.
Background
In voiceprint recognition, early-stage preprocessing of the audio signal is critical and strongly affects subsequent recognition accuracy. This preprocessing includes amplitude clipping detection. Amplitude clipping, also called truncation, occurs mainly when the amplitude of the audio signal is so high that it exceeds the maximum of the sampling value range.
Clipping causes information in the speech signal to be lost. In the prior art, once clipping is detected in a section of speech signal, the whole section is discarded, so a large amount of valid speech is lost.
Disclosure of Invention
Embodiments of the present invention provide an audio signal processing method and apparatus that retain more valid audio, thereby greatly improving the usability of audio signals.
In a first aspect, an embodiment of the present invention provides an audio signal processing method, including:
acquiring a first audio signal in which amplitude clipping exists, wherein the first audio signal comprises N sampling points, and N is a positive integer;
acquiring target data representing a clipping ratio of the first audio signal, wherein the clipping ratio is the ratio of the number of clipped sampling points among the N sampling points to N;
if the target data falls within a target range, dividing the first audio signal into at least two audio segments;
and performing clipping detection on the at least two audio segments, and obtaining a second audio signal from the audio segments after clipping detection.
In a possible implementation manner, performing clipping detection on the at least two audio segments and obtaining a second audio signal from the audio segments after clipping detection includes:
for each of the at least two audio segments, detecting whether clipping exists in the audio segment;
if clipping exists in the audio segment, discarding the audio segment;
obtaining the audio segments of the at least two audio segments that remain after discarding;
and obtaining the second audio signal from the remaining audio segments.
In one possible implementation, the method further includes: detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;
if the audio length of the second audio signal is greater than or equal to the first threshold, determining that the second audio signal is a usable audio signal;
and if the audio length of the second audio signal is smaller than the first threshold, discarding the second audio signal.
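A minimal Python sketch of this length check follows. The document gives no value for the first threshold, so `MIN_LENGTH_SECONDS` (2 s) and the 8 kHz sampling rate are illustrative assumptions. The function name `is_usable` is also hypothetical.

```python
from typing import Sequence

# Hypothetical value for the "first threshold"; the document does not specify one.
MIN_LENGTH_SECONDS = 2.0


def is_usable(second_signal: Sequence[int], sample_rate: int = 8000,
              min_length_seconds: float = MIN_LENGTH_SECONDS) -> bool:
    """Keep the second audio signal only if its duration reaches the first threshold."""
    duration_seconds = len(second_signal) / sample_rate
    return duration_seconds >= min_length_seconds
```

At 8 kHz, 16000 samples (exactly 2 s) pass, and anything shorter is discarded.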
In one possible implementation, each of the at least two audio segments includes at least one sampling point, and detecting whether clipping exists in the audio segment includes:
acquiring the amplitude value of each of the at least one sampling point included in the audio segment;
if the amplitude values of the at least one sampling point satisfy a first condition, determining that clipping exists in the audio segment, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are all greater than a second threshold.
In another possible implementation manner, before acquiring the first audio signal in which amplitude clipping exists, the method further includes:
acquiring the amplitude value of each of the N sampling points included in the first audio signal;
if the amplitude values of the N sampling points satisfy a first condition, determining that amplitude clipping exists in the first audio signal, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are all greater than a second threshold.
In one possible implementation, the amplitude value of each of the N sampling points belongs to a target sampling range;
the second threshold is a product of a maximum value of the target sampling range and a target proportion.
In another possible implementation manner, before acquiring the first audio signal in which amplitude clipping exists, the method further includes:
dividing a target sampling range into at least two mutually non-overlapping sub-ranges, wherein the target sampling range is the range in which the amplitude values of the N sampling points included in the first audio signal lie;
counting, among the N sampling points, the number of sampling points whose amplitude values fall in each of the at least two sub-ranges;
constructing a histogram, a horizontal axis of the histogram comprising the at least two sub-ranges and a vertical axis of the histogram comprising the number of sample points belonging to the sub-ranges;
and if the variation trend of the histogram meets a second condition, determining that the amplitude truncation exists in the first audio signal.
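The histogram construction can be sketched in Python as shown below. Two assumptions are made here and not stated in the document. First, the sub-ranges are taken over absolute amplitude values in [0, 32768] for 16-bit audio. Second, the sub-ranges have equal width. The default of 24 sub-ranges is one of the example counts given later in the description. The second condition on the histogram's variation trend is not specified in this section, so the sketch stops at building the counts. `build_histogram` is a hypothetical name.

```python
from typing import List, Sequence


def build_histogram(samples: Sequence[int], num_subranges: int = 24,
                    max_amplitude: int = 32768) -> List[int]:
    """Count samples whose |amplitude| falls in each of num_subranges
    equal-width, non-overlapping sub-ranges of [0, max_amplitude]."""
    width = max_amplitude / num_subranges
    counts = [0] * num_subranges
    for s in samples:
        # Map |amplitude| to a sub-range index; clamp so |-32768| lands in the top one.
        idx = min(int(abs(s) / width), num_subranges - 1)
        counts[idx] += 1
    return counts
```

For a clipped signal, one would expect an unusually tall bar in the top sub-range.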
In one possible implementation, the obtaining target data representing a clipping ratio of the first audio signal includes:
determining a first sub-range from the at least two sub-ranges, wherein the first sub-range is the sub-range with the largest amplitude values among the at least two sub-ranges;
acquiring, as a first number, the number of the N sampling points whose amplitude values fall in the first sub-range;
calculating a ratio between the first number and the N, and using the ratio as target data for representing a clipping ratio of the first audio signal.
In one possible implementation, the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the method further includes:
if the target data is greater than the third threshold, discarding the first audio signal;
and if the target data is smaller than the fourth threshold value, determining the first audio signal as a usable audio signal.
In a second aspect, an embodiment of the present invention provides an audio signal processing apparatus, including:
a first acquisition unit, configured to acquire a first audio signal in which amplitude clipping exists, wherein the first audio signal comprises N sampling points, and N is a positive integer;
a second acquisition unit, configured to acquire target data representing a clipping ratio of the first audio signal, wherein the clipping ratio is the ratio of the number of clipped sampling points among the N sampling points to N;
a first dividing unit, configured to divide the first audio signal into at least two audio segments if the target data falls within a target range;
and a third acquisition unit, configured to perform clipping detection on the at least two audio segments and to acquire a second audio signal from the audio segments after clipping detection.
In a possible implementation manner, the third acquisition unit is specifically configured to:
for each of the at least two audio segments, detect whether clipping exists in the audio segment;
if clipping exists in the audio segment, discard the audio segment;
obtain the audio segments of the at least two audio segments that remain after discarding;
and obtain the second audio signal from the remaining audio segments.
In one possible implementation, the apparatus further includes:
a detection unit, configured to detect whether the audio length of the second audio signal is greater than or equal to a first threshold;
a first determining unit, configured to determine that the second audio signal is a usable audio signal if the audio length of the second audio signal is greater than or equal to the first threshold,
and to discard the second audio signal if the audio length of the second audio signal is smaller than the first threshold.
In one possible implementation manner, each of the at least two audio segments includes at least one sampling point, and the third acquisition unit detects whether clipping exists in the audio segment by acquiring the amplitude value of each of the at least one sampling point included in the audio segment;
if the amplitude values of the at least one sampling point satisfy a first condition, it determines that clipping exists in the audio segment, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are all greater than a second threshold.
In one possible implementation, the apparatus further includes:
a fourth obtaining unit, configured to obtain an amplitude value of each of N sampling points included in the first audio signal;
a second determining unit, configured to determine that amplitude clipping exists in the first audio signal if the amplitude values of the N sampling points satisfy a first condition, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are all greater than a second threshold.
In one possible implementation, the amplitude value of each of the N sampling points belongs to a target sampling range;
the second threshold is a product of a maximum value of the target sampling range and a target proportion.
In one possible implementation, the apparatus further includes:
a second dividing unit, configured to divide a target sampling range into at least two mutually non-overlapping sub-ranges, wherein the target sampling range is the range in which the amplitude values of the N sampling points included in the first audio signal lie;
a counting unit, configured to count, among the N sampling points, the number of sampling points whose amplitude values fall in each of the at least two sub-ranges;
a construction unit, configured to construct a histogram whose horizontal axis comprises the at least two sub-ranges and whose vertical axis comprises the number of sampling points belonging to each sub-range;
and a third determining unit, configured to determine that amplitude clipping exists in the first audio signal if the variation trend of the histogram satisfies a second condition.
In one possible implementation manner, the second acquisition unit is specifically configured to:
determine a first sub-range from the at least two sub-ranges, wherein the first sub-range is the sub-range with the largest amplitude values among the at least two sub-ranges;
acquire, as a first number, the number of the N sampling points whose amplitude values fall in the first sub-range;
calculate the ratio between the first number and N, and use the ratio as the target data representing the clipping ratio of the first audio signal.
In one possible implementation, the maximum value of the target range is a third threshold, the minimum value of the target range is a fourth threshold, and the apparatus further includes a fourth determining unit;
the fourth determining unit is specifically configured to discard the first audio signal if the target data is greater than the third threshold,
and to determine the first audio signal as a usable audio signal if the target data is smaller than the fourth threshold.
In a third aspect, an embodiment of the present invention provides an audio signal processing apparatus, where the audio signal processing apparatus includes a processor, a memory, and a communication interface, where the processor, the memory, and the communication interface are connected to each other, where the communication interface is used to receive and send data, the memory is used to store program codes, and the processor is used to call the program codes to execute the method according to the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is executed by a processor to implement the method described above.
In the embodiments of the invention, after a first audio signal in which amplitude clipping exists is acquired, target data representing its clipping ratio determines how it is processed. If the target data falls within a target range, the first audio signal is divided into at least two audio segments, clipping detection is performed on those segments, and a second audio signal is obtained from the audio segments after clipping detection. The embodiments do not simply discard an audio signal in which clipping exists but process it further. As much valid audio as possible is therefore retained, and the usability of audio signals is greatly improved.
Drawings
In order to illustrate embodiments of the present invention or technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart of an audio signal processing method according to an embodiment of the present invention;
Fig. 2 is a waveform diagram of an audio signal with amplitude clipping according to an embodiment of the present invention;
Fig. 3 is a flowchart of another audio signal processing method according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method for obtaining target data representing the clipping ratio of a first audio signal according to an embodiment of the present invention;
Fig. 5 is a flowchart of determining whether the target data falls within a target range according to an embodiment of the present invention;
Fig. 6 is a flowchart of a method for performing clipping detection on an audio segment according to an embodiment of the present invention;
Fig. 7 is a flowchart of a method for determining whether to discard a second audio signal according to an embodiment of the present invention;
Fig. 8 is a flowchart of another audio signal processing method according to an embodiment of the present invention;
Fig. 9 is a histogram of an audio signal without clipping according to an embodiment of the present invention;
Fig. 10 is a histogram of an audio signal with clipping according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of another audio signal processing apparatus according to an embodiment of the present invention;
Fig. 13 is a schematic structural diagram of another audio signal processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
An audio signal processing method according to an embodiment of the present invention is described in detail below with reference to Figs. 1 to 10.
Referring to fig. 1, a flow chart of an audio signal processing method according to an embodiment of the invention is shown. As shown in fig. 1, the audio signal processing method of an embodiment of the present invention may include the following steps S101 to S104.
S101, acquiring a first audio signal in which amplitude clipping exists, wherein the first audio signal comprises N sampling points, and N is a positive integer.
In this embodiment, the first audio signal may be, for example, a voice signal captured during instant messaging or a music signal recorded live; the embodiments of the present application are not limited thereto.
In this embodiment, a manner of acquiring the first audio signal may be to perform amplitude clipping detection processing on a plurality of audio signals, determine whether amplitude clipping exists in the audio signals, and then acquire at least one audio signal in which amplitude clipping exists.
The first audio signal comprises N sampling points, and the amplitude value of each sampling point belongs to a preset target sampling range. The number of bits used to store an amplitude value determines the target sampling range. For example, if 16 bits are used, the target sampling range is $-2^{15}$ to $2^{15}-1$, namely -32768 to 32767.
Optionally, the first audio signal may be obtained by sampling and quantizing the original analog signal. The original analog signal is first sampled to obtain N sampling points. For example, with a sampling frequency of 8 kHz there are 8000 sampling points per second. The original amplitude value of each sampling point is then quantized. As shown in fig. 2, if the original amplitude value of a sampling point exceeds the maximum value of the target sampling range, that maximum value represents it. If the original amplitude value falls below the minimum value of the target sampling range, that minimum value represents it. After quantization, the original amplitude values are mapped to N amplitude values within the target sampling range, one amplitude value per sampling point. As shown in fig. 2, clipping exists in the first audio signal after this sampling and quantization step.
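To illustrate how saturation during quantization produces clipping, the following Python sketch maps analog samples to 16-bit integers. The samples are expressed as fractions of an assumed full-scale value of 1.0, and any value outside the target sampling range is replaced by that range's maximum or minimum. The function names (`quantize`, `sample_sine`) and the scaling convention are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def quantize(analog: Sequence[float], bits: int = 16, full_scale: float = 1.0) -> List[int]:
    """Quantize samples to signed integers, saturating values outside the target sampling range."""
    lo = -(2 ** (bits - 1))      # -32768 for 16 bits
    hi = 2 ** (bits - 1) - 1     # 32767 for 16 bits
    out = []
    for x in analog:
        q = round(x / full_scale * hi)
        # Out-of-range values are replaced by the range limits: this is where clipping occurs.
        out.append(max(lo, min(hi, q)))
    return out


def sample_sine(freq_hz: float, amplitude: float, sample_rate: int = 8000,
                duration_s: float = 1.0) -> List[float]:
    """Sample a sine wave; amplitudes above 1.0 exceed the assumed full scale."""
    n = int(sample_rate * duration_s)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

For example, `quantize(sample_sine(440, 1.5))` yields 8000 samples whose peaks are flattened at 32767 and -32768. That flattening is the waveform shape fig. 2 is described as showing.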
It should be noted that other sampling frequencies may be used as required. The number of bits used to store an amplitude value may also differ, depending on the sampling range required.
Optionally, the original analog signal may be sampled and quantized by means of an amplitude value calculation function to obtain the N amplitude values corresponding to the N sampling points of the first audio signal. For example, the sampling frequency and the number of bits used to store amplitude values are set for the function, and the original analog signal is input into it to obtain the first audio signal.
S102, acquiring target data representing the clipping ratio of the first audio signal, wherein the clipping ratio is the ratio of the number of clipped sampling points among the N sampling points to N.
In this embodiment, the target data is obtained by analyzing the N amplitude values corresponding to the N sampling points of the first audio signal. The target data may be the clipping ratio itself or other data that reflects the clipping ratio, for example, a value lying within a preset range of the clipping ratio.
In an alternative embodiment, the target data may be obtained in two steps. First, the amplitude value of each of the N sampling points of the first audio signal is analyzed to determine which sampling points are clipped. Then, the ratio between the number of clipped sampling points and the total number N of sampling points is calculated, and this ratio is the target data. Optionally, clipped sampling points may be identified by checking whether the amplitude values of at least a first number of consecutive sampling points (for example, 3) all exceed a second threshold (for example, 90% of the maximum value of the target sampling range). For example, if the amplitude values of 5 consecutive sampling points all exceed the second threshold, these 5 sampling points are regarded as clipped.
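A minimal Python sketch of this run-based approach follows. It flags every sampling point that belongs to a run of at least `first_number` consecutive points whose absolute amplitude exceeds the second threshold, then divides the flagged count by N. Using absolute values follows the later description of segment-level detection. The defaults (3 points, 90% of 32768) are the example values given in the text. The function names are hypothetical.

```python
from typing import List, Sequence


def clipped_mask(samples: Sequence[int], first_number: int = 3,
                 target_proportion: float = 0.9, max_amplitude: int = 32768) -> List[bool]:
    """Mark samples belonging to a run of >= first_number consecutive samples
    whose |amplitude| exceeds the second threshold."""
    second_threshold = max_amplitude * target_proportion
    n = len(samples)
    mask = [False] * n
    run_start = None
    # Iterate one step past the end so that a run reaching the last sample is closed.
    for i in range(n + 1):
        over = i < n and abs(samples[i]) > second_threshold
        if over and run_start is None:
            run_start = i
        elif not over and run_start is not None:
            if i - run_start >= first_number:
                mask[run_start:i] = [True] * (i - run_start)
            run_start = None
    return mask


def clipping_ratio(samples: Sequence[int], **kwargs) -> float:
    """Target data: number of clipped sampling points divided by N."""
    if not samples:
        return 0.0
    return sum(clipped_mask(samples, **kwargs)) / len(samples)
```

Runs shorter than `first_number` are treated as ordinary loud samples rather than clipping, which is the point of requiring consecutive points.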
In another alternative embodiment, the target data may be obtained by counting the sampling points of the first audio signal whose amplitude values exceed the first threshold. The ratio between this count and the total number N of sampling points is then calculated, and this ratio is the target data. Clipped sampling points all have large amplitude values that exceed the first threshold, so a first audio signal with a higher clipping ratio also yields a higher calculated ratio. The calculated ratio therefore reflects the clipping ratio indirectly, although it is not the clipping ratio itself. The value of the first threshold can be set and then repeatedly checked and updated to arrive at a more reasonable value.
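This indirect measure can be sketched in a few lines of Python. Two details are assumptions: the threshold is applied to absolute amplitude values, and the threshold value used in the example (29000) is purely illustrative, since the document leaves it to be tuned. `exceedance_ratio` is a hypothetical name.

```python
from typing import Sequence


def exceedance_ratio(samples: Sequence[int], amplitude_threshold: float) -> float:
    """Fraction of samples whose |amplitude| exceeds amplitude_threshold.
    A proxy that grows with the clipping ratio, not the clipping ratio itself."""
    if not samples:
        return 0.0
    count = sum(1 for s in samples if abs(s) > amplitude_threshold)
    return count / len(samples)
```

Unlike the run-based version, this counts isolated loud samples too. That is why the text describes the result as only an indirect reflection of the clipping ratio.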
In yet another alternative embodiment, the target data may be obtained as shown in fig. 3, which schematically shows a flow of obtaining the target data representing the clipping ratio of the first audio signal, including but not limited to steps S21-S23:
S21, determining a first sub-range from at least two sub-ranges;
Specifically, the target sampling range containing the N amplitude values of the first audio signal's N sampling points is optionally divided into at least two mutually non-overlapping sub-ranges. The division may be equal or unequal; the present application does not limit it. The number of sub-ranges may be, for example, 22, 24, 30, or another value.
A first sub-range is determined from the at least two mutually non-overlapping sub-ranges, the first sub-range being the sub-range with the largest amplitude values among them. For example, if the sub-ranges are [0, 7], [8, 15], and [16, 23], the first sub-range is [16, 23].
S22, acquiring the number of sampling points of which the amplitude values belong to the first sub-range from the N sampling points as a first number;
S23, calculating the ratio between the first number and N, and using the ratio as the target data representing the clipping ratio of the first audio signal.
Most of the sampling points whose amplitude values fall in the first sub-range are clipped. The ratio between the first number and the total number N of sampling points therefore serves as target data that reflects the clipping ratio of the first audio signal.
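Steps S21-S23 might look like the following Python sketch. The document allows equal or unequal division and does not say how negative amplitudes are handled. This sketch therefore assumes equal-width sub-ranges over absolute amplitudes in [0, 32768], with 24 sub-ranges as one of the example counts. A sample falls in the top (first) sub-range exactly when its absolute value reaches the top sub-range's lower edge. The function name is hypothetical.

```python
from typing import Sequence


def top_subrange_ratio(samples: Sequence[int], num_subranges: int = 24,
                       max_amplitude: int = 32768) -> float:
    """Ratio of samples whose |amplitude| lies in the highest of num_subranges
    equal-width sub-ranges of [0, max_amplitude]."""
    if not samples:
        return 0.0
    width = max_amplitude / num_subranges
    lower_edge = max_amplitude - width                                # S21: the first (top) sub-range
    first_number = sum(1 for s in samples if abs(s) >= lower_edge)   # S22
    return first_number / len(samples)                                # S23
```

With 24 sub-ranges, the top sub-range starts at about 31403, so only samples very close to full scale are counted.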
S103, if the target data falls within a target range, dividing the first audio signal into at least two audio segments.
In this embodiment, after the target data is acquired, it is determined whether the target data falls within the target range. If it does, the first audio signal is divided into at least two audio segments, for example, in units of 1 second or of another duration such as 5 seconds.
The target range may be 60% to 80%, and it is understood that the target range may be other ranges, and the embodiments of the present application are not limited thereto.
Referring to fig. 4, a flowchart of handling target data that falls in different ranges according to an embodiment of the present application is shown, including steps S31-S35:
S31, acquiring the target data;
S32, determining whether the target data falls within the target range: if it does, performing step S33; if it is greater than the third threshold, performing step S35; if it is less than the fourth threshold, performing step S34. The third threshold is the maximum value of the target range, and the fourth threshold is the minimum value of the target range.
S33, dividing the first audio signal into at least two audio segments, and performing clipping detection on the at least two audio segments;
S34, determining the first audio signal as a usable audio signal;
S35, discarding the first audio signal.
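The branching in steps S31-S35 can be sketched in Python as follows. The default thresholds come from the example target range of 60% to 80% given for S103. Treating both endpoints as inside the target range is an assumption. The names `Decision` and `decide` are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    USABLE = "usable"    # S34: target data below the fourth threshold
    SEGMENT = "segment"  # S33: within the target range; split and re-check
    DISCARD = "discard"  # S35: target data above the third threshold


def decide(target_data: float, fourth_threshold: float = 0.60,
           third_threshold: float = 0.80) -> Decision:
    """S32: route the first audio signal according to its clipping-ratio target data."""
    if target_data > third_threshold:
        return Decision.DISCARD
    if target_data < fourth_threshold:
        return Decision.USABLE
    return Decision.SEGMENT  # fourth_threshold <= target_data <= third_threshold
```

For example, `decide(0.7)` returns `Decision.SEGMENT`, so that signal would continue to segment-level clipping detection.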
Determining whether the target data falls within the target range has two possible outcomes. In the first, the target data falls within the target range; see step S104, which is not repeated here. In the second, the target data does not fall within the target range. If the target data is greater than the third threshold, the ratio of clipped sampling points to all sampling points in the first audio signal is too high. There are too many clipped sampling points, and training a voiceprint recognition model with the first audio signal would lower the verification rate of voiceprint recognition, so the first audio signal is discarded. If the target data is smaller than the fourth threshold, the ratio of clipped sampling points to all sampling points is small. The few clipped sampling points do not damage the information in the first audio signal enough to matter and have almost no effect on subsequent processing. The audio signal is therefore passed directly to the system for subsequent processing, without being divided into audio segments for clipping detection. For example, the first audio signal can be used directly to train a voiceprint recognition model.
S104, performing clipping detection on the at least two audio segments, and obtaining a second audio signal from the audio segments after clipping detection.
In this embodiment, if the target data falls within the target range, the first audio signal may be divided into at least two audio segments. The division may be even, so that every audio segment has the same target duration, such as 1 s or 5 s. Each audio segment is then checked for clipping in turn.
Clipping in an audio segment may be detected by checking whether the absolute amplitude values of at least a first number of consecutive sampling points in the segment all exceed a second threshold. The first number may be 3. The second threshold may be the product of the maximum value of the sampling value range and a target proportion; with a target proportion of 90%, the second threshold is 32768 × 0.9 ≈ 29491. If the absolute amplitude values of three or more consecutive sampling points in an audio segment all exceed 90% of the maximum value of the sampling value range, clipping is determined to exist in the audio segment, and the segment is discarded. It should be noted that the proportion need not be exactly 90%; it may be around 90%, such as 85%, 89%, 91%, or 95%. Likewise, the first number may be 3 or another value, and the first number, the target proportion, and the sampling frequency may constrain one another.
And if the absolute values of the amplitude values of the three continuous sampling points do not exceed 90% of the maximum value in the sampling value range, the fact that the amplitude of the audio segment is not intercepted is indicated, the audio segment is determined to be an available audio segment, and the audio segment is reserved. The above-mentioned mode of going to detect whether there is the truncation through the speech section can avoid the discontinuous condition of remaining speech section.
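The per-segment check can be sketched as a run-length scan. This is a minimal sketch under stated assumptions: samples are plain Python integers, the second threshold is 32768 × 0.9 as in the text, the first number is 3, and the names `second_threshold` and `has_clipping` are hypothetical.

```python
from typing import Optional, Sequence

FULL_SCALE = 32768        # magnitude used in the text (32768 x 0.9 ~= 29491)
TARGET_PROPORTION = 0.9   # "around 90%"; 0.85, 0.95, etc. are also allowed
FIRST_NUMBER = 3          # minimum number of consecutive near-full-scale samples


def second_threshold(full_scale: int = FULL_SCALE,
                     proportion: float = TARGET_PROPORTION) -> float:
    """Second threshold = maximum of the sampling range x target proportion."""
    return full_scale * proportion


def has_clipping(segment: Sequence[int],
                 threshold: Optional[float] = None,
                 first_number: int = FIRST_NUMBER) -> bool:
    """Return True if at least `first_number` consecutive samples in the
    segment have an absolute amplitude greater than `threshold`."""
    if threshold is None:
        threshold = second_threshold()
    run = 0  # length of the current run of samples above the threshold
    for value in segment:
        if abs(value) > threshold:
            run += 1
            if run >= first_number:
                return True
        else:
            run = 0  # the run is broken by a sample at or below the threshold
    return False
```

For example, `has_clipping([120, -32768, -32768, -32768, 40])` is `True`, while `has_clipping([0, 30000, 30000, 0, 30000])` is `False` because its longest run above the threshold has only two samples.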
Optionally, the second audio signal may be obtained from the result of clipping detection on the at least two audio segments: for example, the clipped audio segments among the at least two audio segments are discarded, the unclipped audio segments are retained, and all unclipped audio segments are then combined in time order into the second audio signal.
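Dividing, filtering, and splicing can be sketched as follows. This assumes equal-duration segments (1 s by default, with a possibly shorter final segment), the run-length criterion with threshold 32768 × 0.9 and a first number of 3, and hypothetical function names; it illustrates the idea rather than reproducing the patented implementation.

```python
from typing import List, Sequence


def split_into_segments(samples: Sequence[int], sample_rate: int,
                        segment_seconds: float = 1.0) -> List[List[int]]:
    """Divide the signal into consecutive segments of equal duration.
    The last segment may be shorter if the length is not a multiple."""
    seg_len = max(1, int(sample_rate * segment_seconds))
    return [list(samples[i:i + seg_len]) for i in range(0, len(samples), seg_len)]


def has_clipping(segment: Sequence[int], threshold: float = 32768 * 0.9,
                 first_number: int = 3) -> bool:
    """True if `first_number` or more consecutive |samples| exceed `threshold`."""
    run = 0
    for value in segment:
        run = run + 1 if abs(value) > threshold else 0
        if run >= first_number:
            return True
    return False


def build_second_audio_signal(samples: Sequence[int], sample_rate: int,
                              segment_seconds: float = 1.0) -> List[int]:
    """Drop clipped segments and splice the remaining ones in time order."""
    second: List[int] = []
    for segment in split_into_segments(samples, sample_rate, segment_seconds):
        if not has_clipping(segment):
            second.extend(segment)
    return second
```

With a toy sampling frequency of 4 Hz, the signal `[1, 2, 3, 4, 32767, 32767, 32767, 0, 5, 6, 7, 8]` splits into three 1 s segments; the middle one is clipped, so the result is `[1, 2, 3, 4, 5, 6, 7, 8]`.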
For step S104 in the above embodiment, refer to fig. 5, a schematic diagram of the clipping detection processing for audio segments proposed in this application, which includes, but is not limited to, steps S41-S44:
S41, for each of the at least two audio segments, detecting whether the audio segment is clipped;
S42, if the audio segment is clipped, discarding the audio segment;
S43, obtaining the audio segments that remain among the at least two audio segments after discarding;
S44, obtaining a second audio signal from the remaining audio segments.
Optionally, after the second audio signal is obtained, it may be detected whether the audio length of the second audio signal meets a certain condition, to ensure that the second audio signal built from the remaining audio segments can be used by subsequent systems. For the determination method, refer to fig. 6, a flowchart of a method for determining whether to discard the second audio signal, which includes, but is not limited to, steps S51-S54:
S51, obtaining a second audio signal;
S52, detecting whether the audio length of the second audio signal is greater than or equal to a first threshold; if it is less than the first threshold, going to step S53; if it is greater than or equal to the first threshold, going to step S54;
S53, discarding the second audio signal;
S54, determining the second audio signal as a usable audio signal.
The first threshold above refers to the minimum length of an audio signal that can be input into a subsequent system for processing. For example, in a text-independent voiceprint registration scenario, the speech signal to be registered must be at least 20 s long, so it can be determined whether the audio length of the second audio signal is greater than or equal to 20 s; if so, the second audio signal is retained and used to train a voiceprint recognition model.
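The length check of steps S52-S54 reduces to comparing the sample count against the sampling frequency. A minimal sketch, assuming the audio length is computed as samples divided by sampling frequency and that `None` stands for "discarded"; the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def audio_length_seconds(samples: Sequence[int], sample_rate: int) -> float:
    """Audio length of a signal given its sample count and sampling frequency."""
    return len(samples) / sample_rate


def keep_if_long_enough(second_signal: Sequence[int], sample_rate: int,
                        first_threshold_seconds: float = 20.0) -> Optional[List[int]]:
    """Return the signal if it is at least the first threshold long (usable,
    step S54); otherwise return None (discarded, step S53)."""
    if audio_length_seconds(second_signal, sample_rate) >= first_threshold_seconds:
        return list(second_signal)
    return None
```

At an 8 kHz sampling frequency, 20 s corresponds to 160 000 samples, so a second audio signal of 159 999 samples would be discarded.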
In the embodiment of the invention, after a first audio signal with clipping is acquired, the processing mode of the first audio signal is determined according to target data representing the clipping ratio of the first audio signal. If the target data belongs to a target range, the first audio signal is divided into at least two audio segments, clipping detection is performed on the at least two audio segments, and a second audio signal is obtained from the audio segments after clipping detection. Rather than simply discarding an audio signal with clipping, the embodiment of the application processes it further, so that as much valid audio as possible is retained and the usability of the audio signal is greatly improved.
In another embodiment, before the first audio signal with clipping is acquired in step S101, it may be determined whether the first audio signal contains clipping. Optional detection methods include, but are not limited to, the following two embodiments: the first, shown in fig. 7, includes, but is not limited to, steps S201-S202; the second, shown in fig. 8, includes, but is not limited to, steps S301-S304. They are described below.
The first optional embodiment is as follows:
S201, acquiring the amplitude values of the N sampling points included in the first audio signal;
In this embodiment, the first audio signal is obtained by sampling: as described in step S101, the original analog signal may be sampled and quantized by an amplitude value calculation function to obtain the N amplitude values corresponding to the N sampling points included in the first audio signal.
S202, if the amplitude values of the first audio signal meet a first condition, determining that the first audio signal contains clipping.
The first condition is that the amplitude values of at least a first number of consecutive sampling points are greater than the second threshold. If the amplitude values of the sampling points of the first audio signal meet this condition, it may be determined that the first audio signal contains clipping. For details, refer to step S104, which is not repeated here.
The second optional embodiment is as follows:
S301, dividing a target sampling range into at least two sub-ranges;
S302, counting, among the N sampling points, the number of sampling points whose amplitude values belong to each of the at least two sub-ranges;
S303, constructing a histogram;
Specifically, optionally, the target sampling range to which the N sampling values corresponding to the N sampling points of the first audio signal belong is divided into at least two mutually non-overlapping sub-ranges. The division method is not limited in this application; the sub-ranges may be of equal or unequal width. The number of sub-ranges may be 22, 24, 30, or another value.
The number of sampling points among the N sampling points whose amplitude values belong to each sub-range is then counted, and a histogram is constructed. The horizontal axis of the histogram may represent the sub-ranges, and the vertical axis may represent the number of the N sampling points of the first audio signal whose amplitude values belong to each sub-range.
S304, if the variation trend of the histogram meets a second condition, determining that the first audio signal contains clipping.
As shown in figs. 9 and 10, the target sampling range is divided equally into 22 sub-ranges in order of magnitude. As shown in fig. 9, if the first audio signal contains no clipping, the number of occurrences of amplitude values decreases gradually as the values of the sub-ranges increase. As shown in fig. 10, if the first audio signal contains clipping, the number of occurrences peaks at the sub-range with the highest values, so the last column of the histogram is higher than all preceding columns; that is, the last sub-range of the histogram has the highest count. The count represented by this last column is referred to as an abnormally rising portion, and the second condition is that the histogram contains such an abnormally rising portion.
If the first audio signal contains no clipping, its waveform is relatively smooth, and most of the amplitude values of the N sampling points are relatively small.
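The histogram test can be sketched in a few lines. Two assumptions are made here that the text does not fix: the sub-ranges are built over the absolute amplitude |a| in [0, 32768] (so positive and negative clipping land in the same last column), and the "abnormally rising portion" is read as the last column being strictly higher than every preceding column. Function names are hypothetical.

```python
from typing import List, Sequence


def magnitude_histogram(samples: Sequence[int], num_bins: int = 22,
                        full_scale: int = 32768) -> List[int]:
    """Count samples per equal-width sub-range of |amplitude| over [0, full_scale]."""
    counts = [0] * num_bins
    for value in samples:
        # |value| == full_scale (e.g. -32768) is folded into the last sub-range
        index = min(abs(value) * num_bins // full_scale, num_bins - 1)
        counts[index] += 1
    return counts


def has_abnormal_rise(counts: Sequence[int]) -> bool:
    """Second condition (as interpreted here): the last column is strictly
    higher than every preceding column of the histogram."""
    return len(counts) > 1 and counts[-1] > max(counts[:-1])
```

A signal dominated by small amplitudes yields a tall first column and a short last column, so `has_abnormal_rise` is `False`; a signal whose samples pile up at full scale makes the last column the tallest.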
In this embodiment, before the first audio signal with clipping is acquired in step S101, it is determined whether the first audio signal contains clipping, so that at least one audio signal containing clipping is obtained from multiple audio signals.
Fig. 11 is a schematic structural diagram of an audio signal processing apparatus according to an embodiment of the present invention. As shown in fig. 11, the audio signal processing apparatus according to an embodiment of the present invention may include:
a first obtaining unit 11, configured to obtain a first audio signal with an amplitude truncation, where the first audio signal includes N sampling points, where N is a positive integer;
In this embodiment, the first audio signal may be a voice data signal from an instant messaging session or a music data signal recorded live; the embodiments of the present application are not limited thereto.
In this embodiment, the first audio signal may be acquired by performing clipping detection on multiple audio signals to determine whether each contains clipping, and then acquiring at least one audio signal that contains clipping.
The first audio signal comprises N sampling points, and the amplitude value of each sampling point belongs to a preset target sampling range. The target sampling range is determined by the number of bits used to store the amplitude value; for example, if 16 bits are used, the target sampling range is -2^15 to 2^15 - 1, that is, -32768 to 32767.
Optionally, the first audio signal may be obtained from the original analog signal by sampling and quantization. The original analog signal is first sampled to obtain N sampling points; the sampling frequency may be 8 kHz, giving 8000 sampling points per second. The original amplitude value of each sampling point is then quantized. As shown in fig. 2, if the original amplitude value of a sampling point exceeds the maximum of the target sampling range, it is represented by that maximum, and if it falls below the minimum of the target sampling range, it is represented by that minimum. After quantization, the N sampling points yield N amplitude values, all within the target sampling range, one amplitude value per sampling point.
Note that other sampling frequencies may be used and customized as needed, and the number of bits used to store amplitude values may likewise be set according to the sampling range required.
Optionally, the original analog signal may be sampled and quantized by an amplitude value calculation function to obtain the N amplitude values corresponding to the N sampling points of the first audio signal: for example, the sampling frequency and the number of bits used to store amplitude values are set in the function, and the original analog signal is input into it to obtain the first audio signal.
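Sampling with saturating quantization can be simulated to show where clipping comes from. This is a sketch, not the amplitude value calculation function itself: the "analog signal" is assumed to be a Python callable returning amplitude in quantization steps at time t (seconds), rounding to the nearest step is an assumption, and `sample_and_quantize` and `overdriven_tone` are hypothetical names.

```python
import math
from typing import Callable, List


def sample_and_quantize(analog: Callable[[float], float], duration_s: float,
                        sample_rate: int = 8000, bits: int = 16) -> List[int]:
    """Sample `analog` and quantize each value to a signed `bits`-bit integer.
    Values outside the target sampling range saturate at its limits, which
    is how clipping arises."""
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # -32768..32767 for 16 bits
    n = int(round(duration_s * sample_rate))             # e.g. 8000 points per second
    samples = []
    for k in range(n):
        value = round(analog(k / sample_rate))           # nearest quantization step
        samples.append(max(low, min(high, value)))       # clamp to [low, high]
    return samples


def overdriven_tone(t: float) -> float:
    """Hypothetical input: a 100 Hz sine whose peak is 1.5x the 16-bit maximum."""
    return 1.5 * 32767 * math.sin(2 * math.pi * 100 * t)
```

Sampling `overdriven_tone` for 10 ms at 8 kHz gives 80 samples whose peaks are flattened at 32767 and -32768, the kind of run of full-scale samples the detection steps look for.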
A second obtaining unit 12, configured to obtain target data representing the clipping ratio of the first audio signal, where the clipping ratio is the ratio of the number of clipped sampling points among the N sampling points to N;
In this embodiment, the target data representing the clipping ratio of the first audio signal is obtained by analyzing the N amplitude values corresponding to the N sampling points of the first audio signal. The target data may be the clipping ratio itself or other data that reflects the clipping ratio; for example, it may be a value within a preset range of the clipping ratio.
In an optional embodiment, the target data representing the clipping ratio of the first audio signal may be obtained by first analyzing the amplitude value of each of the N sampling points of the first audio signal to identify the clipped sampling points, and then calculating the ratio of the number of clipped sampling points to the total number N of sampling points; this ratio is the target data. Optionally, the clipped sampling points may be identified by determining whether the amplitude values of at least a first number of consecutive sampling points are greater than a second threshold, where the first number may be 3 and the second threshold may be 90% of the maximum of the target sampling range. For example, if the amplitude values of 5 consecutive sampling points are all greater than the second threshold, all 5 sampling points are regarded as clipped.
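A run-based version of this target data can be sketched as follows: every sample belonging to a run of at least the first number of consecutive samples above the second threshold counts as clipped, and the count is divided by N. Comparing absolute amplitudes (to cover negative clipping) and the function name are assumptions of this sketch.

```python
from typing import Sequence


def clipping_ratio_by_runs(samples: Sequence[int],
                           threshold: float = 32768 * 0.9,
                           first_number: int = 3) -> float:
    """Fraction of samples lying in runs of >= first_number consecutive
    samples whose absolute amplitude exceeds `threshold`."""
    n = len(samples)
    if n == 0:
        return 0.0
    clipped = 0
    run = 0
    for value in samples:
        if abs(value) > threshold:
            run += 1
        else:
            if run >= first_number:   # a qualifying run just ended
                clipped += run        # every sample in it counts as clipped
            run = 0
    if run >= first_number:           # a qualifying run reaching the end
        clipped += run
    return clipped / n
```

In the example from the text, a run of 5 consecutive samples above the threshold contributes all 5 samples to the count, while isolated or paired loud samples contribute nothing.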
In another optional embodiment, the target data representing the clipping ratio of the first audio signal may be obtained by counting the sampling points in the first audio signal whose amplitude values exceed the first threshold and then calculating the ratio of this count to the total number N of sampling points; this ratio is the target data. Because clipped sampling points all have large amplitude values exceeding the first threshold, a larger clipping proportion in the first audio signal yields a larger calculated ratio. The ratio therefore reflects the clipping proportion indirectly, although it is not the clipping proportion itself. The value of the first threshold can be set and then repeatedly checked and updated to obtain a more reasonable value.
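This indirect indicator is a simple proportion. In the sketch below the threshold is named `amplitude_threshold` to keep it apart from the audio-length threshold used in step S52, which the text also calls the first threshold; comparing absolute amplitudes is an assumption, and the calibrated threshold value is left as a parameter because the text does not give one.

```python
from typing import Sequence


def ratio_above_threshold(samples: Sequence[int], amplitude_threshold: float) -> float:
    """Share of samples whose absolute amplitude exceeds `amplitude_threshold`.

    This is only an indirect indicator of the clipping ratio: it also counts
    isolated loud samples that are not part of a clipped run.
    """
    if not samples:
        return 0.0
    above = sum(1 for value in samples if abs(value) > amplitude_threshold)
    return above / len(samples)
```

Unlike the run-based ratio, this version needs no state between samples, which makes it cheap but less specific to clipping.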
In yet another optional embodiment, the second obtaining unit is specifically configured to perform the process shown in fig. 3, which schematically illustrates obtaining target data representing the clipping ratio of the first audio signal and includes, but is not limited to, steps S21-S23:
S21, determining a first sub-range from at least two sub-ranges;
Specifically, optionally, the target sampling range to which the N sampling values corresponding to the N sampling points of the first audio signal belong is divided into at least two mutually non-overlapping sub-ranges. The division method is not limited in this application; the sub-ranges may be of equal or unequal width. The number of sub-ranges may be 22, 24, 30, or another value.
A first sub-range is determined from the at least two mutually non-overlapping sub-ranges. The first sub-range is the sub-range containing the largest amplitude values among the at least two sub-ranges; for example, if the at least two sub-ranges are [0, 7], [8, 15], and [16, 23], the first sub-range is [16, 23].
S22, acquiring, as a first number, the number of sampling points among the N sampling points whose amplitude values belong to the first sub-range;
S23, calculating the ratio between the first number and N, and using the ratio as the target data representing the clipping ratio of the first audio signal.
As described above, most of the sampling points in the first audio signal whose amplitude values belong to the first sub-range are clipped, so the ratio between the first number and the total number N of sampling points is calculated as target data that reflects the size of the clipping ratio of the first audio signal.
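Steps S21-S23 can be sketched with integer sub-ranges. Assumptions: sub-ranges are inclusive integer intervals over |amplitude|, the helper that builds equal-width sub-ranges is hypothetical, and the count is named `top_count` because "first number" is used with a different meaning in the run-length criterion.

```python
from typing import List, Sequence, Tuple


def equal_subranges(max_value: int, count: int) -> List[Tuple[int, int]]:
    """Split the integer range [0, max_value] into `count` non-overlapping,
    inclusive sub-ranges of (nearly) equal width."""
    edges = [round(i * (max_value + 1) / count) for i in range(count + 1)]
    return [(edges[i], edges[i + 1] - 1) for i in range(count)]


def target_data_top_subrange(samples: Sequence[int],
                             subranges: Sequence[Tuple[int, int]]) -> float:
    """Ratio of samples whose |amplitude| falls in the sub-range holding the
    largest values (the "first sub-range")."""
    if not samples:
        return 0.0
    low, high = max(subranges, key=lambda r: r[1])                 # S21
    top_count = sum(1 for v in samples if low <= abs(v) <= high)  # S22
    return top_count / len(samples)                                # S23
```

`equal_subranges(23, 3)` reproduces the example sub-ranges `[(0, 7), (8, 15), (16, 23)]`; for the samples `[1, 9, 17, 20, -23]`, three of five fall in `[16, 23]`, giving target data 0.6.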
A first dividing unit 13, configured to divide the first audio signal into at least two audio segments if the target data belongs to a target range;
In this embodiment, after the target data is acquired, it is determined whether the target data belongs to the target range. If it does, the first audio signal is divided into at least two audio segments, for example in units of 1 second or of another duration such as 5 seconds.
The target range may be 60% to 80%, and it is understood that the target range may be other ranges, and the embodiments of the present application are not limited thereto.
Referring to fig. 4, which is a flowchart, provided in an embodiment of the present application, of processing target data that belongs to different ranges, the process includes steps S31-S35:
S31, acquiring target data;
S32, determining whether the target data belongs to the target range: if it does, performing step S33; if it does not, performing step S35 when the target data is greater than the third threshold, or step S34 when the target data is less than the fourth threshold, where the third threshold is the maximum value of the target range and the fourth threshold is the minimum value of the target range;
S33, dividing the first audio signal into at least two audio segments, and performing clipping detection on the at least two audio segments;
S34, determining the first audio signal as a usable audio signal;
S35, discarding the first audio signal.
Determining whether the target data falls within the target range has two possible outcomes. In the first, the target data belongs to the target range; for details, refer to step S104, which is not repeated here. In the second, the target data does not belong to the target range. If the target data is greater than the third threshold, the ratio of clipped sampling points to the total number of sampling points in the first audio signal is too high. Training a voiceprint recognition model with this first audio signal would reduce the verification rate of voiceprint recognition, so the first audio signal is discarded. If the target data is less than the fourth threshold, the ratio of clipped sampling points to the total number of sampling points is small: the clipping is not enough to damage the information in the first audio signal and has almost no effect on subsequent processing. The audio signal is therefore input directly into the system for subsequent processing without being divided into audio segments or having each segment checked for clipping; for example, the first audio signal can be used directly to train a voiceprint recognition model.
A third obtaining unit 14, configured to perform clipping detection on the at least two audio segments and obtain a second audio signal from the audio segments after clipping detection;
In this embodiment, if the target data belongs to the target range, the first audio signal may be divided into at least two audio segments. The division may be even, that is, every audio segment has the same target duration, such as 1 s or 5 s, and each segment is then checked in turn for clipping.
Whether an audio segment is clipped may be detected by checking whether, within the segment, the absolute amplitude values of at least a first number of consecutive sampling points are all greater than a second threshold. The first number may be 3, and the second threshold may be the product of the maximum value of the sampling value range and a target proportion; with a target proportion of 90%, this gives 32768 × 0.9 ≈ 29491. If the absolute amplitude values of three or more consecutive sampling points in an audio segment all exceed 90% of the maximum of the sampling value range, the audio segment is determined to be clipped and is discarded. Note that the 90% proportion may be replaced by a nearby value such as 91%, 89%, 85%, or 95%, the first number may be a value other than 3, and the first number, the target proportion, and the sampling frequency may constrain one another.
If no three consecutive sampling points in the audio segment have absolute amplitude values exceeding 90% of the maximum of the sampling value range, the audio segment is not clipped; it is determined to be a usable audio segment and is retained. Because clipping is detected and handled per segment rather than per sample, each retained speech segment stays continuous.
Optionally, the third obtaining unit is specifically configured to obtain the second audio signal from the result of clipping detection on the at least two audio segments: for example, by discarding the clipped audio segments among the at least two audio segments, retaining the unclipped audio segments, and then combining all unclipped audio segments in time order into the second audio signal.
For step S104 in the above embodiment, refer to fig. 5, a schematic diagram of the clipping detection processing for audio segments proposed in this application, which includes, but is not limited to, steps S41-S44:
S41, for each of the at least two audio segments, detecting whether the audio segment is clipped;
S42, if the audio segment is clipped, discarding the audio segment;
S43, obtaining the audio segments that remain among the at least two audio segments after discarding;
S44, obtaining a second audio signal from the remaining audio segments.
Optionally, after the second audio signal is obtained, it may be detected whether the audio length of the second audio signal meets a certain condition, to ensure that the second audio signal built from the remaining audio segments can be used by subsequent systems. For the determination method, refer to fig. 6, a flowchart of a method for determining whether to discard the second audio signal, which includes, but is not limited to, steps S51-S54:
S51, obtaining a second audio signal;
S52, detecting whether the audio length of the second audio signal is greater than or equal to a first threshold; if it is less than the first threshold, going to step S53; if it is greater than or equal to the first threshold, going to step S54;
S53, discarding the second audio signal;
S54, determining the second audio signal as a usable audio signal.
The first threshold above refers to the minimum length of an audio signal that can be input into a subsequent system for processing. For example, in a text-independent voiceprint registration scenario, the speech signal to be registered must be at least 20 s long, so it can be determined whether the audio length of the second audio signal is greater than or equal to 20 s; if so, the second audio signal is retained and used to train a voiceprint recognition model.
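For orientation, the method steps can be chained into a single function. This end-to-end sketch is an assumption-laden illustration, not the patented apparatus: it picks the run-based target data (one of the optional embodiments), uses the example target range of 60% to 80%, 1 s segments, a 20 s first threshold, and hypothetical names, and returns `None` for a discarded signal. As in the text, a signal below the fourth threshold is passed through without the length check.

```python
from typing import List, Optional, Sequence, Tuple

SECOND_THRESHOLD = 32768 * 0.9   # maximum of sampling range x 90%
FIRST_NUMBER = 3                 # consecutive samples needed to call clipping


def clipped_run_lengths(samples: Sequence[int]) -> List[int]:
    """Lengths of all runs of consecutive samples with |amplitude| > SECOND_THRESHOLD."""
    runs: List[int] = []
    run = 0
    for value in samples:
        if abs(value) > SECOND_THRESHOLD:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return runs


def process_first_audio_signal(samples: Sequence[int], sample_rate: int,
                               target_range: Tuple[float, float] = (0.60, 0.80),
                               segment_seconds: float = 1.0,
                               min_seconds: float = 20.0) -> Optional[List[int]]:
    """Return the usable audio signal, or None if it is discarded."""
    n = len(samples)
    if n == 0:
        return None
    # Target data: share of samples lying in clipped runs (one possible choice).
    target_data = sum(r for r in clipped_run_lengths(samples) if r >= FIRST_NUMBER) / n
    fourth_threshold, third_threshold = target_range
    if target_data > third_threshold:
        return None                      # S35: discard
    if target_data < fourth_threshold:
        return list(samples)             # S34: usable without segmentation
    # S33 and S41-S44: keep unclipped equal-duration segments in time order.
    seg_len = max(1, int(sample_rate * segment_seconds))
    second: List[int] = []
    for start in range(0, n, seg_len):
        segment = samples[start:start + seg_len]
        if all(r < FIRST_NUMBER for r in clipped_run_lengths(segment)):
            second.extend(segment)
    # S51-S54: keep the second audio signal only if it is long enough.
    return second if len(second) / sample_rate >= min_seconds else None
```

For instance, with a toy sampling frequency of 10 Hz, a 4 s signal whose first 3 s are at full scale has target data 0.75, falls in the target range, and keeps only its last 1 s segment; whether that survives then depends on `min_seconds`.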
In an embodiment, the third obtaining unit is specifically configured to:
for each of the at least two audio segments, detecting whether the audio segment is clipped;
if the audio segment is clipped, discarding the audio segment;
obtaining the audio segments that remain among the at least two audio segments after discarding;
and obtaining a second audio signal from the remaining audio segments.
Optionally, as shown in fig. 12, the apparatus further includes:
a detection unit for detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;
a first determining unit, configured to determine that the second audio signal is an available audio signal if the audio length of the second audio signal is greater than or equal to the first threshold;
and, if the audio length of the second audio signal is less than the first threshold, discard the second audio signal.
In one embodiment, each of the at least two audio segments includes at least one sampling point, and the third obtaining unit detects whether the audio segment is clipped by acquiring the amplitude value of each sampling point included in the audio segment;
if the amplitude values of the at least one sampling point meet a first condition, determining that the audio segment is clipped, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are greater than a second threshold.
Optionally, as shown in fig. 12, the apparatus further includes:
a fourth obtaining unit, configured to obtain an amplitude value of each of N sampling points included in the first audio signal;
a second determining unit, configured to determine that the first audio signal contains clipping if the amplitude values of the N sampling points meet a first condition, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are greater than a second threshold.
In one embodiment, the amplitude value of each of the N sampling points belongs to a target sampling range;
the second threshold is a product of a maximum value of the target sampling range and a target proportion.
Optionally, as shown in fig. 12, the apparatus further includes:
the second dividing unit is used for dividing a target sampling range into at least two sub-ranges, wherein the at least two sub-ranges are not overlapped with each other, and the target sampling range is a range in which amplitude values of N sampling points included in the first audio signal are located;
the counting unit is used for counting, among the N sampling points, the number of sampling points whose amplitude values belong to each of the at least two sub-ranges;
a construction unit for constructing a histogram, a horizontal axis of the histogram comprising the at least two sub-ranges and a vertical axis of the histogram comprising the number of sample points belonging to the sub-ranges;
and the third determining unit is used for determining that the amplitude truncation exists in the first audio signal if the variation trend of the histogram meets a second condition.
In an embodiment, the second obtaining unit is specifically configured to:
determining a first sub-range from the at least two sub-ranges, wherein the first sub-range is the sub-range containing the largest amplitude values among the at least two sub-ranges;
acquiring, as a first number, the number of sampling points among the N sampling points whose amplitude values belong to the first sub-range;
calculating the ratio between the first number and N, and using the ratio as the target data representing the clipping ratio of the first audio signal.
Optionally, the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the apparatus further includes a fourth determining unit;
the fourth determining unit is specifically configured to discard the first audio signal if the target data is greater than the third threshold;
and if the target data is smaller than the fourth threshold value, determining the first audio signal as a usable audio signal.
In the embodiment of the invention, after the first audio signal with the amplitude clipping is obtained, the processing mode of the first audio signal is determined according to the target data for representing the amplitude clipping proportion of the first audio signal, and the second audio signal is obtained after the amplitude clipping detection processing. The embodiment of the application does not simply discard the audio signal with the amplitude clipping, but further processes the audio signal with the amplitude clipping, so that the effective audio signal can be kept as much as possible, and the availability of the audio signal is greatly improved.
Referring to fig. 13, which is a schematic structural diagram of another audio signal processing apparatus according to an embodiment of the present invention, the audio signal processing apparatus 1000 may include at least one processor 1001 (such as a CPU), at least one communication interface 1003, a memory 1004, and at least one communication bus 1002. The communication bus 1002 enables communication among these components. The communication interface 1003 may optionally include a standard wired interface or a wireless interface (e.g., a Wi-Fi interface). The memory 1004 may be a high-speed RAM or a non-volatile memory (e.g., at least one disk memory), and may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 13, the memory 1004, as a computer storage medium, may include an operating system, a network communication module, and program instructions.
In the audio signal processing apparatus 1000 shown in fig. 13, the processor 1001 may be configured to load program instructions stored in the memory 1004 and specifically perform the following operations:
acquiring a first audio signal with amplitude clipping, wherein the first audio signal comprises N sampling points, and N is a positive integer;
acquiring target data representing a clipping ratio of the first audio signal, wherein the clipping ratio is the ratio between the number of clipped sampling points among the N sampling points and N;
if the target data belongs to a target range, dividing the first audio signal into at least two audio segments;
and performing clipping detection on the at least two audio segments, and obtaining a second audio signal from the audio segments after clipping detection.
Optionally, before acquiring the first audio signal with clipping, the method further includes:
acquiring an amplitude value of each of N sampling points included in the first audio signal;
if the amplitude values of the N sampling points meet a first condition, determining that the first audio signal contains clipping, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are greater than a second threshold.
Optionally, before acquiring the first audio signal with clipping, the method further includes:
dividing a target sampling range into at least two sub-ranges, wherein the at least two sub-ranges do not overlap each other, and the target sampling range is the range in which the amplitude values of the N sampling points included in the first audio signal lie;
counting, among the N sampling points, the number of sampling points whose amplitude values belong to each of the at least two sub-ranges;
constructing a histogram, a horizontal axis of the histogram comprising the at least two sub-ranges and a vertical axis of the histogram comprising the number of sample points belonging to the sub-ranges;
and if the variation trend of the histogram meets a second condition, determining that the amplitude truncation exists in the first audio signal.
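The histogram step can be sketched as below. The text specifies neither how the sub-ranges are laid out nor what the second condition is, so this sketch assumes equal-width sub-ranges over absolute amplitudes in `[0, max_amplitude]` and uses a hypothetical second condition: the top sub-range holds more sampling points than the one just below it, i.e. the histogram turns upward at the high-amplitude end instead of tapering off, as a pile-up of samples at the ceiling would cause. Both function names are illustrative.

```python
from typing import List, Sequence


def amplitude_histogram(samples: Sequence[float], max_amplitude: float, num_bins: int) -> List[int]:
    """Count samples per equal-width, non-overlapping sub-range of [0, max_amplitude].

    Assumptions: amplitudes are absolute values, sub-ranges have equal width,
    and a value equal to max_amplitude falls into the last (top) sub-range.
    """
    if num_bins < 2:
        raise ValueError("need at least two sub-ranges")
    counts = [0] * num_bins
    width = max_amplitude / num_bins
    for s in samples:
        a = min(abs(s), max_amplitude)  # keep out-of-range values in the top bin
        idx = min(int(a / width), num_bins - 1)
        counts[idx] += 1
    return counts


def second_condition(counts: Sequence[int]) -> bool:
    """Hypothetical second condition: the histogram rises at the top end."""
    return counts[-1] > counts[-2]


clipped = [0.05, 0.1, 0.3, 0.6, 1.0, 1.0, 1.0, -1.0, 0.2]
hist = amplitude_histogram(clipped, max_amplitude=1.0, num_bins=4)
print(hist)                    # [3, 1, 1, 4]
print(second_condition(hist))  # True
```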
Optionally, acquiring the target data representing the clipping ratio of the first audio signal includes:
determining a first sub-range from the at least two sub-ranges, wherein the first sub-range is the sub-range with the largest amplitude values among the at least two sub-ranges;
counting the number of the N sampling points whose amplitude values fall within the first sub-range, and taking this count as a first number;
calculating the ratio between the first number and N, and using the ratio as the target data representing the clipping ratio of the first audio signal.
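The target data is therefore the share of the N sampling points that land in the highest sub-range. A minimal sketch, assuming equal-width sub-ranges over absolute amplitudes in `[0, max_amplitude]` (the text does not fix the layout); `clipping_ratio` is an illustrative name.

```python
from typing import Sequence


def clipping_ratio(samples: Sequence[float], max_amplitude: float, num_bins: int) -> float:
    """Target data: fraction of the N samples whose amplitude lies in the top sub-range.

    Assumption: num_bins equal-width sub-ranges over absolute amplitudes in
    [0, max_amplitude], so the top sub-range starts at
    max_amplitude * (num_bins - 1) / num_bins.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("the signal must contain at least one sampling point")
    lower_edge = max_amplitude * (num_bins - 1) / num_bins
    # The "first number" of this step: samples falling in the first (top) sub-range.
    top_count = sum(1 for s in samples if abs(s) >= lower_edge)
    return top_count / n


clipped = [0.05, 0.1, 0.3, 0.6, 1.0, 1.0, 1.0, -1.0, 0.2]
print(round(clipping_ratio(clipped, max_amplitude=1.0, num_bins=4), 3))  # 0.444
```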
Optionally, performing clipping detection on the at least two audio segments and obtaining the second audio signal from the audio segments after clipping detection includes:
for each of the at least two audio segments, detecting whether clipping exists in the audio segment;
if clipping exists in the audio segment, discarding the audio segment;
obtaining the audio segments that remain among the at least two audio segments after the discarding;
and obtaining the second audio signal from the remaining audio segments.
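The discard-and-keep step can be sketched as follows. The text does not say how segments are formed or how the remaining segments become the second audio signal, so this sketch assumes fixed-length, non-overlapping segments (the last one may be shorter) and simply concatenates the surviving segments in their original order. Per-segment detection reuses the run-length form of the first condition, again assuming absolute amplitudes; all names are illustrative.

```python
from typing import List, Sequence


def has_clipping(segment: Sequence[float], first_number: int, second_threshold: float) -> bool:
    """First condition: >= first_number consecutive |samples| above the threshold."""
    run = 0
    for s in segment:
        run = run + 1 if abs(s) > second_threshold else 0
        if run >= first_number:
            return True
    return False


def remove_clipped_segments(samples: Sequence[float], segment_length: int,
                            first_number: int, second_threshold: float) -> List[float]:
    """Split into fixed-length segments (assumption), drop clipped ones, join the rest."""
    if segment_length < 1:
        raise ValueError("segment_length must be positive")
    segments = [list(samples[i:i + segment_length])
                for i in range(0, len(samples), segment_length)]
    remaining = [seg for seg in segments
                 if not has_clipping(seg, first_number, second_threshold)]
    return [s for seg in remaining for s in seg]  # concatenate in original order


signal = [0.1, 0.2, 0.3, 1.0, 1.0, 1.0, 0.2, 0.1, 0.0]
print(remove_clipped_segments(signal, segment_length=3, first_number=2, second_threshold=0.9))
# [0.1, 0.2, 0.3, 0.2, 0.1, 0.0]
```

Concatenating the survivors is the simplest choice; a production system might instead keep segment boundaries or cross-fade at the joins.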
Optionally, each of the at least two audio segments includes at least one sampling point, and detecting whether clipping exists in the audio segment includes:
acquiring the amplitude value of each of the at least one sampling point included in the audio segment;
if the amplitude values of the at least one sampling point meet the first condition, determining that clipping exists in the audio segment, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are all greater than the second threshold.
Optionally, the amplitude value of each of the N sampling points lies within a target sampling range, and the second threshold is the product of the maximum value of the target sampling range and a target proportion.
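As a worked example of this formula, the sketch below assumes signed integer PCM, whose maximum sample value is 2^(bits-1) - 1, and a hypothetical target proportion of 0.99; neither value comes from the source. For 16-bit audio this gives 32767 × 0.99 ≈ 32439.33.

```python
def second_threshold(bit_depth: int, target_proportion: float) -> float:
    """Second threshold = maximum of the target sampling range x target proportion.

    Assumption: signed integer PCM, so the maximum sample value is
    2**(bit_depth - 1) - 1 (32767 for 16-bit audio).
    """
    if not 0.0 < target_proportion <= 1.0:
        raise ValueError("target_proportion must be in (0, 1]")
    max_value = 2 ** (bit_depth - 1) - 1
    return max_value * target_proportion


# Hypothetical target proportion of 0.99 for 16-bit audio.
print(round(second_threshold(16, 0.99), 2))  # 32439.33
```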
Optionally, the maximum value of the target range is a third threshold, and the minimum value of the target range is a fourth threshold, and the method further includes:
if the target data is greater than the third threshold, discarding the first audio signal;
and if the target data is smaller than the fourth threshold value, determining the first audio signal as a usable audio signal.
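Together with the target-range check, this yields a three-way decision on the first audio signal. A minimal sketch follows; the numeric thresholds in the usage lines are hypothetical, and the sketch treats the target range as inclusive of both endpoints, which is consistent with "greater than the third threshold" and "smaller than the fourth threshold" being the two exits. `Decision` and `classify` are illustrative names.

```python
from enum import Enum


class Decision(Enum):
    DISCARD = "discard"  # target data above the third threshold
    USABLE = "usable"    # target data below the fourth threshold
    SEGMENT = "segment"  # target data inside [fourth, third]: split and inspect segments


def classify(target_data: float, third_threshold: float, fourth_threshold: float) -> Decision:
    """Map the clipping ratio (target data) onto the three outcomes described above."""
    if fourth_threshold > third_threshold:
        raise ValueError("the fourth threshold must not exceed the third threshold")
    if target_data > third_threshold:
        return Decision.DISCARD
    if target_data < fourth_threshold:
        return Decision.USABLE
    return Decision.SEGMENT


# Hypothetical thresholds: discard above 20 % clipped samples, accept below 1 %.
print(classify(0.30, third_threshold=0.2, fourth_threshold=0.01))   # Decision.DISCARD
print(classify(0.005, third_threshold=0.2, fourth_threshold=0.01))  # Decision.USABLE
print(classify(0.05, third_threshold=0.2, fourth_threshold=0.01))   # Decision.SEGMENT
```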
Optionally, the processor 1001 may also be configured to load program instructions stored in the memory 1004 for performing the following operations:
detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;
if the audio length of the second audio signal is greater than or equal to the first threshold, determining that the second audio signal is an available audio signal;
and if the audio length of the second audio signal is smaller than the first threshold, discarding the second audio signal.
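The final length check can be sketched as below. The text does not define "audio length"; this sketch assumes it is the duration in seconds, `len(samples) / sample_rate`, and uses a hypothetical first threshold of 1.0 s at 16 kHz. Returning `None` stands in for discarding; `check_length` is an illustrative name.

```python
from typing import Optional, Sequence


def check_length(samples: Sequence[float], sample_rate: int,
                 first_threshold_s: float) -> Optional[Sequence[float]]:
    """Return the second audio signal if it is long enough, otherwise None (discarded).

    Assumption: audio length is measured in seconds as len(samples) / sample_rate.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    duration_s = len(samples) / sample_rate
    return samples if duration_s >= first_threshold_s else None


half_second = [0.0] * 8000  # 0.5 s at 16 kHz
one_second = [0.0] * 16000  # 1.0 s at 16 kHz
print(check_length(half_second, 16000, 1.0) is None)  # True: discarded
print(check_length(one_second, 16000, 1.0) is None)   # False: kept as available
```

This check matters because discarding clipped segments can leave too little audio for downstream use.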
It should be noted that, for the specific implementation process and execution steps, reference may be made to the description of the method embodiment shown in fig. 1 and the foregoing embodiments, which is not repeated here.
An embodiment of the present invention further provides a computer storage medium. The computer storage medium may store a plurality of instructions suitable for being loaded by a processor to execute the method steps of the embodiment shown in fig. 1; for the specific execution process, refer to the description of the embodiment shown in fig. 1, which is not repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods in the foregoing embodiments can be implemented by a computer program. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the foregoing method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Claims (10)

1. An audio signal processing method, comprising:
acquiring a first audio signal in which clipping is present, wherein the first audio signal comprises N sampling points, and N is a positive integer;
acquiring target data representing a clipping ratio of the first audio signal, wherein the clipping ratio is the ratio between the number of clipped sampling points among the N sampling points and N;
if the target data fall within a target range, dividing the first audio signal into at least two audio segments;
and performing clipping detection on the at least two audio segments, and obtaining a second audio signal from the audio segments after clipping detection.
2. The method of claim 1, wherein performing clipping detection on the at least two audio segments and obtaining the second audio signal from the audio segments after clipping detection comprises:
for each of the at least two audio segments, detecting whether clipping exists in the audio segment;
if clipping exists in the audio segment, discarding the audio segment;
obtaining the audio segments that remain among the at least two audio segments after the discarding;
and obtaining the second audio signal from the remaining audio segments.
3. The method of claim 2, wherein the method further comprises:
detecting whether the audio length of the second audio signal is greater than or equal to a first threshold;
if the audio length of the second audio signal is greater than or equal to the first threshold, determining that the second audio signal is an available audio signal;
and if the audio length of the second audio signal is smaller than the first threshold, discarding the second audio signal.
4. The method of claim 2, wherein each of the at least two audio segments includes at least one sampling point, and detecting whether clipping exists in the audio segment comprises:
acquiring the amplitude value of each of the at least one sampling point included in the audio segment;
if the amplitude values of the at least one sampling point meet a first condition, determining that clipping exists in the audio segment, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are all greater than a second threshold.
5. The method of claim 1, further comprising, before acquiring the first audio signal in which clipping is present:
acquiring the amplitude value of each of the N sampling points included in the first audio signal;
if the amplitude values of the N sampling points meet a first condition, determining that clipping exists in the first audio signal, wherein the first condition is that the amplitude values of at least a first number of consecutive sampling points are all greater than a second threshold.
6. The method of claim 1, further comprising, before acquiring the first audio signal in which clipping is present:
dividing a target sampling range into at least two non-overlapping sub-ranges, wherein the target sampling range is the range within which the amplitude values of the N sampling points included in the first audio signal lie;
counting, for each of the at least two sub-ranges, the number of the N sampling points whose amplitude values fall within that sub-range;
constructing a histogram whose horizontal axis represents the at least two sub-ranges and whose vertical axis represents the number of sampling points falling within each sub-range;
and if the trend of the histogram meets a second condition, determining that clipping exists in the first audio signal.
7. The method of claim 6, wherein acquiring the target data representing the clipping ratio of the first audio signal comprises:
determining a first sub-range from the at least two sub-ranges, wherein the first sub-range is the sub-range with the largest amplitude values among the at least two sub-ranges;
counting the number of the N sampling points whose amplitude values fall within the first sub-range, and taking this count as a first number;
calculating the ratio between the first number and N, and using the ratio as the target data representing the clipping ratio of the first audio signal.
8. An audio signal processing apparatus, comprising:
a first acquisition unit, configured to acquire a first audio signal in which clipping is present, wherein the first audio signal comprises N sampling points, and N is a positive integer;
a second acquisition unit, configured to acquire target data representing a clipping ratio of the first audio signal, the clipping ratio being the ratio between the number of clipped sampling points among the N sampling points and N;
a first dividing unit, configured to divide the first audio signal into at least two audio segments if the target data fall within a target range;
and a third acquisition unit, configured to perform clipping detection on the at least two audio segments and obtain a second audio signal from the audio segments after clipping detection.
9. An audio signal processing apparatus comprising a processor, a memory and a communication interface, the processor, the memory and the communication interface being interconnected, wherein the communication interface is configured to receive and transmit data, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
CN201911034571.6A 2019-10-29 2019-10-29 Audio signal processing method and device Active CN110931021B (en)

Priority Applications (2)

CN201911034571.6A (CN110931021B): priority date 2019-10-29, filing date 2019-10-29, Audio signal processing method and device
PCT/CN2019/118444 (WO2021082083A1): priority date 2019-10-29, filing date 2019-11-14, Audio signal processing method and device


Publications (2)

Publication Number Publication Date
CN110931021A true CN110931021A (en) 2020-03-27
CN110931021B CN110931021B (en) 2023-10-13

Family

ID=69849667


Country Status (2)

Country Link
CN (1) CN110931021B (en)
WO (1) WO2021082083A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113852893A (en) * 2020-06-28 2021-12-28 北京小米移动软件有限公司 Data processing method and device, terminal and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103117063A (en) * 2012-12-27 2013-05-22 安徽科大讯飞信息科技股份有限公司 Music content cut-frame detection method based on software implementation
CN105989853A (en) * 2015-02-28 2016-10-05 科大讯飞股份有限公司 Audio quality evaluation method and system
CN106782613A (en) * 2016-12-22 2017-05-31 广州酷狗计算机科技有限公司 Signal detecting method and device
CN108804072A (en) * 2018-06-13 2018-11-13 广州酷狗计算机科技有限公司 Audio-frequency processing method, device, storage medium and terminal
CN109859745A (en) * 2019-03-27 2019-06-07 北京爱数智慧科技有限公司 A kind of audio-frequency processing method, equipment and computer-readable medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN108091352B (en) * 2017-12-27 2020-10-13 腾讯音乐娱乐科技(深圳)有限公司 Audio file processing method and device, storage medium and terminal equipment
US10380989B1 (en) * 2018-02-22 2019-08-13 Cirrus Logic, Inc. Methods and apparatus for processing stereophonic audio content

Also Published As

Publication number Publication date
CN110931021B (en) 2023-10-13
WO2021082083A1 (en) 2021-05-06

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40019543; Country of ref document: HK)

SE01 Entry into force of request for substantive examination
GR01 Patent grant