CN114724581A - Audio processing method, device, equipment and storage medium - Google Patents

Publication number
CN114724581A
Authority
CN
China
Prior art keywords
audio data
data
audio
played
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210403461.8A
Other languages
Chinese (zh)
Inventor
付建林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd
Priority to CN202210403461.8A
Publication of CN114724581A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The embodiments of the application provide an audio processing method, apparatus, device and storage medium. The method comprises: obtaining an audio data stream to be played; detecting whether a data variation between adjacent audio data in the audio data stream to be played is larger than a preset threshold and, if so, performing gradient processing on target audio data using a preset gradient step length to obtain detected audio data, the target audio data being one of the two adjacent audio data whose data variation is larger than the preset threshold; and transmitting the detected audio data in the audio data stream to be played to a player. This reduces the likelihood of pop sound when the audio data is played.

Description

Audio processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of audio technologies, and in particular, to an audio processing method, apparatus, device, and storage medium.
Background
Pop sound is a plosive noise generated at the moment audio is played. The noise is sharp, gives the user a very poor listening experience, and can damage the user's hearing over time. Most pop sound is generated by hardware, caused by the transient impact at the moment an audio device is powered up or powered down, and is generally solved by modifying the power-up timing of the codec or adding a capacitor to the circuit.
However, pop sound can also be caused by abrupt changes in audio data. When the audio data changes instantaneously from a stable value to another value, for example drops instantaneously from a stable value to a small value, or jumps from a small value to a large value, the energy of the sound is suddenly released or accumulated within a short time. The result is an "explosion-like" sound similar to the pop sound generated by hardware, which likewise impacts the user's ear and damages hearing. Such pop sound cannot be suppressed or eliminated by hardware circuits.
Disclosure of Invention
In view of this, the present application provides an audio processing method, apparatus, device and storage medium, so as to solve the problem of pop sound occurring when audio data is played.
In a first aspect, an embodiment of the present application provides an audio processing method, including:
acquiring an audio data stream to be played;
detecting whether a data variation between adjacent audio data in the audio data stream to be played is larger than a preset threshold, and if so, performing gradient processing on target audio data by adopting a preset gradient step length to obtain detected audio data; the target audio data is one of the two adjacent audio data whose data variation is larger than the preset threshold;
and transmitting the detected audio data in the audio data stream to be played to a player.
Preferably, the detecting whether a data variation between adjacent audio data in the audio data stream to be played is greater than a preset threshold, and if so, performing gradual change processing on the target audio data by using a preset gradual change step length to obtain detected audio data includes:
circularly executing an audio data detection step until all the audio data in the audio data stream to be played are determined to be detected audio data; wherein the audio data detection step includes:
determining first audio data and second audio data adjacent to the first audio data in the audio data to be detected of the audio data stream to be played according to a preset sequence;
detecting whether the data variation between the first audio data and the second audio data is larger than a preset threshold value or not;
if so, determining the second audio data as target audio data, and performing gradient processing on the target audio data to obtain at least one gradient data of the target audio data;
and determining the first audio data and the at least one gradient data of the target audio data as detected audio data, and updating the audio data to be detected.
Preferably, the determining first audio data and second audio data adjacent to the first audio data in the audio data to be detected of the audio data stream to be played includes:
and determining audio data of a first preset time period as first audio data and determining audio data of a second preset time period adjacent to the first preset time period as second audio data in the audio data to be detected of the audio data stream to be played according to a time sequence.
Preferably, the detecting whether a data variation between the first audio data and the second audio data is greater than a preset threshold includes:
calculating the data variation between each frame of audio data in a first preset time period and the corresponding audio data in a second preset time period;
calculating a data variation average value of the audio data between the first preset time period and the second preset time period according to the data variation between each frame of audio data in the first preset time period and the corresponding audio data in the second preset time period;
and detecting whether the data variation average value of the audio data between the first preset time period and the second preset time period is greater than a preset threshold value.
Preferably, the performing a gradient process on the target audio data to obtain at least one gradient data of the target audio data includes:
performing gradient processing on the target audio data using the formula $\text{data}[i] = i \cdot \text{data}[i]_{\text{Target}}$ to obtain at least one gradient data of the target audio data; wherein $\text{data}[i]_{\text{Target}}$ represents the ith data of the target audio data stream, $\text{data}[i]$ represents the ith gradient data, i is an integer greater than 0 and not greater than N, and N is the preset number of gradient data.
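The gradient formula above can be illustrated with a minimal Python sketch. The formula as printed does not show how the preset gradient step length enters, so this sketch assumes a hypothetical scale factor `step` (defaulting to 1/N) so that the gain rises linearly and reaches 1 at i = N. The names `fade_in`, `target`, `n` and `step` are illustrative and do not come from the application.

```python
def fade_in(target, n, step=None):
    """Return n gradient samples: data[i] = i * step * target[i], i = 1..n.

    `step` is an assumed scale factor (default 1/n); the source formula
    does not state it explicitly.
    """
    if n <= 0 or n > len(target):
        raise ValueError("n must be between 1 and len(target)")
    if step is None:
        step = 1.0 / n
    # i runs from 1 to n as in the formula; Python lists are 0-based.
    return [i * step * target[i - 1] for i in range(1, n + 1)]


# Four target samples of constant amplitude are ramped up linearly.
ramp = fade_in([1000, 1000, 1000, 1000], n=4)
```

With `step = 1/N` the last gradient sample equals the target sample, so playback can continue with the unmodified target data without a further jump.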
Preferably, the method further comprises the following steps:
and when the data variation between the first audio data and the second audio data is not larger than a preset threshold value, determining the first audio data as detected audio data, and updating the audio data to be detected.
Preferably, before the acquiring the audio data stream to be played, the method further includes:
acquiring an audio signal to be played;
the acquiring of the audio data stream to be played comprises:
and decoding and sampling the audio signal to be played to obtain an audio data stream.
In a second aspect, an embodiment of the present application provides an audio processing apparatus, including:
the acquisition unit is used for acquiring a to-be-played audio data stream;
the processing unit is used for detecting whether a data variation between adjacent audio data in the audio data stream to be played is larger than a preset threshold, and if so, performing gradient processing on target audio data by adopting a preset gradient step length to obtain detected audio data; the target audio data is one of the two adjacent audio data whose data variation is larger than the preset threshold;
and the transmission unit is used for transmitting the detected audio data in the audio data stream to be played to a player.
In a third aspect, an embodiment of the present application provides an electronic device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the electronic device to perform the method of any one of the above first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium includes a stored program, where the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method according to any one of the above first aspects.
With the scheme provided by the embodiments of the present application, an audio data stream to be played is obtained; whether a data variation between adjacent audio data in the audio data stream to be played is larger than a preset threshold is detected, and if so, gradient processing is performed on target audio data using a preset gradient step length to obtain detected audio data; and the detected audio data in the audio data stream to be played is transmitted to the player. In other words, when the data variation between adjacent audio data in the audio data stream is detected to be larger than the preset threshold, the adjacent audio data are determined to contain an abrupt data change, and the target audio data among them is subjected to gradual change processing with the gradual change step length, which reduces the abrupt change between the adjacent audio data. Because audio data with abrupt changes is smoothed in this way, fewer abrupt changes remain in the audio stream, which reduces the pop sound they cause and the likelihood of pop sound during playback of the audio data.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of an audio processing method according to an embodiment of the present application;
Fig. 2a is a schematic view of an audio processing scenario provided in an embodiment of the present application;
Fig. 2b is a schematic view of another audio processing scenario provided in the embodiment of the present application;
Fig. 3 is a schematic flowchart of another audio processing method according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of another audio processing method according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of another audio processing method according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For better understanding of the technical solutions of the present application, the following detailed descriptions of the embodiments of the present application are provided with reference to the accompanying drawings.
It should be understood that the embodiments described are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the examples of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein is merely a relationship that describes an associated object, meaning that three relationships may exist, e.g., A and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Before specifically describing the embodiments of the present application, terms applied or likely to be applied to the embodiments of the present application will be explained first.
Pop sound is a plosive noise generated at the moment audio is played.
In the related art, when the audio data changes abruptly from a stable value to another value, for example from a stable value to a small value or from a small value to a large value, the energy of the sound is suddenly released or accumulated within a short time. The result is an "explosion-like" sound similar to the pop sound generated by hardware, which likewise impacts the user's ear and damages hearing. Such pop sound cannot be suppressed or eliminated by hardware circuits.
In view of the foregoing problems, embodiments of the present application provide an audio processing method, apparatus, device, and storage medium, in which an audio data stream to be played is obtained; whether a data variation between adjacent audio data in the audio data stream to be played is larger than a preset threshold is detected, and if so, gradient processing is performed on target audio data using a preset gradient step length to obtain detected audio data; and the detected audio data in the audio data stream to be played is transmitted to the player. In other words, when the data variation between adjacent audio data in the audio data stream is detected to be larger than the preset threshold, the adjacent audio data contain an abrupt data change, and the target audio data among them is subjected to gradual change processing with the gradual change step length, which reduces the abrupt change between the adjacent audio data. Because audio data with abrupt changes is smoothed in this way, fewer abrupt changes remain in the audio stream, which reduces the pop sound they cause and the likelihood of pop sound during playback of the audio data. The details are described below.
Referring to fig. 1, a schematic flowchart of an audio processing method provided in an embodiment of the present application is shown. As shown in fig. 1, the method includes:
Step S101, obtaining the audio data stream to be played.
In this embodiment of the application, when a user needs to play audio, the audio processing device may acquire the audio signal and decode it to obtain the audio data stream to be played. The audio data stream to be played may be PCM (Pulse Code Modulation) audio data.
It should be noted that PCM audio data is a raw stream of uncompressed audio sample data.
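To make the notion of a raw PCM stream concrete, the following Python sketch turns a byte buffer into a list of sample values. The application does not state a sample format, so signed 16-bit little-endian mono samples are an assumption of this sketch, and `pcm16le_to_samples` is a hypothetical helper name.

```python
import struct


def pcm16le_to_samples(raw: bytes) -> list:
    """Interpret raw bytes as signed 16-bit little-endian mono PCM samples.

    The sample format is an assumption; the source only says the stream
    is uncompressed PCM sample data.
    """
    if len(raw) % 2:
        raise ValueError("16-bit PCM needs an even number of bytes")
    count = len(raw) // 2
    # "<" = little-endian, "h" = signed 16-bit integer, repeated `count` times.
    return list(struct.unpack("<%dh" % count, raw))


# Build a tiny buffer of four samples and decode it again.
raw = struct.pack("<4h", 0, 1200, -1200, 32767)
samples = pcm16le_to_samples(raw)
```

Once the stream is available as a list of numbers, the detection and gradient steps described in the following paragraphs operate on plain sample values.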
As a possible implementation manner, the audio processing apparatus may obtain the audio data stream to be played from the storage device, for example, the decoded audio data stream to be played may be stored in a buffer, and at this time, the audio processing apparatus may obtain the audio data stream to be played from the buffer. Or, the decoded audio data stream to be played is stored in a storage device outside the audio processing apparatus, and at this time, the audio processing apparatus can obtain the audio data stream to be played from the storage device.
Step S102, detecting whether a data variation between adjacent audio data in the audio data stream to be played is larger than a preset threshold, and if so, performing gradient processing on the target audio data by adopting a preset gradient step length to obtain detected audio data.
Wherein the target audio data is one of two adjacent audio data having a data variation larger than a preset threshold.
In the embodiment of the present application, when the variation between two adjacent audio data exceeds a certain threshold, switching from one audio data to the other causes pop sound because of the large data variation. The audio data with an abrupt data change between adjacent audio data in the audio data stream, as shown in fig. 2a, therefore needs to be found and subjected to gradual change processing, so as to reduce the pop sound caused by the abrupt change. Based on this, after the audio processing device acquires the audio data stream to be played, it can examine the stream, compare each pair of adjacent audio data, and detect whether the data variation between them is greater than a preset threshold. When the data variation between adjacent audio data in the audio data stream to be played is detected to be greater than the preset threshold, an abrupt data change has occurred between the adjacent audio data. A preset gradual change step length may then be used to perform gradual change processing on the target audio data among the adjacent audio data, so that the data variation between adjacent audio data after the processing is smaller than the preset threshold, as shown in fig. 2b, thereby reducing pop sound. The audio processing device determines, as the detected audio data, both the audio data whose data variation relative to the target audio data is larger than the preset threshold and the gradient data obtained by gradual change processing of the target audio data.
It should be noted that the preset gradual change step length is preset according to actual requirements.
As a possible implementation manner, the audio data detection step is executed in a loop until the audio data in the audio data stream to be played are all determined to be detected audio data. As shown in fig. 3, the audio data detecting step includes:
S301, determining, according to a preset sequence, first audio data and second audio data adjacent to the first audio data in the audio data to be detected of the audio data stream to be played.
S302, detecting whether the data variation between the first audio data and the second audio data is larger than a preset threshold.
S303, if so, determining the second audio data as the target audio data, and performing gradient processing on the target audio data to obtain at least one gradient data of the target audio data.
S304, determining the first audio data and the at least one gradient data of the target audio data as detected audio data, and updating the audio data to be detected.
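Steps S301 to S304 can be sketched as a loop over adjacent sample values. This is a simplified illustration under several assumptions that are not in the application: each audio data is a single number, the data variation is the absolute difference, and the gradient data is produced by plain linear interpolation between the two values (swapped in here for the application's own gradient formula). `smooth_stream`, `threshold` and `n_steps` are hypothetical names.

```python
from collections import deque


def smooth_stream(samples, threshold, n_steps):
    """Run the detection loop (S301-S304) and return the detected stream."""
    pending = deque(samples)     # audio data to be detected
    detected = []                # detected audio data
    while len(pending) >= 2:
        first, second = pending[0], pending[1]            # S301
        detected.append(first)                            # first is now detected
        if abs(second - first) > threshold:               # S302
            # S303: gradient data leading toward the target (second) value.
            # Linear interpolation is an assumption of this sketch.
            delta = (second - first) / (n_steps + 1)
            detected.extend(first + delta * i for i in range(1, n_steps + 1))
        pending.popleft()        # S304: update the audio data to be detected
    detected.extend(pending)     # the last item has no successor to compare with
    return detected


out = smooth_stream([0, 0, 1000, 1000], threshold=500, n_steps=3)
```

In each round the previous second value becomes the new first value, so every adjacent pair is examined exactly once.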
It should be noted that the audio data to be detected is the audio data in the audio data stream for which it has not yet been detected whether the data variation relative to the preceding and following adjacent audio data exceeds the preset threshold. The detected audio data is the audio data in the audio data stream for which that detection has already been performed. The preceding and following adjacent audio data of a given audio data are the audio data whose playing times are adjacent to its playing time, one played immediately before it and the other immediately after it.
In the embodiment of the application, after the audio processing device acquires the audio data stream to be played, all the audio data in the stream may be determined as audio data to be detected. The audio processing device may then determine, according to a preset sequence, the first audio data and the second audio data adjacent to it in the audio data to be detected. For example, the audio processing device may order the audio data by playing time and determine the audio data with the earliest playing time among the audio data to be detected as the first audio data. Alternatively, it may follow the storage order of the audio data in the buffer and determine the audio data stored first among the audio data to be detected as the first audio data. After determining the first audio data, the audio processing device may determine the audio data adjacent to the first audio data as the second audio data, for example the audio data played immediately after the first audio data, or the audio data stored immediately after the first audio data.
After the first audio data and the second audio data are determined, they may be compared to detect whether the data variation between them is greater than the preset threshold. If it is, the data change between the first audio data and the second audio data is too large and may cause pop sound. Because the second audio data is played after the first audio data, the data variation at the switch can be reduced by adding gradient data between the two. Since playback switches from the first audio data to the second audio data, the gradient data may be generated from the second audio data, so that playback passes through the gradient data before reaching the second audio data. The second audio data is therefore determined as the target audio data, and gradient processing with the preset gradient step length is performed on it to obtain at least one gradient data of the target audio data. The first audio data and the at least one gradient data of the target audio data are then determined as detected audio data, and the audio data to be detected is updated: the first audio data, previously part of the audio data to be detected, is now detected audio data.
The audio processing device can then detect whether all the audio data in the audio data stream to be played have been determined as detected audio data. If audio data to be detected remains in the audio data stream to be played, steps S301 to S304 are executed again, until all the audio data in the stream are determined as detected audio data. When steps S301 to S304 are executed again, the audio processing device determines the first audio data and the second audio data, according to the preset sequence, in the updated audio data to be detected. The first audio data determined at this point may be the second audio data of the previous detection round.
As a possible implementation manner, when detecting whether the data variation between the first audio data and the second audio data is greater than a preset threshold, the preset threshold may differ according to the detection manner. For example, when the data difference between the first audio data and the second audio data is detected, the preset threshold is a preset difference threshold. That is, the data difference between the first audio data and the second audio data may be calculated and compared with the preset difference threshold, so as to determine whether the data variation between the first audio data and the second audio data is greater than the preset threshold. When the data difference is greater than the preset difference threshold, the data variation between the first audio data and the second audio data is determined to be greater than the preset threshold; when the data difference is not greater than the preset difference threshold, the data variation is determined to be not greater than the preset threshold.
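A minimal Python sketch of the difference-based check follows. Taking the absolute value of the difference is an assumption of this sketch (the text only speaks of a "data difference"), and `exceeds_difference` and `diff_threshold` are hypothetical names.

```python
def exceeds_difference(first, second, diff_threshold):
    """Return True if the difference between two values exceeds the threshold.

    The absolute value is used so that both sudden rises and sudden drops
    are caught; this is an assumption, not stated in the source.
    """
    return abs(second - first) > diff_threshold


rise = exceeds_difference(100, 900, 500)      # large jump upward
drop = exceeds_difference(900, 100, 500)      # large jump downward
small = exceeds_difference(100, 400, 500)     # change within the threshold
```

A difference exactly equal to the threshold counts as "not greater than" and is therefore not flagged.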
Alternatively, a ratio between the first audio data and the second audio data may be detected, where the predetermined threshold includes a first predetermined ratio and a second predetermined ratio. That is, a data ratio between the first audio data and the second audio data is calculated, the data ratio is compared with a first preset ratio and a second preset ratio, and whether the data ratio is greater than the first preset ratio or less than the second preset ratio is detected. And when the data ratio is greater than a first preset ratio or the data ratio is smaller than a second preset ratio, determining that the data variation between the first audio data and the second audio data is greater than a preset threshold.
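The ratio-based check can be sketched in the same way. The direction of the ratio (second divided by first) and the handling of a zero first value are assumptions of this sketch, since the text does not specify them; `exceeds_ratio`, `upper_ratio` and `lower_ratio` are hypothetical names standing for the first and second preset ratios. The sketch is meant for non-negative amplitude values.

```python
def exceeds_ratio(first, second, upper_ratio, lower_ratio):
    """Return True if second/first lies outside [lower_ratio, upper_ratio]."""
    if first == 0:
        # The ratio is undefined here; as a sketch assumption, any nonzero
        # second value after a zero first value is treated as an abrupt change.
        return second != 0
    ratio = second / first
    return ratio > upper_ratio or ratio < lower_ratio


jump_up = exceeds_ratio(100, 400, upper_ratio=2.0, lower_ratio=0.5)
jump_down = exceeds_ratio(400, 100, upper_ratio=2.0, lower_ratio=0.5)
steady = exceeds_ratio(100, 150, upper_ratio=2.0, lower_ratio=0.5)
```

Two ratios are needed because a ratio well above 1 indicates a sudden increase while a ratio well below 1 indicates a sudden decrease.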
It should be noted that, in this embodiment of the present application, the order of the audio data playing times refers to the playing order of the audio data: the audio data played first has the smallest playing time, and audio data played later has a larger playing time. In the initial stage, the audio processing apparatus may determine the first frame of audio data in the audio data stream to be played as the first audio data. Of course, the preset sequence may be another sequence, which is not limited herein.
As a possible implementation manner, the first audio data and the second audio data are each one frame of audio data. This increases the detection accuracy for the audio data stream to be played and avoids pop sound caused by the data variation between any two adjacent frames exceeding the preset threshold. However, this approach requires every frame of data to be detected, so the amount of calculation is large. Alternatively, the audio data within a period of time may be determined as the first audio data and as the second audio data. The detection is then performed in units of the audio data contained in each time period, which greatly reduces the amount of calculation. That is, determining first audio data and second audio data adjacent to the first audio data in the audio data to be detected of the audio data stream to be played includes:
according to the time sequence in the audio data to be detected of the audio data stream to be played, the audio data of a first preset time period is determined as first audio data, and the audio data of a second preset time period adjacent to the first preset time period is determined as second audio data.
That is to say, in the embodiment of the present application, in order to improve the detection efficiency, a time period may be used as a detection window to detect the data variation of the audio data in the audio data stream. In the audio data stream to be played, the audio data of the first time period and the audio data of the second time period are determined in the data to be detected according to the preset sequence, where the audio data in the first time period and the audio data in the second time period are adjacent. In this case, the first audio data and the second audio data are each the audio data of a time period. For example, the first preset time period is 5 seconds, and the second preset time period is 5 seconds. Then, in the audio data to be detected in the audio data stream, according to the preset sequence, for example the order of the playing time of the audio data, the audio data within the 5 seconds having the smallest playing time is determined as the audio data of the first preset time period. The audio data that is adjacent to the audio data of the first preset time period and whose playing time falls within the following 5 seconds is determined as the audio data of the second preset time period. In other words, the first preset time period is used as a first sliding window and the second preset time period is used as a second sliding window; the audio data within the first sliding window is determined as the first audio data, and the audio data within the second sliding window is determined as the second audio data.
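The two adjacent sliding windows can be illustrated with a short sketch. It assumes the audio data stream is already a flat list of PCM samples with a known sample rate; the helper names `split_into_windows` and `adjacent_window_pairs` are hypothetical and do not come from the application.

```python
def split_into_windows(samples, sample_rate, window_seconds):
    """Cut a sample list into consecutive windows of window_seconds each.

    The last window may be shorter when the stream length is not a multiple
    of the window length.
    """
    window_len = int(sample_rate * window_seconds)
    if window_len <= 0:
        raise ValueError("window must contain at least one sample")
    return [samples[start:start + window_len]
            for start in range(0, len(samples), window_len)]


def adjacent_window_pairs(windows):
    """Pair each window (first audio data) with its successor (second audio data)."""
    return list(zip(windows, windows[1:]))


# Toy stream: 10 samples at 2 samples per second, 2-second windows.
windows = split_into_windows(list(range(10)), sample_rate=2, window_seconds=2)
for first, second in adjacent_window_pairs(windows):
    print(first, second)
```

With the 5-second windows from the example above and a 48 kHz stream, each window would hold 240,000 samples, so far fewer comparisons are needed than with frame-by-frame detection.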
As a possible implementation manner, detecting whether a data variation between the first audio data and the second audio data is greater than a preset threshold includes:
calculating the data variation between each frame of audio data in a first preset time period and the corresponding audio data in a second preset time period; calculating a data variation average value of the audio data between the first preset time period and the second preset time period according to the data variation between each frame of audio data in the first preset time period and the corresponding audio data in the second preset time period; and detecting whether the data variation average value of the audio data between the first preset time period and the second preset time period is greater than a preset threshold value.
In the embodiment of the present application, after the audio data of the first preset time period is determined as the first audio data and the audio data of the second preset time period is determined as the second audio data, because both the first audio data and the second audio data include multiple frames of audio data, for the accuracy of detection, the data change amount of each frame of audio data in the first audio data and the audio data at the corresponding position in the second audio data may be calculated. After the data variation between each frame of audio data in the first audio data and the corresponding audio data in the second audio data is calculated, a data variation average value of the first audio data and the second audio data, that is, an average value of the data variation between the audio data in the first preset time period and the audio data in the corresponding second preset time period is calculated according to the data variation between each frame of audio data in the first audio data and the corresponding audio data in the second audio data, and the average value is compared with a preset threshold to determine whether the average value is greater than the preset threshold.
For example, the first audio data includes 3 frames of audio data, which are a, b, and c; the second audio data includes 3 frames of audio data, which are d, e, and f, respectively. The audio processing device may perform data variation calculation on the 3 frames of audio data in the first audio data and the 3 frames of audio data in the second audio data respectively. That is, the audio processing apparatus may calculate a data change amount between the a audio data in the first audio data and the d audio data in the second audio data, calculate a data change amount between the b audio data in the first audio data and the e audio data in the second audio data, and calculate a data change amount between the c audio data in the first audio data and the f audio data in the second audio data. The audio processing device calculates an average value of three variable quantities according to a data variable quantity between a audio data in the first audio data and d audio data in the second audio data, a data variable quantity between b audio data in the first audio data and e audio data in the second audio data, and a data variable quantity between c audio data in the first audio data and f audio data in the second audio data, takes the average value as a variation average value, compares the variation average value with a preset threshold value, and detects whether the variation average value is larger than the preset threshold value. And when the average variation value is larger than a preset threshold value, taking the second audio data as target audio data, performing gradient processing on the second audio data by adopting a preset gradient step length to obtain at least one gradient data of the second audio data, and determining the at least one gradient data of the second audio data and the first audio data as detected audio data.
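The a/b/c versus d/e/f example can be worked through in a few lines. This is a sketch under stated assumptions: each frame is reduced to a single representative value, and the "data variation" between two frames is taken to be the absolute difference of those values. The function name `mean_variation` and the numbers used are illustrative only.

```python
def mean_variation(first_frames, second_frames):
    """Average the per-position data variation between two equal-length windows."""
    if not first_frames or len(first_frames) != len(second_frames):
        raise ValueError("both windows must hold the same, non-zero number of frames")
    changes = [abs(second - first)
               for first, second in zip(first_frames, second_frames)]
    return sum(changes) / len(changes)


# Frames a, b, c of the first audio data and d, e, f of the second audio data.
first_audio = [100, 200, 300]
second_audio = [150, 100, 900]
average = mean_variation(first_audio, second_audio)  # (50 + 100 + 600) / 3
print(average)        # 250.0
print(average > 200)  # True for an assumed preset threshold of 200
```

Averaging smooths out a single outlying frame pair, so the choice of threshold has to account for the window length.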
As a possible implementation manner, performing a gradient processing on the target audio data, and obtaining at least one gradient data of the target audio data includes:
performing gradient processing on the target audio data by using the formula $\mathrm{data}[i] = i \cdot \mathrm{data}[i]_{\mathrm{Target}} / N$ to obtain at least one gradient data of the target audio data.
Wherein $\mathrm{data}[i]_{\mathrm{Target}}$ represents the $i$-th data of the target audio data stream, $\mathrm{data}[i]$ represents the $i$-th gradient data, $i$ is an integer greater than 0 and not greater than $N$, and $N$ is the preset number of gradient data.
In the embodiment of the present application, when the data variation between the first audio data and the second audio data is greater than the preset threshold, the abrupt change occurs at the switch from the first audio data to the second audio data. Gradient data can therefore be generated from the second audio data and inserted after the first audio data, so that playback switches from the first audio data to the gradient data and then from the gradient data to the second audio data. Because the gradient data generated from the second audio data is smaller than the second audio data, the data variation between adjacent audio data is reduced. After the second audio data is determined as the target audio data, the number N of gradient data to be generated can be preset according to actual requirements, and the gradient step length is preset as i/N. The target audio data is subjected to gradient processing using the formula $\mathrm{data}[i] = i \cdot \mathrm{data}[i]_{\mathrm{Target}} / N$ to obtain at least one gradient data of the target audio data; that is, the target audio data is divided into N audio data. The N audio data are stored between the first audio data and the second audio data, so that when the first audio data is switched to the second audio data, the first audio data is first switched to the gradient data of the second audio data, and the gradient data of the second audio data is then switched to the second audio data. The data variation between the first audio data and the gradient data of the second audio data is smaller than the preset threshold, and the data variation between the gradient data of the second audio data and the second audio data is smaller than the preset threshold, so the possibility of pop sound caused by abrupt data changes between audio data can be greatly reduced.
As a possible implementation manner, step S302 detects whether the data variation between the first audio data and the second audio data is greater than a first preset threshold. If step S302 detects that the data variation is greater than the first preset threshold, steps S303 to S304 are performed; if it detects that the data variation is not greater than the first preset threshold, the following step S305 is executed.
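The gradient formula amounts to a linear fade-in over N values. The sketch below is one possible reading, assuming the target audio data is a list of PCM samples and that the 1-based index i in the formula maps to the 0-based list index i - 1; the function name `fade_in` is hypothetical.

```python
def fade_in(target, n):
    """Apply data[i] = i * data[i]_Target / N for i = 1..N.

    target holds the samples of the target (second) audio data; n is the
    preset number of gradient data and must not exceed len(target).
    """
    if n <= 0 or n > len(target):
        raise ValueError("n must satisfy 0 < n <= len(target)")
    # The formula is 1-based, Python lists are 0-based, hence target[i - 1].
    return [i * target[i - 1] / n for i in range(1, n + 1)]


# A constant-level target of 800 faded in over N = 4 gradient values.
print(fade_in([800, 800, 800, 800], 4))  # [200.0, 400.0, 600.0, 800.0]
```

The scale factor i/N grows from 1/N to 1, so the last gradient value equals the corresponding target value and the hand-over to the unmodified second audio data is continuous.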
As shown in fig. 4, the method further comprises:
step S305, when the data variation between the first audio data and the second audio data is not greater than the preset threshold, determining the first audio data as the detected audio data, and updating the audio data to be detected.
In this embodiment of the application, the data variation between the first audio data and the second audio data may be not greater than the preset threshold, that is, less than or equal to the preset threshold. This indicates that the change from the first audio data to the second audio data is relatively smooth, so switching from the first audio data to the second audio data does not cause a pop sound. In this case, the first audio data may be directly determined as detected audio data, and the audio data to be detected in the audio data stream to be played is updated. Step S301 is then re-executed until all audio data in the audio data stream to be played are detected audio data.
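Taken together, steps S301 to S305 form a loop over adjacent units of audio data. The following sketch shows one way that loop could look. It assumes each unit (a frame or a time window) is a non-empty list of samples and measures the data variation as the jump between the last sample of the first unit and the first sample of the second unit; that metric and the names `fade_data` and `process_stream` are assumptions, not part of the application.

```python
def fade_data(target, n):
    """Gradient data per data[i] = i * data[i]_Target / N (1-based i)."""
    n = min(n, len(target))
    return [i * target[i - 1] / n for i in range(1, n + 1)]


def process_stream(units, threshold, n):
    """Return the detected audio data for a list of sample units."""
    if n <= 0:
        raise ValueError("n must be positive")
    detected = []
    pending = list(units)  # audio data still to be detected
    while len(pending) > 1:
        first, second = pending[0], pending[1]            # S301
        detected.append(first)
        if abs(second[0] - first[-1]) > threshold:        # S302
            # S303-S304: the second unit is the target; its gradient data
            # is placed between the first and second units.
            detected.append(fade_data(second, n))
        # S305 or end of S304: update the audio data to be detected.
        pending.pop(0)
    detected.extend(pending)  # the final unit has no successor to compare with
    return detected


stream = [[0, 10], [1000, 1000], [1010, 990]]
print(process_stream(stream, threshold=500, n=2))
# [[0, 10], [500.0, 1000.0], [1000, 1000], [1010, 990]]
```

In the example, only the first boundary (10 to 1000) exceeds the threshold, so gradient data is inserted there and the remaining units pass through unchanged.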
Step S103, the detected audio data in the audio data stream to be played is transmitted to the player.
In the embodiment of the application, after determining the detected audio data in the audio data stream to be played, the audio processing device may transmit the detected audio data to the player for playing. Because adjacent audio data whose data variation exceeded the preset threshold have been subjected to gradient processing, the data variation between any two adjacent audio data in the detected audio data is not greater than the preset threshold, which greatly reduces the possibility of pop sound caused by a sudden change of the audio data.
Referring to fig. 5, a schematic flowchart of another audio processing method provided in the embodiment of the present application is shown. As shown in fig. 5, the method includes:
step S501, obtaining an audio signal to be played.
In the embodiment of the application, when a user needs to play an audio signal, the audio processing device acquires the audio signal to be played. The audio processing device may obtain the audio signal to be played from the buffer, may obtain the audio signal from the storage device, and may also obtain the collected audio signal from the sound collection device.
Step S502, decoding and sampling conversion processing are carried out on the audio signal to be played to obtain an audio data stream.
In this embodiment, the audio signal obtained by the audio processing apparatus is an encoded signal, that is, the obtained audio signal is in a WAV (Waveform Audio File Format) format, an AMR (Adaptive Multi-Rate) format, an MP3 (Moving Picture Experts Group Audio Layer III) format, or the like. In this case, the audio processing device needs to decode and convert the audio signal to be played to obtain an audio data stream in the PCM format, that is, an audio sample data stream.
After acquiring the audio signal, the audio processing device decodes it and obtains the audio data stream by resampling the decoded audio data. The resampling process converts the sampling rate of the decoded audio data. For example, if the decoded audio data includes a plurality of sampling rates, the decoded audio data may be converted to a single common sampling rate.
Step S503, detecting whether adjacent audio data whose data variation is greater than a preset threshold exist in the audio data stream to be played, and if so, performing gradual change processing on the target audio data by using a preset gradual change step length to obtain detected audio data.
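To make the sampling-rate conversion concrete, the sketch below resamples decoded PCM samples by linear interpolation. The application does not specify a resampling algorithm, so linear interpolation is used here purely as an assumed stand-in; production resamplers normally add low-pass filtering to avoid aliasing. The name `resample_linear` is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Convert samples from src_rate to dst_rate by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for k in range(out_len):
        pos = k * src_rate / dst_rate  # position in the source, in samples
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interval: hold the final sample.
            out.append(float(samples[-1]))
        else:
            frac = pos - left
            out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


# Doubling the rate inserts a midpoint between each pair of source samples.
print(resample_linear([0, 10, 20, 30], 4, 8))
# [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 30.0]
```

Bringing every decoded segment to one common rate before detection matters because the data variation between adjacent samples is only comparable when the samples are equally spaced in time.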
Wherein the target audio data is one of two adjacent audio data having a data variation larger than a preset threshold.
Specifically, refer to step S102, which is not described herein again.
Step S504 transmits the detected audio data in the audio data stream to be played to the player.
Specifically, refer to step S103, which is not described herein again.
Therefore, in the embodiment of the present application, when it is detected that the data variation between adjacent audio data in the audio data stream is greater than the preset threshold, it is determined that the adjacent audio data has a data sudden change, and at this time, the target audio data in the adjacent audio data may be subjected to a gradual change process by using a gradual change step length to obtain the detected audio data, so as to reduce the possibility of the data sudden change between the adjacent audio data. And transmitting the detected audio data in the audio data stream to be played to the player. That is, the audio data with data abrupt change is gradually changed in the application, so that the occurrence of data abrupt change in the audio stream is reduced, pop sound caused by data abrupt change can be reduced, and the possibility of pop sound existing during the playing of the audio data is reduced.
Referring to fig. 6, a schematic structural diagram of another audio processing apparatus provided in the embodiment of the present application is shown. As shown in fig. 6, the audio processing apparatus includes:
an obtaining unit 601, configured to obtain a to-be-played audio data stream.
The processing unit 602 is configured to detect whether adjacent audio data whose data variation is greater than a preset threshold exist in the audio data stream to be played, and if so, perform gradual change processing on the target audio data by using a preset gradual change step length to obtain detected audio data.
Wherein the target audio data is one of two adjacent audio data having a data variation larger than a preset threshold.
As a possible implementation manner, the processing unit 602 is specifically configured to execute the audio data detection step in a loop until all the audio data in the audio data stream to be played is determined to be detected audio data. Wherein, the audio data detection step comprises:
determining first audio data and second audio data adjacent to the first audio data in a preset sequence in audio data to be detected of an audio data stream to be played; detecting whether the data variation between the first audio data and the second audio data is larger than a preset threshold value or not; if so, determining the second audio data as target audio data, and performing gradient processing on the target audio data to obtain at least one gradient data of the target audio data; and determining the first audio data and at least one gradient data of the target audio data as detected audio data, and updating the audio data to be detected.
As a possible implementation manner, the processing unit 602 is specifically configured to determine, in order of time, audio data in a first preset time period as first audio data and determine, in audio data to be detected of an audio data stream to be played, audio data in a second preset time period adjacent to the first preset time period as second audio data.
As a possible implementation manner, the processing unit 602 is specifically configured to calculate a data variation between each frame of audio data in a first preset time period and corresponding audio data in a second preset time period; calculating a data variation average value of the audio data between the first preset time period and the second preset time period according to the data variation between each frame of audio data in the first preset time period and the corresponding audio data in the second preset time period; and detecting whether the data variation average value of the audio data between the first preset time period and the second preset time period is greater than a preset threshold value.
As a possible implementation manner, the processing unit 602 is specifically configured to perform gradient processing on the target audio data by using the formula $\mathrm{data}[i] = i \cdot \mathrm{data}[i]_{\mathrm{Target}} / N$ to obtain at least one gradient data of the target audio data.
Wherein $\mathrm{data}[i]_{\mathrm{Target}}$ represents the $i$-th data of the target audio data stream, $\mathrm{data}[i]$ represents the $i$-th gradient data, $i$ is an integer greater than 0 and not greater than $N$, and $N$ is the preset number of gradient data.
As a possible implementation manner, the processing unit 602 is further configured to determine the first audio data as detected audio data and update the audio data to be detected when a data variation between the first audio data and the second audio data is not greater than a preset threshold.
A transmitting unit 603, configured to transmit the detected audio data in the audio data stream to be played to the player.
As a possible implementation manner, the obtaining unit 601 is further configured to obtain an audio signal to be played.
As a possible implementation manner, the obtaining unit 601 is specifically configured to decode and perform resampling processing on an audio signal to be played, so as to obtain an audio data stream.
Corresponding to the above embodiments, the present application further provides an electronic device. Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 700 may include: a processor 701, a memory 702, and a communication unit 703. These components communicate over one or more buses. Those skilled in the art will appreciate that the configuration of the server shown in the figure does not limit the embodiments of the present invention; it may be a bus structure or a star structure, and may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
The communication unit 703 is configured to establish a communication channel, so that the storage device can communicate with other devices, for example to receive user data sent by other devices or to send user data to other devices.
The processor 701, which is a control center of the storage device, connects various parts of the entire electronic device using various interfaces and lines, and executes various functions of the electronic device and/or processes data by operating or executing software programs and/or modules stored in the memory 702 and calling data stored in the memory. The processor may be composed of Integrated Circuits (ICs), for example, a single packaged IC, or a plurality of packaged ICs connected to the same or different functions. For example, processor 701 may include only a Central Processing Unit (CPU). In the embodiment of the present invention, the CPU may be a single operation core, or may include multiple operation cores.
The memory 702 is used for storing instructions executed by the processor 701, and the memory 702 may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
The instructions in the memory 702, when executed by the processor 701, enable the electronic device 700 to perform some or all of the steps in the embodiment shown in fig. 5.
In specific implementation, the present invention further provides a computer storage medium, where the computer storage medium may store a program, and the program, when executed, may perform some or all of the steps in the embodiments of the audio processing method provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be substantially or partially embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, or the like, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the method according to the embodiments or some parts of the embodiments.
The same and similar parts in the various embodiments in this specification may be referred to each other. Especially, as for the device embodiment and the terminal embodiment, since they are basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the description in the method embodiment.

Claims (10)

1. An audio processing method, comprising:
acquiring an audio data stream to be played;
detecting whether the data variation between adjacent audio data exists in the audio data stream to be played is larger than a preset threshold, if so, performing gradient processing on the target audio data by adopting a preset gradient step length to obtain detected audio data; the target audio data is one of two adjacent audio data with data variation larger than a preset threshold;
and transmitting the detected audio data in the audio data stream to be played to a player.
2. The method according to claim 1, wherein the detecting whether the data variation between adjacent audio data exists in the audio data stream to be played is greater than a preset threshold, and if so, performing a gradual change process on the target audio data by using a preset gradual change step length to obtain the detected audio data comprises:
circularly executing the audio data detection step until the audio data in the audio data stream to be played are determined to be detected audio data; wherein the audio data detecting step includes:
determining first audio data and second audio data adjacent to the first audio data in the audio data to be detected of the audio data stream to be played according to a preset sequence;
detecting whether the data variation between the first audio data and the second audio data is larger than a preset threshold value or not;
if so, determining the second audio data as target audio data, and performing gradient processing on the target audio data to obtain at least one gradient data of the target audio data;
and determining the first audio data and at least one gradient data of the target audio data as detected audio data, and updating the audio data to be detected.
3. The method according to claim 2, wherein the determining of the first audio data and the second audio data adjacent to the first audio data in the audio data to be detected of the audio data stream to be played comprises:
and determining audio data of a first preset time period as first audio data and determining audio data of a second preset time period adjacent to the first preset time period as second audio data in the audio data to be detected of the audio data stream to be played according to a time sequence.
4. The method of claim 3, wherein the detecting whether the amount of data change between the first audio data and the second audio data is greater than a predetermined threshold comprises:
calculating the data variation between each frame of audio data in a first preset time period and the corresponding audio data in a second preset time period;
calculating a data variation average value of the audio data between the first preset time period and the second preset time period according to the data variation between each frame of audio data in the first preset time period and the corresponding audio data in the second preset time period;
and detecting whether the data variation average value of the audio data between the first preset time period and the second preset time period is greater than a preset threshold value.
5. The method of claim 2, wherein the performing of the fade processing on the target audio data to obtain at least one fade data of the target audio data comprises:
performing gradient processing on the target audio data by using the formula $\mathrm{data}[i] = i \cdot \mathrm{data}[i]_{\mathrm{Target}} / N$ to obtain at least one gradient data of the target audio data; wherein $\mathrm{data}[i]_{\mathrm{Target}}$ represents the $i$-th data of the target audio data stream, $\mathrm{data}[i]$ represents the $i$-th gradient data, $i$ is an integer greater than 0 and not greater than $N$, and $N$ is the preset number of gradient data.
6. The method of claim 2 or 3, further comprising:
and when the data variation between the first audio data and the second audio data is not larger than a preset threshold value, determining the first audio data as detected audio data, and updating the audio data to be detected.
7. The method of claim 1, further comprising, prior to said obtaining the audio data stream to be played:
acquiring an audio signal to be played;
the acquiring of the audio data stream to be played comprises:
and decoding and sampling the audio signal to be played to obtain an audio data stream.
8. An audio processing apparatus, comprising:
the acquisition unit is used for acquiring a to-be-played audio data stream;
the processing unit is used for detecting whether the data variation between adjacent audio data exists in the audio data stream to be played is larger than a preset threshold value, if so, performing gradient processing on the target audio data by adopting a preset gradient step length to obtain detected audio data; the target audio data is one of two adjacent audio data with data variation larger than a preset threshold;
and the transmission unit is used for transmitting the detected audio data in the audio data stream to be played to a player.
9. An electronic device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the electronic device to perform the method of any of claims 1-7.
10. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium resides to perform the method of any one of claims 1-7.
CN202210403461.8A 2022-04-18 2022-04-18 Audio processing method, device, equipment and storage medium Pending CN114724581A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210403461.8A CN114724581A (en) 2022-04-18 2022-04-18 Audio processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210403461.8A CN114724581A (en) 2022-04-18 2022-04-18 Audio processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114724581A true CN114724581A (en) 2022-07-08

Family

ID=82243084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210403461.8A Pending CN114724581A (en) 2022-04-18 2022-04-18 Audio processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114724581A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination