WO2017121304A1 - Audio data processing method and terminal - Google Patents

Audio data processing method and terminal

Info

Publication number
WO2017121304A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio frame
frame
audio
sample point
point value
Prior art date
Application number
PCT/CN2017/070692
Other languages
English (en)
French (fr)
Inventor
杨将
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2018529129A (patent JP6765650B2)
Priority to EP17738118.3A (patent EP3404652B1)
Priority to KR1020187016293A (patent KR102099029B1)
Priority to MYPI2018701827A (patent MY191125A)
Publication of WO2017121304A1
Priority to US15/951,078 (patent US10194200B2)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S1/00 Two-channel systems
    • H04S1/007 Two-channel systems in which the audio signals are in digital form

Definitions

  • the present application relates to the field of audio data processing technologies, and in particular, to an audio data processing method and terminal.
  • audio data processing technology enables sound to be captured by a pickup, converted into audio data, and stored, and the stored audio data to be played back through an audio player to reproduce the sound when needed.
  • the wide application of audio data processing technology makes the recording and reproduction of sound very easy, and has an important impact on people's lives and work.
  • when the audio data streams of the left and right channels are offset from each other by one audio frame, a frame of audio data can be inserted between two adjacent frames in one of the two channels so that the channels line up again.
  • likewise, when the audio data streams of the left and right channels are out of synchronization, the problem can be alleviated by inserting audio data into one of the two streams.
  • conventionally, the audio data inserted between two adjacent frames is simply a copy of one of those two frames, but the inserted audio data then produces clearly audible noise during playback, which needs to be overcome. Similarly, directly deleting one frame of audio data from the audio stream also produces noise.
  • an audio data processing method and terminal are provided.
  • An audio data processing method includes:
  • splicing, in order, the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.
  • An audio data processing method includes:
  • splicing, in order, the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame to generate a fourth audio frame, and replacing the first audio frame and the second audio frame together with the fourth audio frame.
  • a terminal comprising a memory and a processor, the memory storing computer readable instructions, wherein the computer readable instructions are executed by the processor such that the processor performs the following steps:
  • splicing, in order, the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.
  • a terminal comprising a memory and a processor, the memory storing computer readable instructions, wherein the computer readable instructions are executed by the processor such that the processor performs the following steps:
  • splicing, in order, the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame to generate a fourth audio frame, and replacing the first audio frame and the second audio frame together with the fourth audio frame.
  • FIG. 1 is a schematic structural diagram of a terminal for implementing an audio data processing method in an embodiment
  • FIG. 2 is a schematic flow chart of an audio data processing method in an embodiment
  • FIG. 3A is a schematic diagram of inserting an audio frame between an adjacent first audio frame and second audio frame in one embodiment
  • FIG. 3B is a schematic diagram of deleting one frame among adjacent first audio frames and second audio frames in one embodiment
  • FIG. 4 is a partial sample point value distribution diagram of a first audio frame in an embodiment
  • FIG. 5 is a partial sample point value distribution diagram of a second audio frame in an embodiment
  • FIG. 6 is a partial sample point value distribution diagram in which a first audio frame and a second audio frame overlap in one embodiment
  • FIG. 7A is a schematic diagram of a process of dividing an audio frame, splicing an audio frame, and inserting an audio frame in one embodiment
  • FIG. 7B is a schematic diagram of a process of dividing an audio frame, splicing an audio frame, and replacing an audio frame in one embodiment
  • FIG. 8 is a schematic diagram of a process of retaining a copy and performing a playback process in one embodiment
  • FIG. 9 is a flow chart showing the steps of determining a frame division position in an embodiment
  • FIG. 10 is a schematic diagram of a first fitting curve of a first audio frame and a second fitting curve of a second audio frame in the same coordinate system in one embodiment
  • FIG. 11 is a schematic flow chart of an audio data processing method in another embodiment
  • FIG. 12 is a structural block diagram of a terminal in an embodiment
  • Figure 13 is a block diagram showing the structure of a terminal in another embodiment
  • Figure 14 is a block diagram showing the structure of the frame division position determining module of Figure 12 or Figure 13 in one embodiment.
  • a terminal 100 for implementing an audio data processing method includes a processor, a non-volatile storage medium, an internal memory, an input device, and an audio output interface connected through a system bus.
  • the processor has a computing function and a function of controlling the operation of the terminal 100, the processor being configured to perform an audio data processing method.
  • the non-volatile storage medium includes at least one of a magnetic storage medium, an optical storage medium, and a flash storage medium, and stores computer readable instructions that, when executed by the processor, cause the processor to perform an audio data processing method.
  • the input device includes at least one of a physical button, a trackball, a touchpad, a touch layer overlaid on the display screen, and a physical interface for connecting an external control device such as a mouse or a multimedia remote control.
  • the terminal 100 includes various electronic devices capable of audio data processing such as a desktop computer, a portable notebook computer, a mobile phone, a music player, and a smart watch.
  • an audio data processing method is provided. This embodiment is described by applying the method to the terminal 100 in FIG. 1.
  • the method specifically includes the following steps:
  • Step 202 Acquire an adjacent first audio frame and a second audio frame from the audio data stream, where the first audio frame precedes the second audio frame in time series.
  • the audio data stream includes a series of sample point values having timings obtained by sampling the original analog sound signal at a specific audio sample rate, and a series of sample point values can describe the sound.
  • the audio sampling rate is the number of sampling points collected in one second, in Hertz (Hz). The higher the audio sampling rate, the higher the frequency of the sound wave that can be described.
  • the audio frame includes a fixed number of sample point values with timing. If the encoding format of the audio data stream itself defines audio frames, those frames are used directly; if there are no audio frames but only a series of sample point values with timing, the series can be divided into audio frames according to a preset frame length.
  • the preset frame length refers to the number of sample point values included in one preset audio frame.
  • the first audio frame and the second audio frame acquired from the audio data stream are adjacent, and the first audio frame precedes the second audio frame in time sequence; that is, when the audio data stream is played back, the first audio frame is played first, and the second audio frame is played after the first audio frame finishes.
  • the first audio frame and the second audio frame are two adjacent audio frames between which an audio frame needs to be inserted.
  • referring to FIG. 3A, an audio data stream includes a first audio frame A, a second audio frame B, and so on, arranged in time sequence; when an audio frame needs to be inserted, an audio frame F is inserted between the first audio frame A and the second audio frame B.
  • referring to FIG. 3B, when an audio frame needs to be deleted, the sample point values of one audio frame must be removed from the sample point values of the two audio frames, first audio frame A and second audio frame B, retaining a single audio frame G.
  • Step 204: Determine a frame division position such that the sample point value at the frame division position in the first audio frame and the sample point value at the frame division position in the second audio frame satisfy a distance proximity condition.
  • the frame division position refers to a position at which the first audio frame and the second audio frame are divided, and is a relative position with respect to one audio frame.
  • the distance refers to the absolute value of the difference between a pair of sample point values at corresponding positions in the two audio frames. For example, referring to the partial sample point value distribution diagram of the first audio frame A shown in FIG. 4 and that of the second audio frame B shown in FIG. 5, the absolute value of the difference between the first sample point value of the first audio frame A and the first sample point value of the second audio frame B is the distance between those two sample point values.
  • the distance proximity condition refers to a quantization condition used to determine whether two sample point values are close in distance.
  • the distance proximity condition may include the case where the distance equals 0, and may also include cases where the two sample point values are not equal but close, for example the distance is less than or equal to a threshold. The threshold may be preset, or may be determined dynamically from the sample point values in the first audio frame and/or the second audio frame, for example as the average of those sample point values multiplied by a preset percentage.
  • the terminal may calculate the distance of each sample point value pair in the first audio frame and the second audio frame and select the pair with the smallest distance; the frame division position is then the position corresponding to the selected pair, and the distance proximity condition is that the distance of the sample point value pair at the frame division position is the minimum over the first audio frame and the second audio frame.
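As a minimal illustration of this minimum-distance selection (a sketch, not code from the patent; the frame contents are made up), the position whose sample point value pair has the smallest distance can be found as:

```python
def min_distance_position(frame_a, frame_b):
    # The distance of a sample point value pair is |a_i - b_i|; the frame
    # division position is the position where this distance is smallest.
    assert len(frame_a) == len(frame_b)
    return min(range(len(frame_a)), key=lambda i: abs(frame_a[i] - frame_b[i]))

frame_a = [3, 5, 9, 4, -2]   # illustrative sample point values
frame_b = [8, 1, 9, -6, 7]
division = min_distance_position(frame_a, frame_b)  # index 2, where |9 - 9| = 0
```
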
  • a sample point value pair here refers to the two sample point values at the same position in the two audio frames, where the position of a sample point value is its relative position within the audio frame it belongs to.
  • FIG. 4 and FIG. 5 are overlapped to obtain the overlapping partial sample point value distribution diagram shown in FIG. 6, so as to compare the local sample point value distributions of audio frame A and audio frame B.
  • at the frame division position S, the absolute value of the difference between the sample point value at S in audio frame A and the sample point value at S in audio frame B is very small or even zero; that is, the sample point value at S in audio frame A and the sample point value at S in audio frame B satisfy the distance proximity condition.
  • Step 206: Acquire the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame, splice them in order to generate a third audio frame, and insert the third audio frame between the first audio frame and the second audio frame; or, acquire the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame, splice them in order to generate a fourth audio frame, and replace the first audio frame and the second audio frame together with the fourth audio frame.
  • the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame are acquired; the total number of sample point values acquired is exactly equal to one audio frame length.
  • the sample point values from the second audio frame come first and the sample point values from the first audio frame follow, spliced sequentially in order to generate a third audio frame.
  • the sample point values from the second audio frame remain in the order in which they are located in the second audio frame, and the sample point values from the first audio frame remain in the order in which they were located in the first audio frame.
  • the generated third audio frame is inserted between the first audio frame and the second audio frame.
  • the first audio frame A is divided into a front portion and a rear portion according to the frame division position S, and the second audio frame B is likewise divided into a front portion and a rear portion according to the frame division position S; the front portion refers to the sample point values before the frame division position S, and the rear portion refers to the sample point values after it.
  • the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame are acquired; the total number of sample point values acquired is exactly equal to one audio frame length.
  • the sample point values from the first audio frame come first and the sample point values from the second audio frame follow, spliced sequentially in order to generate a fourth audio frame.
  • the sample point values from the first audio frame remain in the order in which they are located in the first audio frame, and the sample point values from the second audio frame remain in the order in the second audio frame.
  • the first audio frame and the second audio frame are replaced with the generated fourth audio frame.
  • the first audio frame D is divided into a front portion and a rear portion according to the frame division position S, and the second audio frame E is likewise divided into a front portion and a rear portion according to the frame division position S; the front portion refers to the sample point values before the frame division position S, and the rear portion refers to the sample point values after it. The front portion of the first audio frame D is spliced with the rear portion of the second audio frame E to obtain a fourth audio frame G, and the fourth audio frame G obtained by splicing then replaces the first audio frame D and the second audio frame E.
  • in this way, the portion of the second audio frame before the frame division position is spliced with the portion of the first audio frame after the frame division position to obtain the third audio frame, which is inserted between the first audio frame and the second audio frame.
  • the front portion of the third audio frame is the front portion of the second audio frame
  • the rear portion of the third audio frame is the rear portion of the first audio frame. Since the first audio frame and the second audio frame are themselves seamlessly connected, the first audio frame can be seamlessly connected to the front portion of the third audio frame, and the rear portion of the third audio frame is seamlessly connected to the second audio frame.
  • the third audio frame satisfies the distance proximity condition at the frame division position, so the splice does not introduce an abrupt change; thus the noise caused by jumps between audio frames when an audio frame is inserted can be substantially overcome.
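The insertion splice can be sketched as follows (an illustrative Python sketch, not the patent's implementation; s is the frame division position, treated as a zero-based slice point):

```python
def splice_insert(frame_a, frame_b, s):
    # Third frame: front part of the second frame (before s) followed by the
    # rear part of the first frame (from s onward), each keeping its original
    # internal order; the third frame is inserted between the two frames.
    third = frame_b[:s] + frame_a[s:]
    return [frame_a, third, frame_b]

stream = splice_insert([1, 2, 3, 4], [5, 6, 7, 8], 2)
# stream is [[1, 2, 3, 4], [5, 6, 3, 4], [5, 6, 7, 8]]
```

Note that both junctions around the inserted frame reuse the original A-to-B transition; the only new junction lies inside the third frame at the frame division position, where the two sample values are close by construction.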
  • similarly, the portion of the first audio frame before the frame division position is spliced with the portion of the second audio frame after the frame division position to obtain the fourth audio frame, which replaces the first audio frame and the second audio frame.
  • the front portion of the fourth audio frame is the front portion of the first audio frame and the rear portion of the fourth audio frame is the rear portion of the second audio frame.
  • the fourth audio frame that replaces them can be seamlessly connected with the audio frame preceding the first audio frame and with the audio frame following the second audio frame, and the fourth audio frame satisfies the distance proximity condition at the frame division position, so the splice does not introduce an abrupt change; thus the noise caused by jumps between audio frames when an audio frame is deleted can be substantially overcome.
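The deletion case can be sketched the same way (again an illustrative Python sketch, with s the frame division position):

```python
def splice_replace(frame_a, frame_b, s):
    # Fourth frame: front part of the first frame followed by the rear part
    # of the second frame; it replaces both original frames in the stream.
    return frame_a[:s] + frame_b[s:]

fourth = splice_replace([1, 2, 3, 4], [5, 6, 7, 8], 2)
# fourth is [1, 2, 7, 8]
```
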
  • the audio data processing method further includes: retaining a copy of at least one audio frame length of sample point values when performing real-time playback processing on the audio data stream.
  • step 202 includes: when an instruction for inserting an audio frame is detected, obtaining the first audio frame from the retained copy preceding the sample point value currently being played back, and obtaining the second audio frame from the sample point values of one audio frame length following the sample point value currently being played back.
  • the playback process refers to restoring a sound signal from the sample point values; retaining a copy of at least one audio frame length of sample point values means retaining a copy of at least one audio frame.
  • when the terminal performs playback processing on a sample point value A1, the terminal retains a copy A1' of the sample point value A1, and the copies of the sample point values played before A1 are also retained, so that the total length of the retained copies is at least one audio frame length.
  • after the length of one audio frame has elapsed, the terminal is performing playback processing on a sample point value B1, whose copy B1' is likewise retained; the retained copies at this time include at least a copy A' of audio frame A. If the terminal detects an instruction to insert an audio frame at this moment, the terminal takes the copies of the sample point values of one audio frame length between A1' and the currently played sample point value B1 as the first audio frame A, and takes the audio frame B of one audio frame length after the sample point value B1 as the second audio frame.
  • in this way, an instruction for inserting an audio frame can be responded to immediately when it is detected, without waiting for the duration of one audio frame, which increases the efficiency of inserting audio frames.
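A minimal sketch of this copy-retention idea, using a Python deque as the rolling buffer; FRAME_LEN and all sample values are illustrative assumptions, not values from the patent:

```python
from collections import deque

FRAME_LEN = 4                          # illustrative frame length
played_copy = deque(maxlen=FRAME_LEN)  # rolling copy of played samples

def on_sample_played(value):
    # Called for each sample as it is played; copies older than one frame
    # length are discarded automatically by the bounded deque.
    played_copy.append(value)

def frames_for_insertion(upcoming):
    # First audio frame: the retained copy of already-played samples;
    # second audio frame: the next FRAME_LEN samples about to be played.
    return list(played_copy), upcoming[:FRAME_LEN]

for v in range(6):                     # simulate playback of samples 0..5
    on_sample_played(v)
first, second = frames_for_insertion(list(range(10, 20)))
# first == [2, 3, 4, 5], second == [10, 11, 12, 13]
```
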
  • step 204 specifically includes the following steps:
  • Step 902 Acquire candidate positions, where the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy the distance proximity condition.
  • a candidate position is a position in an audio frame that can serve as the frame division position. Specifically, the terminal can traverse all positions in the audio frame and, at each position, determine whether the pair of sample point values at the corresponding position in the first audio frame and the second audio frame satisfies the distance proximity condition. If the distance proximity condition is satisfied, the traversed position is added to the candidate position set and the traversal continues; if not, the traversal simply continues. If the candidate position set is still empty after the traversal, a preset position (such as the middle position of the audio frame) or the position whose sample point value pair has the smallest distance may be added to the candidate position set.
  • the distance proximity condition refers to a quantization condition used to determine whether two sample point values are close in distance.
  • the distance proximity condition may include the case where the distance equals 0, and may also include cases where the two sample point values are not equal but close, for example the distance is less than or equal to a threshold; the threshold may be preset, or may be determined dynamically from the sample point values in the first audio frame and/or the second audio frame.
  • the terminal may calculate the distance of each sample point value pair in the first audio frame and the second audio frame and sort the distances in ascending order, then add the positions corresponding to a preset number of the smallest distances to the candidate position set.
  • alternatively, the positions corresponding to a preset proportion of the smallest distances among all calculated distances may be selected, starting from the minimum of the sorted distances. The distance proximity condition is then that the distance of the sample point value pair at the candidate position in the first audio frame and the second audio frame ranks within the preset number, or within the preset proportion, of the smallest distances after all calculated distances are sorted in ascending order.
  • in one embodiment, the distance proximity condition is that the product of a first difference and a second difference is less than or equal to 0, where the first difference is the difference between the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame, and the second difference is the difference between the sample point value at the position following the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
  • specifically, the distance proximity condition can be expressed as the following formula (1):
  • (a_i - b_i)(a_{i+1} - b_{i+1}) ≤ 0, where 1 ≤ i < m (1)
  • in formula (1), i denotes a candidate position in the first audio frame A and the second audio frame B, which may be called the sample point value sequence number, and m is the audio frame length; (a_i - b_i) is the first difference, that is, the difference between the sample point value a_i at the candidate position i in the first audio frame A and the sample point value b_i at the corresponding position i in the second audio frame B; (a_{i+1} - b_{i+1}) is the second difference, that is, the difference between the sample point value a_{i+1} at the position following the candidate position i in the first audio frame A and the sample point value b_{i+1} at the corresponding position i+1 in the second audio frame B. Formula (1) states that the product of the first difference and the second difference is less than or equal to zero.
  • the distance proximity condition expressed by formula (1) amounts to finding the intersection of a first fitted curve formed by the sample point values of the first audio frame and a second fitted curve formed by the sample point values of the second audio frame. The intersection may also be determined by other curve-intersection methods. If the intersection falls exactly at the position of a sample point value, that position is added to the candidate position set; if it does not fall at the position of any sample point value, the position closest to the intersection among all positions of the audio frame may be added to the candidate position set. For example, if the first fitted curve and the second fitted curve in FIG. 10 have an intersection X, the position S1 or S2 closest to X may be added to the candidate position set. Another way to find the intersection of two curves is to first obtain mathematical expressions for the two fitted curves and then compute the intersection directly; however, the distance proximity condition of formula (1) is more efficient than such approaches.
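The sign-change condition of formula (1) can be sketched as a search over adjacent positions; this Python sketch, including a fallback when no position qualifies, is illustrative only:

```python
def candidate_positions(frame_a, frame_b):
    # A position i is a candidate when (a_i - b_i)(a_{i+1} - b_{i+1}) <= 0,
    # i.e. the difference between the two curves crosses or touches zero.
    m = len(frame_a)
    cands = [i for i in range(m - 1)
             if (frame_a[i] - frame_b[i]) * (frame_a[i + 1] - frame_b[i + 1]) <= 0]
    if not cands:
        # Fallback: use the position with the smallest distance.
        cands = [min(range(m), key=lambda i: abs(frame_a[i] - frame_b[i]))]
    return cands

cands = candidate_positions([1, 2, 3, 4], [3, 1, 5, 0])
# differences are [-2, 1, -2, 4], so every adjacent product is <= 0
```
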
  • Step 904: Acquire the sum of the distances of the sample point value pairs within a discrete position range of preset length covering the candidate position in the first audio frame and the second audio frame.
  • the discrete location range covering the preset length of the candidate location includes a candidate location, and the discrete location set includes a fixed number of discrete locations, that is, a preset length.
  • an equal number of discrete positions may be selected before and after the candidate position to form the discrete position range together with the candidate position, or different numbers of discrete positions may be selected before and after the candidate position.
  • the positions in the discrete position range are preferably sequentially adjacent; of course, discrete positions may also be selected at intervals to form the range together with the candidate position.
  • the terminal may select candidate positions from the candidate position set one by one, and acquire the distance sum of the sample point value pairs in the first audio frame and the second audio frame within a discrete position range of preset length covering the selected candidate position.
  • the following formula (2) may be employed to obtain the distance sum of the sample point value pairs in the first audio frame and the second audio frame within a discrete position range of preset length covering the candidate position:
  • where N may take a value in [1, (m-1)/2], preferably in [2, (m-1)/100], and more preferably 5; the candidate position is n+N;
  • the discrete position range consists of the candidate position n+N together with the N positions on each of its left and right sides, forming a discrete position range [n, ..., n+N, ..., n+2N] with a preset length of 2N+1.
  • Step 906 Determine the candidate position corresponding to the minimum distance sum as the frame division position.
  • the distance sum for each candidate position in the candidate position set can be calculated separately, and the candidate position corresponding to the smallest distance sum is used as the frame segmentation position. Specifically, this can be expressed as the following formula (3):
  • the determined frame segmentation position also satisfies the distance proximity condition: the product of the first difference and the second difference is less than or equal to 0, where the first difference is the difference between the sample point value at the frame segmentation position in the first audio frame and the sample point value at the corresponding frame segmentation position in the second audio frame, and the second difference is the difference between the sample point value at the position following the frame segmentation position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
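Formulas (2) and (3) can be sketched together: for each candidate position, sum the absolute differences over a window of N positions on either side, then pick the candidate with the smallest sum. The helper names and the clamping of the window at frame edges are our simplifications:

```python
def distance_sum(a, b, pos, N):
    """Formula (2): sum of |a[j] - b[j]| over the discrete position
    range of preset length 2N+1 centred on the candidate position."""
    lo = max(0, pos - N)
    hi = min(len(a) - 1, pos + N)
    return sum(abs(a[j] - b[j]) for j in range(lo, hi + 1))

def frame_split_position(a, b, candidates, N=5):
    """Formula (3): the candidate whose window distance sum is minimal
    is taken as the frame division position."""
    return min(candidates, key=lambda pos: distance_sum(a, b, pos, N))
```

A small window sum measures how close the two fitting curves stay near the candidate, so the minimum picks the crossing where the curves are locally most similar.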
  • steps 904 to 906 above find, among the candidate positions located at intersections, the one where the first fitting curve and the second fitting curve are most similar, and use it as the frame division position.
  • the above step 904 is a specific step of acquiring the local similarity at the corresponding candidate positions in the first audio frame and the second audio frame, and step 906 is a specific step of determining the frame division position according to the local similarity.
  • the local similarity at a candidate position refers to the degree to which the first fitting curve and the second fitting curve are similar within a fixed range near the candidate position; the smaller the value calculated by formula (2), the more similar the curves. If the first fitting curve and the second fitting curve are similar near the candidate position, the two curves have similar slopes there, the transition in the third audio frame obtained after splicing is gentler, and the noise suppression effect is better.
  • local similarity can also be measured by calculating the cross-correlation degree with a cross-correlation function.
  • although a cross-correlation function can also indicate the degree of similarity of two signals, when the cross-correlation degree is calculated over a small number of points, two individually large sample point values may yield a large cross-correlation degree, suggesting that the two curves are similar even though the position is not the optimal frame segmentation position.
  • the local similarity obtained by the above formula (2) overcomes this shortcoming of calculating the cross-correlation degree with a cross-correlation function.
  • in formula (2), the sample point value at each position plays a relatively balanced role in the calculation.
  • the absolute value of the difference is used as the contribution of the sample point value at each position, which describes well the slope difference before and after the intersection point.
  • the most suitable candidate position can be found as the frame division position.
  • the audio data processing method further includes: when the sound effect is turned on, acquiring the adjacent first audio frame and second audio frame from the audio data stream of the specified channel; acquiring the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame, splicing them in order to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame; and performing fade-in processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions from the no-sound state to the complete sound state in time series.
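The splicing-and-insertion step above can be sketched as follows, assuming frames are plain lists of samples and `split` is the frame division position already determined (function name is ours):

```python
def insert_transition_frame(stream, idx, split):
    """Given adjacent frames stream[idx] (first audio frame) and
    stream[idx+1] (second audio frame), build the third audio frame
    from the second frame's samples before `split` followed by the
    first frame's samples from `split` onward, and insert it between
    the two frames."""
    first, second = stream[idx], stream[idx + 1]
    third = second[:split] + first[split:]   # same length as one frame
    stream.insert(idx + 1, third)
    return third
```

Because the two halves meet at a position satisfying the distance proximity condition, the seam inside the third frame introduces no large jump.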
  • that is, the audio-frame insertion steps in the first halves of step 202, step 204, and step 206 are performed on the audio data stream of the specified channel, with the instruction to turn on the sound effect serving as the instruction for inserting an audio frame.
  • the sound effect turned on here is a channel-asynchronous sound effect: the audio data stream of the specified channel is delayed by one audio frame relative to the remaining channels, so that the time at which the sound source reaches each ear of the listener differs by one audio frame, achieving a surround sound effect.
  • the no-sound state is the state before the sound effect is turned on, and the complete sound state is the state after the sound effect is turned on. The third audio frame is fade-in processed so that, following the timing of its sample point values, the inserted third audio frame gradually transitions from the no-sound state to the complete sound state, achieving a smooth transition of the sound effect. For example, if the volume needs to be increased by 5 times in the complete sound state, the volume multiple can be increased gradually until it reaches the maximum of 5 times, seamlessly connecting with the second audio frame in the complete sound state.
  • a gradual transition can be a linear transition or a curved transition.
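The 5x volume example above can be sketched with a linear ramp. The linear shape, the starting gain of 1.0 (effect off), and the function name are our assumptions; a curved transition would use a different ramp:

```python
def fade_in(frame, target_gain=5.0):
    """Scale each sample by a gain that ramps linearly from 1.0
    (effect not yet applied) to target_gain (complete sound state)
    across the frame, so the effect fades in over the frame's
    time series."""
    n = len(frame)
    return [s * (1.0 + (target_gain - 1.0) * i / (n - 1))
            for i, s in enumerate(frame)]
```

The last sample carries the full gain, so the frame hands off seamlessly to the following frame already in the complete sound state.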
  • the audio-frame replacement steps in the second halves of step 202, step 204, and step 206 may be performed on the audio data stream of the specified channel, and the replacing fourth audio frame is fade-out processed so that it gradually transitions from the complete sound state to the no-sound state in time series.
  • the fade-out process, as opposed to the fade-in process, is a process of gradually eliminating the effect of the sound effect.
  • in effect, one audio frame is deleted, so that the specified channel is restored to a state synchronized with the other channels. This allows the channel-asynchronous sound effect to be quickly turned on and/or off, improving the efficiency of switching sound effects.
  • in one embodiment, the steps of acquiring the sample point values before the frame segmentation position in the first audio frame and the sample point values after the frame segmentation position in the second audio frame, splicing them in order to generate a fourth audio frame, and replacing the first audio frame and the second audio frame with the fourth audio frame may be performed on the specified channel; fade-in processing is then performed on the replacing fourth audio frame, so that the fourth audio frame gradually transitions from the no-sound state to the complete sound state in time series.
  • correspondingly, the first halves of step 202, step 204, and step 206 may be performed on the specified channel: acquiring the sample point values before the frame splitting position in the second audio frame and the sample point values after the frame splitting position in the first audio frame, splicing them in order to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame; the inserted third audio frame is fade-out processed, so that it gradually transitions from the complete sound state to the no-sound state in time series.
  • in this way, the channel-asynchronous sound effect can be quickly turned on and/or off, improving the efficiency of switching sound effects.
  • an audio data processing method includes the following steps:
  • Step 1102 When the sound effect is turned on, the adjacent first audio frame and the second audio frame are acquired from the audio data stream of the specified channel, and the first audio frame precedes the second audio frame in time series.
  • Step 1104 Acquire a first candidate location, where the sample point value at the first candidate position in the first audio frame and the sample point value at the corresponding first candidate position in the second audio frame satisfy a distance approach condition.
  • the distance proximity condition may be that the product of the first difference value and the second difference value is less than or equal to zero.
  • the first difference is a difference between the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame.
  • the second difference is the difference between the sample point value of the next position of the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
  • Step 1106 Acquire a distance sum of each sample point value pair in a range of discrete positions covering a preset length of the first candidate position in the first audio frame and the second audio frame.
  • Step 1108 Determine the minimum distance and the corresponding first candidate position as the first frame division position.
  • Step 1110 Acquire the sample point values before the first frame division position in the second audio frame and the sample point values after the first frame division position in the first audio frame, and splice them in order to generate a third audio frame.
  • Step 1112 Insert a third audio frame between the first audio frame and the second audio frame.
  • Step 1114 Perform a fade-in process on the inserted third audio frame, so that the inserted third audio frame gradually transitions from the no-sound state to the full-sound state in time series.
  • Step 1116 When the sound effect is turned off, the adjacent fifth audio frame and the sixth audio frame are acquired from the audio data stream of the specified channel, and the fifth audio frame precedes the sixth audio frame in time series.
  • the fifth audio frame is equivalent to the first audio frame used to generate the fourth audio frame in step 206 of the embodiment shown in FIG. 2, and the sixth audio frame is equivalent to the second audio frame used there.
  • Step 1118 Acquire a second candidate position, where the sample point value at the second candidate position in the fifth audio frame and the sample point value at the corresponding second candidate position in the sixth audio frame satisfy the distance close condition.
  • the distance proximity condition may be that the product of the first difference value and the second difference value is less than or equal to zero.
  • the first difference is a difference between the sample point value at the candidate position in the fifth audio frame and the sample point value at the corresponding candidate position in the sixth audio frame.
  • the second difference is a difference between a sample point value of a next position of the candidate position in the fifth audio frame and a sample point value at a corresponding position in the sixth audio frame.
  • Step 1120 Acquire a distance sum of each sample point value pair in a range of discrete positions covering a preset length of the second candidate position in the fifth audio frame and the sixth audio frame.
  • Step 1122 Determine the minimum distance and the corresponding second candidate position as the second frame division position.
  • Step 1124 Acquire the sample point values before the second frame split position in the fifth audio frame and the sample point values after the second frame split position in the sixth audio frame, and splice them in order to generate a fourth audio frame.
  • Step 1126 Replace the fifth audio frame and the sixth audio frame together with the fourth audio frame.
  • Step 1128 Perform a fade-out process on the replaced fourth audio frame, so that the replaced fourth audio frame gradually transitions from the full sound state to the no-sound state in time series.
  • the part before the frame division position of the second audio frame is spliced with the part after the frame division position of the first audio frame to obtain a third audio frame, which is inserted between the first audio frame and the second audio frame.
  • the front portion of the third audio frame is the front portion of the second audio frame
  • the rear portion of the third audio frame is the rear portion of the first audio frame. Since the first audio frame and the second audio frame are themselves seamlessly connected, the first audio frame can be seamlessly connected to the front portion of the third audio frame, and the rear portion of the third audio frame is seamlessly connected to the second audio frame.
  • the third audio frame satisfies the distance proximity condition at the frame division position, so that the splicing does not cause too abrupt a change; therefore the noise problem caused by jumps between audio frames when inserting an audio frame can be substantially overcome.
  • similarly, when an audio frame needs to be deleted, the fourth audio frame is obtained by splicing the part before the frame division position of the first audio frame with the part after the frame division position of the second audio frame, and replaces the first audio frame and the second audio frame.
  • the front portion of the fourth audio frame is the front portion of the first audio frame and the rear portion of the fourth audio frame is the rear portion of the second audio frame.
  • the replacing fourth audio frame can be seamlessly connected with the audio frame preceding the first audio frame and with the audio frame following the second audio frame, and the fourth audio frame satisfies the distance proximity condition at the frame division position, so that the splicing does not cause too abrupt a change; therefore the noise problem caused by jumps between audio frames when deleting an audio frame can be substantially overcome.
  • the present application also provides a terminal.
  • the internal structure of the terminal may correspond to the structure shown in FIG. 1.
  • Each of the following modules may be implemented in whole or in part by software, hardware, or a combination thereof.
  • the terminal 1200 includes an audio frame acquisition module 1201 and a frame division location determining module 1202, and further includes at least one of an audio frame insertion module 1203 and an audio frame replacement module 1204.
  • the audio frame obtaining module 1201 is configured to obtain an adjacent first audio frame and a second audio frame from the audio data stream, where the first audio frame precedes the second audio frame in time series.
  • the audio data stream includes a series of sample point values having timings obtained by sampling the original analog sound signal at a specific audio sample rate, and a series of sample point values can describe the sound.
  • the audio sampling rate is the number of sampling points collected in one second, in Hertz. The higher the audio sampling rate, the higher the frequency of the sound wave that can be described.
  • the audio frame includes a fixed number of sample point values with timing. If the encoding format of the audio data stream itself defines audio frames, they are used directly; if there are no audio frames but only a series of sample point values with timing, the series of timed sample point values can be divided into audio frames according to a preset frame length.
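Dividing a raw series of timed samples into frames of the preset frame length can be sketched as follows (dropping an incomplete tail frame is our simplification):

```python
def split_into_frames(samples, frame_len):
    """Divide a timed series of sample point values into audio frames
    of the preset frame length; any incomplete tail is discarded."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```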
  • the preset frame length refers to the number of sample point values included in a preset one frame of audio frame.
  • the first audio frame and the second audio frame acquired by the audio frame obtaining module 1201 from the audio data stream are adjacent, and the first audio frame temporally precedes the second audio frame; that is, when the audio data stream is played, the first audio frame is played first, and the second audio frame is played after the first audio frame finishes.
  • the first audio frame and the second audio frame are two adjacent audio frames between which an audio frame needs to be inserted.
  • the frame segmentation position determining module 1202 is configured to determine a frame segmentation position, where the sample point value at the frame segmentation position in the first audio frame and the sample point value at the frame segmentation position in the second audio frame satisfy the distance approach condition.
  • the frame division position refers to a position at which the first audio frame and the second audio frame are divided, and is a relative position with respect to one audio frame.
  • the distance refers to the absolute value of the difference of the sample point value pairs at the corresponding positions in the two audio frames. For example, referring to the local sampling point value distribution map of the first audio frame A shown in FIG. 4 and the local sampling point value distribution map of the second audio frame B shown in FIG. 5, the first sampling of the first audio frame A The absolute value of the difference between the point value and the first sample point value of the second audio frame B is the first sample point value of the first audio frame A and the first sample point value of the second audio frame B. distance.
  • the distance approach condition refers to a quantization condition used to determine whether the distances of the two sample point values are close.
  • the distance proximity condition may include a case where the distance is equal to 0, and may also include a case where the distances of the two sample point values are not equal but close, such as the distance is less than or equal to the threshold, and the threshold may be preset or may be It is dynamically determined based on the sample point values in the first audio frame and/or the second audio frame, such as may be the average of the sample point values in the first audio frame and/or the second audio frame multiplied by a preset percentage.
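One dynamic-threshold variant described here can be sketched as follows. The averaging over absolute sample values and the function name are our reading of "average of the sample point values ... multiplied by a preset percentage", not a formula given in the source:

```python
def close_by_threshold(a, b, i, percent=0.05):
    """Distance proximity condition: |a[i]-b[i]| <= threshold, where
    the threshold is the mean absolute sample value of both frames
    multiplied by a preset percentage (one dynamic option described)."""
    n = len(a) + len(b)
    mean_abs = sum(abs(x) for x in a + b) / n
    return abs(a[i] - b[i]) <= mean_abs * percent
```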
  • the frame splitting position determining module 1202 can calculate the distance of each sample point value pair in the first audio frame and the second audio frame, filter out the sample point value pair with the smallest distance, and take the position corresponding to that pair as the frame splitting position.
  • in this case, the distance proximity condition is that the distance of the sample point value pair corresponding to the frame division position in the first audio frame and the second audio frame is the minimum.
  • the sample point value pair here refers to two sample point values at the same position in two audio frames, and the position of the sample point value is the relative position of the sample point value relative to the associated audio frame.
  • the audio frame insertion module 1203 is configured to acquire sample point values before the frame segmentation position in the second audio frame and sample point values after the frame segmentation position in the first audio frame, and sequentially splicing to generate a third audio frame, and A three audio frame is inserted between the first audio frame and the second audio frame.
  • specifically, the audio frame insertion module 1203 acquires the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame; the total number of acquired sample point values is exactly equal to the length of one audio frame.
  • the sample point values from the second audio frame are placed first, followed by the sample point values from the first audio frame, spliced sequentially to generate the third audio frame.
  • the sample point values from the second audio frame remain in the order in which they are located in the second audio frame, and the sample point values from the first audio frame remain in the order in which they were located in the first audio frame.
  • the generated third audio frame is inserted between the first audio frame and the second audio frame.
  • the audio frame replacement module 1204 is configured to obtain sample point values before the frame segmentation position in the first audio frame and sample point values after the frame segmentation position in the second audio frame, and sequentially splicing to generate a fourth audio frame, and An audio frame and a second audio frame are replaced with a fourth audio frame.
  • specifically, the audio frame replacement module 1204 acquires the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame; the total number of acquired sample point values is exactly equal to the length of one audio frame.
  • the sample point values from the first audio frame are placed first, followed by the sample point values from the second audio frame, spliced sequentially to generate the fourth audio frame.
  • the sample point values from the first audio frame remain in the order in which they were located in the first audio frame, and the sample point values from the second audio frame remain in their order in the second audio frame. Finally, the first audio frame and the second audio frame are replaced with the generated fourth audio frame.
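The replacement path can be sketched similarly (frames as lists of samples, names ours): the fourth frame keeps the first frame's prefix and the second frame's suffix, and replaces both frames, shortening the stream by one frame:

```python
def replace_with_merged_frame(stream, idx, split):
    """Build the fourth audio frame from stream[idx]'s samples before
    `split` followed by stream[idx+1]'s samples from `split` onward,
    and replace both frames with it (net effect: one frame deleted)."""
    first, second = stream[idx], stream[idx + 1]
    fourth = first[:split] + second[split:]
    stream[idx:idx + 2] = [fourth]
    return fourth
```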
  • when an audio frame needs to be inserted, the terminal 1200 splices the part before the frame division position of the second audio frame with the part after the frame division position of the first audio frame to obtain a third audio frame, which is inserted between the first audio frame and the second audio frame.
  • the front portion of the third audio frame is the front portion of the second audio frame
  • the rear portion of the third audio frame is the rear portion of the first audio frame. Since the first audio frame and the second audio frame are themselves seamlessly connected, the first audio frame can be seamlessly connected to the front portion of the third audio frame, and the rear portion of the third audio frame is seamlessly connected to the second audio frame.
  • the third audio frame satisfies the distance proximity condition at the frame division position, so that the splicing does not cause too abrupt a change; therefore the noise problem caused by jumps between audio frames when inserting an audio frame can be substantially overcome.
  • similarly, when an audio frame needs to be deleted, the fourth audio frame is obtained by splicing the part before the frame division position of the first audio frame with the part after the frame division position of the second audio frame, and replaces the first audio frame and the second audio frame.
  • the front portion of the fourth audio frame is the front portion of the first audio frame and the rear portion of the fourth audio frame is the rear portion of the second audio frame.
  • the replacing fourth audio frame can be seamlessly connected with the audio frame preceding the first audio frame and with the audio frame following the second audio frame, and the fourth audio frame satisfies the distance proximity condition at the frame division position, so that the splicing does not cause too abrupt a change; therefore the noise problem caused by jumps between audio frames when deleting an audio frame can be substantially overcome.
  • the terminal 1200 further includes: a copy retention module 1205, configured to reserve a copy of the sample point value of at least one audio frame length when performing real-time playback processing on the audio data stream.
  • the audio frame obtaining module 1201 is further configured to: when detecting an instruction for inserting an audio frame, obtain a first audio frame according to a copy that is reserved before a sampling point value currently being played, and according to a sampling point that is currently performing playback processing A sample point value of one audio frame length after the value obtains a second audio frame.
  • the playback process refers to a process of restoring a sound signal according to the sample point value, and retaining a copy of the sample point value of at least one audio frame length, that is, retaining a copy of at least one audio frame.
  • for example, when performing playback processing on a sample point value A1, the copy retention module 1205 retains a copy A1' of the sample point value A1; copies of the sample point values on which playback processing was performed before the sample point value A1 are also retained, and the total length of the retained copies is at least one audio frame length.
  • when playback processing subsequently reaches a sample point value B1, the copy retention module 1205 also retains a copy B1' of the sample point value B1, and the retained copies at this time include at least the copy A' of the audio frame A.
  • when an instruction for inserting an audio frame is detected while the sample point value B1 is being played, the audio frame acquisition module 1201 takes the retained copies of the sample point values of one audio frame length, from the copy A1' up to the currently played sample point value B1, as the first audio frame A, and takes the audio frame B of one audio frame length after the sample point value B1 as the second audio frame.
  • in this way, an instruction for inserting an audio frame can be responded to immediately when detected, without waiting for the duration of one audio frame, which improves the efficiency of inserting audio frames.
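The copy-retention scheme of module 1205 can be sketched with a bounded deque that always holds the last frame's worth of played samples, so the first audio frame is available the instant an insert instruction arrives. The class and method names are ours:

```python
from collections import deque

class CopyRetention:
    """Keep copies of the most recently played sample point values,
    at least one audio frame length, so an insert instruction can be
    served immediately from history instead of waiting a full frame."""
    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.history = deque(maxlen=frame_len)  # retained copies

    def on_play(self, sample):
        self.history.append(sample)             # retain a copy

    def first_frame(self):
        """The retained copies form the first audio frame."""
        assert len(self.history) == self.frame_len
        return list(self.history)
```

`deque(maxlen=...)` silently discards the oldest copy as each new sample is played, keeping memory bounded to one frame.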
  • the frame splitting location determining module 1202 includes a candidate location obtaining module 1202a, a similarity metric module 1202b, and a determining module 1202c.
  • the candidate location obtaining module 1202a is configured to acquire a candidate location, where the sampling point value at the candidate location in the first audio frame and the sampling point value at the corresponding candidate location in the second audio frame satisfy a distance proximity condition.
  • the similarity metric module 1202b is configured to obtain local similarities at the corresponding candidate locations in the first audio frame and the second audio frame.
  • the determining module 1202c is configured to determine a frame splitting position according to the local similarity.
  • the candidate position is a position in the selected audio frame that can be used as a frame division position, and the position is discrete, and each sample point value corresponds to a discrete position.
  • specifically, the candidate location obtaining module 1202a may traverse all the positions in the audio frame and, at each position, determine whether the sample point value pair at the corresponding positions in the first audio frame and the second audio frame satisfies the distance proximity condition. If it does, the candidate location acquisition module 1202a adds the traversed position to the candidate position set and continues the traversal; if not, it simply continues the traversal.
  • if no position satisfies the distance proximity condition, the candidate location obtaining module 1202a may select a preset position (such as the middle position of the audio frame) or the position where the distance of the sample point value pair is smallest, and add it to the candidate position set.
  • the distance approach condition refers to a quantization condition used to determine whether the distances of the two sample point values are close.
  • the distance proximity condition may include a case where the distance is equal to 0, and may also include a case where the distances of the two sample point values are not equal but close, such as the distance is less than or equal to the threshold, and the threshold may be preset or may be It is dynamically determined based on sample point values in the first audio frame and/or the second audio frame.
  • alternatively, the candidate location acquisition module 1202a may calculate the distance of each sample point value pair in the first audio frame and the second audio frame and sort the distances in ascending order, taking the positions corresponding to a preset number of the smallest distances as candidate positions.
  • in this case, the distance proximity condition is that the distance of the sample point value pair corresponding to the candidate position ranks within the preset number of smallest distances after all the calculated distances are sorted in ascending order.
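The sorted-distance variant can be sketched as follows (function name is ours):

```python
def smallest_distance_positions(a, b, k):
    """Compute |a[i]-b[i]| for every position, sort the positions by
    ascending distance, and return the positions of the k smallest
    distances as the candidate position set."""
    by_distance = sorted(range(len(a)), key=lambda i: abs(a[i] - b[i]))
    return by_distance[:k]
```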
  • the distance proximity condition is that a product of the first difference value and the second difference value is less than or equal to 0; wherein the first difference value is a sample point value at the candidate position in the first audio frame and the second audio frame The difference of the sample point values at the corresponding candidate positions; the second difference is the difference between the sample point value of the next position of the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
  • specifically, the distance proximity condition can be expressed as the following formula (1), in which:
  • i denotes a candidate position in the first audio frame A and the second audio frame B, and may be referred to as the sample point value sequence number; m is the audio frame length;
  • (a i - b i ) is the first difference, representing the difference between the sample point value a i at the candidate position i in the first audio frame A and the sample point value b i at the corresponding candidate position i in the second audio frame B;
  • (a i+1 - b i+1 ) is the second difference, representing the difference between the sample point value a i+1 at the position i+1 following the candidate position i in the first audio frame A and the sample point value b i+1 at the corresponding position i+1 in the second audio frame B;
  • formula (1) represents that the product of the first difference (a i - b i ) and the second difference (a i+1 - b i+1 ) is less than or equal to zero.
  • the distance proximity condition expressed by formula (1) above is intended to find the intersections of the first fitted curve formed by the sample point values of the first audio frame and the second fitted curve formed by the sample point values of the second audio frame. The intersections may also be determined by other methods for finding the intersections of two curves. If an intersection falls exactly on the position of a sample point value, that position is added to the candidate position set; if the intersection is not at the position of any sample point value, the position in the audio frame closest to the intersection may be added to the candidate position set. For example, if the first fitted curve and the second fitted curve in FIG. 10 have an intersection X, the two positions S1 or S2 closest to the intersection X may be added to the candidate position set. Other methods of finding the intersections of two curves include, for example, first obtaining the mathematical expressions of the two fitted curves and then computing the intersections directly by function evaluation. The distance proximity condition expressed by formula (1) above is more efficient.
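As a worked illustration of the candidate search implied by formula (1), the sign-change test over the difference curve can be sketched in Python as follows; the function name and list-based frame representation are illustrative assumptions, not from the patent:

```python
def candidate_positions(frame_a, frame_b):
    """Return positions i where (a_i - b_i) * (a_{i+1} - b_{i+1}) <= 0,
    i.e. where the two fitted sample-value curves cross (formula (1))."""
    assert len(frame_a) == len(frame_b)
    candidates = []
    for i in range(len(frame_a) - 1):
        d0 = frame_a[i] - frame_b[i]          # first difference, at position i
        d1 = frame_a[i + 1] - frame_b[i + 1]  # second difference, at position i+1
        if d0 * d1 <= 0:                      # sign change: curves intersect near i
            candidates.append(i)
    return candidates
```

Note that with equal frames every product is zero, so every position qualifies; in that degenerate case the method's fallback (a preset position or the minimum-distance position) would apply.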
  • the similarity measurement module 1202b is configured to obtain the distance sum of the sample point value pairs in the first audio frame and the second audio frame within a discrete position range of preset length covering the candidate position.
  • the discrete position range of preset length covering a candidate position includes that candidate position; the set of discrete positions contains a fixed number of positions, namely the preset length, and the positions in the set are sequentially adjacent.
  • the similarity measurement module 1202b may specifically select candidate positions one by one from the candidate position set, and obtain the distance sum of the sample point value pairs in the first audio frame and the second audio frame within the discrete position range of preset length covering the selected candidate position.
  • the similarity measurement module 1202b may use the following formula (2) to obtain the distance sum of the sample point value pairs in the first audio frame and the second audio frame within the discrete position range of preset length covering the candidate position:
  • R_n = Σ_{j=n}^{2N+n} |a_j − b_j|    Formula (2)
  • where n is the candidate position minus N; N may take a value in [1, (m−1)/2], preferably in [2, (m−1)/100], and more preferably 5; the candidate position is n + N;
  • the discrete position range is formed by taking N positions on each side of the candidate position n+N together with the candidate position n+N itself, yielding a discrete position range [n, …, n+N, …, 2N+n] of preset length 2N+1; |a_j − b_j| is the distance of each sample point value pair (a_j, b_j) within the discrete position range, and R_n is the sum of these distances over the range.
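A minimal sketch of the windowed distance sum of formula (2), assuming the candidate position has N valid neighbours on each side (the function name and parameters are illustrative):

```python
def distance_sum(frame_a, frame_b, candidate, n_half):
    """Distance sum R_n of formula (2): sum of |a_j - b_j| over the
    2 * n_half + 1 positions centred on the candidate position."""
    start = candidate - n_half   # n = candidate position - N
    stop = candidate + n_half    # 2N + n
    total = 0
    for j in range(start, stop + 1):
        total += abs(frame_a[j] - frame_b[j])
    return total
```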
  • the determining module 1202c is configured to determine the candidate position corresponding to the minimum distance sum as the frame splitting position.
  • the similarity measurement module 1202b is configured to obtain the local similarity at the corresponding candidate positions in the first audio frame and the second audio frame, and the determining module 1202c is configured to determine the frame splitting position according to the local similarity.
  • the distance sums may be calculated for all candidate positions in the candidate position set, and the candidate position corresponding to the minimum distance sum is taken as the frame splitting position.
  • Specifically, this can be expressed as the following formula (3): T = Min(R_n), where T is the objective function; by optimizing T, the minimum distance sum and the corresponding candidate position n are found, giving the frame splitting position n + N.
  • the determined frame splitting position also satisfies the distance proximity condition: the product of the first difference and the second difference is less than or equal to 0, where the first difference is the difference between the sample point value at the frame splitting position in the first audio frame and the sample point value at the corresponding frame splitting position in the second audio frame, and the second difference is the difference between the sample point value at the position following the frame splitting position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
  • in this way, the candidate position at which the curves are most similar near an intersection of the first fitted curve and the second fitted curve is found and used as the frame splitting position.
  • the local similarity at a candidate position refers to the degree to which the first fitted curve and the second fitted curve are similar within a fixed range near the candidate position; the smaller the local similarity calculated by formula (2), the more similar the curves. If the first fitted curve and the second fitted curve are more similar near the candidate position, the two curves have more similar slopes, the third audio frame obtained after splitting transitions more smoothly, and the noise suppression effect is better.
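The selection of the minimum distance sum (formula (3), T = Min(R_n)) can be sketched as follows; candidates whose window would run past a frame edge are simply skipped here, a boundary policy the patent does not specify:

```python
def frame_split_position(frame_a, frame_b, candidates, n_half):
    """Pick the candidate whose local distance sum R_n is smallest."""
    best_pos, best_sum = None, None
    m = len(frame_a)
    for pos in candidates:
        # Skip candidates whose 2*n_half+1 window would leave the frame.
        if pos - n_half < 0 or pos + n_half >= m:
            continue
        r = sum(abs(frame_a[j] - frame_b[j])
                for j in range(pos - n_half, pos + n_half + 1))
        if best_sum is None or r < best_sum:
            best_pos, best_sum = pos, r
    return best_pos
```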
  • the local similarity can also be obtained by computing the cross-correlation with a cross-correlation function.
  • although a cross-correlation function can likewise express the degree of similarity of two signals, if applied to this scheme, then when the cross-correlation is computed over a small number of points, two isolated large sample point values of the same sign may yield a large cross-correlation, suggesting that the two curves are very similar even though the position is not the optimal frame splitting position.
  • the local similarity obtained by formula (2) above overcomes this shortcoming of computing cross-correlation with a cross-correlation function.
  • in formula (2), the sample point value at each position plays a relatively balanced role in the calculation.
  • with the absolute value of the difference serving as the contribution of the sample point value at each position, the slope difference before and after the intersection is described well, and the most suitable candidate position can be found as the frame splitting position.
  • the audio frame insertion module 1203 is further configured to, for the adjacent first audio frame and second audio frame acquired from the audio data stream of a specified channel when a sound effect is turned on, obtain the sample point values before the frame splitting position in the second audio frame and the sample point values after the frame splitting position in the first audio frame, splice them in order to generate a third audio frame, and insert the third audio frame between the first audio frame and the second audio frame.
  • the inserted third audio frame is fade-in processed, so that the inserted third audio frame gradually transitions in time sequence from the no-sound-effect state to the full-sound-effect state.
  • the audio frame replacement module 1204 is further configured to, when the sound effect is turned off, obtain the sample point values before the frame splitting position in the first audio frame and the sample point values after the frame splitting position in the second audio frame, splice them in order to generate a fourth audio frame, replace the first audio frame and the second audio frame together with the fourth audio frame, and perform fade-out processing on the replacing fourth audio frame, so that the replacing fourth audio frame gradually transitions in time sequence from the full-sound-effect state to the no-sound-effect state.
  • the audio frame replacement module 1204 is further configured to, for the adjacent first audio frame and second audio frame acquired from the audio data stream of the specified channel when the sound effect is turned on, obtain the sample point values before the frame splitting position in the first audio frame and the sample point values after the frame splitting position in the second audio frame, splice them in order to generate a fourth audio frame, and replace the first audio frame and the second audio frame together with the fourth audio frame.
  • fade-out processing is performed on the replacing fourth audio frame, so that the replacing fourth audio frame gradually transitions in time sequence from the full-sound-effect state to the no-sound-effect state.
  • the audio frame insertion module 1203 is further configured to, for the adjacent first audio frame and second audio frame acquired from the audio data stream of the specified channel when the sound effect is turned off, obtain the sample point values before the frame splitting position in the second audio frame and the sample point values after the frame splitting position in the first audio frame, splice them in order to generate a third audio frame, insert the third audio frame between the first audio frame and the second audio frame, and perform fade-out processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions in time sequence from the full-sound-effect state to the no-sound-effect state.
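The splicing described above can be sketched as follows. The fade is shown here as a plain linear amplitude ramp purely for illustration; in the described method the fade applies to the sound-effect processing rather than to raw amplitude, and all function names are illustrative assumptions:

```python
def splice_insert(frame_a, frame_b, split):
    """Third frame: front of B (before split) + tail of A (from split on)."""
    return frame_b[:split] + frame_a[split:]

def splice_replace(frame_a, frame_b, split):
    """Fourth frame: front of A (before split) + tail of B (from split on)."""
    return frame_a[:split] + frame_b[split:]

def fade_in(frame):
    """Linear ramp from silent (0) to full level across the frame."""
    m = len(frame)
    return [v * i / (m - 1) for i, v in enumerate(frame)]
```

A fade-out would use the mirrored ramp `(m - 1 - i) / (m - 1)`.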
  • the storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or may be a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Stereophonic System (AREA)

Abstract

A method and terminal for processing audio data. The method includes: acquiring adjacent first and second audio frames from an audio data stream, the first audio frame preceding the second audio frame in time sequence (202); determining a frame splitting position such that the sample point value at the frame splitting position in the first audio frame and the sample point value at the frame splitting position in the second audio frame satisfy a distance proximity condition (204); and obtaining the sample point values before the frame splitting position in the second audio frame and the sample point values after the frame splitting position in the first audio frame, splicing them in order to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame (206).

Description

音频数据处理方法和终端
本申请要求于2016年1月14日提交中国专利局,申请号为201610025708.1,发明名称为“音频数据处理方法和装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及音频数据处理技术领域,特别是涉及一种音频数据处理方法和终端。
背景技术
音频数据处理技术的应用,使得人们可以通过拾音器采集声音生成音频数据并存储,在需要时可通过音频播放器将存储的音频数据播放出来,重现声音。音频数据处理技术的广泛应用,使得声音的记录和再现变的非常容易,对人们的生活和工作都有重要影响。
目前,在对音频数据流进行处理时,存在需要在相邻的两帧音频数据之间插入一帧音频数据的情况。比如,在一些特殊的音效中,通过将左右声道中其中一个声道的音频数据流相邻的两帧音频数据之间插入一帧音频数据,使得左右声道的音频数据流相差一帧音频数据,可以实现环绕声的特殊效果。又比如,当左右声道的音频数据流不同步时,也可以通过在其中一个音频数据流中插入音频数据来缓解左右声道的音频数据流不同步的问题。
然而,目前在音频数据流中相邻的两帧音频数据之间插入音频数据,一般是直接插入这两帧音频数据中的一个,但插入后在播放时会在插入的音频数据处存在明显的噪声,需要克服。类似地,在音频数据流中删除一帧音频数据也会存在噪声。
发明内容
根据本申请的各种实施例,提供一种音频数据处理方法和终端。
一种音频数据处理方法,包括:
从音频数据流中获取相邻的第一音频帧和第二音频帧,所述第一音频帧在时序上先于所述第二音频帧;
确定帧分割位置,所述第一音频帧中所述帧分割位置处的采样点值与所述第二音频帧中所述帧分割位置处的采样点值满足距离接近条件;及
获取所述第二音频帧中帧分割位置以前的采样点值以及所述第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将所述第三音频帧插入所述第一音频帧和第二音频帧之间。
一种音频数据处理方法,包括:
从音频数据流中获取相邻的第一音频帧和第二音频帧,所述第一音频帧在时序上先于所述第二音频帧;
确定帧分割位置,所述第一音频帧中所述帧分割位置处的采样点值与所述第二音频帧中所述帧分割位置处的采样点值满足距离接近条件;及
获取所述第一音频帧中帧分割位置以前的采样点值以及所述第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将所述第一音频帧和第二音频帧一并替换为所述第四音频帧。
一种终端,包括存储器和处理器,所述存储器中储存有计算机可读指令,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器执行以下步骤:
从音频数据流中获取相邻的第一音频帧和第二音频帧,所述第一音频帧在时序上先于所述第二音频帧;
确定帧分割位置,所述第一音频帧中所述帧分割位置处的采样点值与所述第二音频帧中所述帧分割位置处的采样点值满足距离接近条件;及
获取所述第二音频帧中帧分割位置以前的采样点值以及所述第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将所述第 三音频帧插入所述第一音频帧和第二音频帧之间。
一种终端,包括存储器和处理器,所述存储器中储存有计算机可读指令,其特征在于,所述计算机可读指令被所述处理器执行时,使得所述处理器执行以下步骤:
从音频数据流中获取相邻的第一音频帧和第二音频帧,所述第一音频帧在时序上先于所述第二音频帧;
确定帧分割位置,所述第一音频帧中所述帧分割位置处的采样点值与所述第二音频帧中所述帧分割位置处的采样点值满足距离接近条件;及
获取所述第一音频帧中帧分割位置以前的采样点值以及所述第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将所述第一音频帧和第二音频帧一并替换为所述第四音频帧。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征、目的和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为一个实施例中用于实现音频数据处理方法的终端的结构示意图;
图2为一个实施例中音频数据处理方法的流程示意图;
图3A为一个实施例中在相邻的第一音频帧和第二音频帧之间插入音频帧的示意图;
图3B为一个实施例中在相邻的第一音频帧和第二音频帧之中删除一帧的示意图;
图4为一个实施例中第一音频帧的局部采样点值分布图;
图5为一个实施例中第二音频帧的局部采样点值分布图;
图6为一个实施例中第一音频帧和第二音频帧重叠的局部采样点值分布图;
图7A为一个实施例中分割音频帧、拼接音频帧以及插入音频帧的过程的示意图;
图7B为一个实施例中分割音频帧、拼接音频帧以及替换音频帧的过程的示意图;
图8为一个实施例中保留副本以及进行播放处理的过程的示意图;
图9为一个实施例中确定帧分割位置的步骤的流程示意图;
图10为一个实施例中第一音频帧的第一拟合曲线和第二音频帧的第二拟合曲线在同一坐标系下的示意图;
图11为另一个实施例中音频数据处理方法的流程示意图;
图12为一个实施例中终端的结构框图;
图13为另一个实施例中终端的结构框图;
图14为一个实施例中图12或图13中帧分割位置确定模块的结构框图。
具体实施方式
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
如图1所示,在一个实施例中,提供了一种用于实现音频数据处理方法的终端100,包括通过系统总线连接的处理器、非易失性存储介质、内存储器、输入装置以及音频输出接口。其中处理器具有计算功能和控制终端100工作的功能,该处理器被配置为执行一种音频数据处理方法。非易失性存储介质包括磁存储介质、光存储介质和闪存式存储介质中的至少一种,非易失性存储介质存储有计算机可读指令,该计算机可读指令被处理器执行时,可使得处理器执行一种音频数据处理方法。输入装置包括物理按钮、轨迹球、触控板、用于接入外接控制设备的物理接口以及与显示屏重叠的触控层中的 至少一种,外接控制设备比如鼠标或者多媒体线控装置等。终端100包括台式计算机、便携式笔记本电脑、手机、音乐播放器以及智能手表等各种可进行音频数据处理的电子设备。
如图2所示,在一个实施例中,提供了一种音频数据处理方法,本实施例以该方法应用于上述图1中的终端100来举例说明。该方法具体包括如下步骤:
步骤202,从音频数据流中获取相邻的第一音频帧和第二音频帧,第一音频帧在时序上先于第二音频帧。
具体地,音频数据流包括具有时序的一系列的采样点值,采样点值通过将原始的模拟声音信号按照特定的音频采样率采样获得,一系列的采样点值就可以描述声音。音频采样率则是一秒钟内所采集的采样点的数量,单位为赫兹(Hz),音频采样率越高所能描述的声波频率就越高。
音频帧包括具有时序的、数量固定的采样点值。按照音频数据流的编码格式,若编码格式本身存在音频帧则直接采用,若不存在音频帧而只是一系列具有时序的采样点值,则可以按照预设帧长度从这一系列具有时序的采样点值中划分出音频帧。预设帧长度是指预设的一帧音频帧中所包括的采样点值的数量。
从音频数据流中获取的第一音频帧和第二音频帧是相邻的,且第一音频帧在时序上先于第二音频帧,就是说在对音频数据流进行播放处理时,第一音频帧先播放,当第一音频帧播放完毕之后播放第二音频帧。第一音频帧和第二音频帧是需要在两者之间插入音频帧的两个相邻音频帧。
举例说明,参照图3A,一段音频数据流中包括按照时序排列的第一音频帧A、第二音频帧B……,在需要插入音频帧时,需要在第一音频帧A和第二音频帧B之间插入音频帧F。参照图3B,在需要删除音频帧时,需要将第一音频帧A和第二音频B这两帧音频帧的采样点值中删除掉一个音频帧的采样点值,保留一个音频帧G。
步骤204,确定帧分割位置,第一音频帧中帧分割位置处的采样点值与 第二音频帧中帧分割位置处的采样点值满足距离接近条件。
具体地,帧分割位置是指将第一音频帧和第二音频帧进行分割的位置,是相对于一个音频帧的相对位置。距离是指两个音频帧中对应的位置处的采样点值对的差值的绝对值。举例说明,参照图4所示的第一音频帧A的局部采样点值分布图以及图5所示的第二音频帧B的局部采样点值分布图,第一音频帧A的第一个采样点值与第二音频帧B的第一个采样点值的差值的绝对值,便是第一音频帧A的第一个采样点值与第二音频帧B的第一个采样点值的距离。
距离接近条件是指用来判定两个采样点值的距离是否接近的量化条件。在一个实施例中,距离接近条件可以包括距离等于0的情况,还可以包括两个采样点值的距离不相等但接近的情况,比如距离小于等于阈值,该阈值可以是预先设置的,也可以是根据第一音频帧和/或第二音频帧中的采样点值动态确定的,比如可以是第一音频帧和/或第二音频帧中采样点值的平均值乘以预设百分比。
在一个实施例中,终端可计算第一音频帧和第二音频帧中每个采样点值对的距离,从而筛选出距离最小的采样点值对,帧分割位置便是筛选出的距离最小的采样点值对所对应的位置,此时距离接近条件便是第一音频帧和第二音频帧中帧分割位置对应的采样点值对的距离最小化。这里的采样点值对是指两个音频帧中相同位置处的两个采样点值,采样点值的位置则是该采样点值相对于所属音频帧的相对位置。
举例说明,将图4和图5重叠得到图6所示的重叠的局部采样点值分布图,以便对音频帧A的音频帧B的局部采样点值分布进行比较。假设帧分割位置为S,则音频帧A中S处的采样点值与音频帧B中S处的采样点值的差值的绝对值很接近甚至相等,也就是音频帧A中S处的采样点值与音频帧B中S处的采样点值满足距离接近条件。
步骤206,获取第二音频帧中帧分割位置以前的采样点值以及第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将第三 音频帧插入第一音频帧和第二音频帧之间;或者,获取第一音频帧中帧分割位置以前的采样点值以及第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将第一音频帧和第二音频帧一并替换为第四音频帧。
具体地,在需要插入音频帧时,获取第二音频帧中帧分割位置以前的采样点值,并获取第一音频帧中帧分割位置以后的采样点值,获取的采样点值的总数恰好等于一个音频帧长度。将来自于第二音频帧的采样点值在前,来自于第一音频帧中的采样点值在后按顺序进行拼接,生成第三音频帧。而且,来自于第二音频帧的采样点值仍保留所在第二音频帧中的顺序,来自于第一音频帧中的采样点值仍保留所在第一音频帧中的顺序。最后将生成的第三音频帧插入第一音频帧和第二音频帧之间。
举例说明,参照图7A,第一音频帧A按照帧分割位置S划分为前部分和后部分,第二音频帧B也按照帧分割位置S划分为前部分和后部分。其中前部分是指帧分割位置S以前的采样点值,相应地后部分则是帧分割位置以后的采样点值。将第二音频帧B的前部分与第一音频帧A的后部分进行拼接,获得第三音频帧F,然后便可以将拼接获得的第三音频帧F插入第一音频帧A与第二音频帧B之间。
在需要删除音频帧时,获取第一音频帧中帧分割位置以前的采样点值,并获取第二音频帧中帧分割位置以后的采样点值,获取的采样点值的总数恰好等于一个音频帧长度。将来自于第一音频帧的采样点值在前、来自于第二音频帧的采样点值在后按顺序进行拼接,获得第四音频帧。而且,来自于第一音频帧的采样点值仍保留所在第一音频帧中的顺序,来自于第二音频帧中的采样点值仍保留所在第二音频帧中的顺序。最后用生成的第四音频帧替换掉第一音频帧和第二音频帧。
举例说明,参照图7B,第一音频帧D按照帧分割位置S划分为前部分和后部分,第二音频帧E也按照帧分割位置S划分为前部分和后部分。其中前部分是指帧分割位置S以前的采样点值,相应地后部分则是帧分割位置以后 的采样点值。将第一音频帧A的前部分与第二音频帧B的后部分进行拼接,获得第四音频帧G,然后便可以用拼接获得的第四音频帧G替换掉第一音频帧A与第二音频帧B。
上述音频数据处理方法,在需要插入音频帧时,将第二音频帧的帧分割位置以前的部分与第一音频帧的帧分割位置以后的部分进行拼接后获得第三音频帧,插入第一音频帧和第二音频帧之间。插入之后,第三音频帧的前部分是第二音频帧的前部分,而第三音频帧的后部分则是第一音频帧的后部分。由于第一音频帧和第二音频帧本身是无缝连接的,这样第一音频帧能够与第三音频帧的前部分无缝连接,第三音频帧的后部分与第二音频帧无缝连接,而且第三音频帧在帧分割位置处满足距离接近条件,这样拼接处也不会产生太大突变,因此可基本克服在插入音频帧时因为音频帧之间的跳跃而产生的噪声问题。
在需要删除音频帧时,将第一音频帧的帧分割位置以前的部分与第二音频帧的帧分割位置以后的部分进行拼接后获得第四音频帧,替换掉第一音频帧和第二音频帧。替换之后,第四音频帧的前部分是第一音频帧的前部分,而第四音频帧的后部分则是第二音频帧的后部分。由于第一音频帧和前一音频帧、第二音频帧和后一音频帧都是无缝连接的,这样替换后第四音频帧能够与第一音频帧的前一音频帧无缝连接,与第二音频帧的后一音频帧无缝连接,而且第四音频帧在帧分割位置处满足距离接近条件,这样拼接处也不会产生太大突变,因此可基本克服在删除音频帧时因为音频帧之间的跳跃而产生的噪声问题。
在一个实施例中,该音频数据处理方法还包括:在对音频数据流进行实时的播放处理时,保留至少一个音频帧长度的采样点值的副本。且步骤202包括:在检测到用于插入音频帧的指令时,根据当前正在进行播放处理的采样点值之前保留的副本获得第一音频帧,并根据当前正在进行播放处理的采样点值之后的一个音频帧长度的采样点值获得第二音频帧。
其中,播放处理是指根据采样点值还原出声音信号的处理,保留至少一 个音频帧长度的采样点值的副本,也就是保留至少一个音频帧的副本。具体地,参照图8,终端在对一个采样点值A1进行播放处理时,保留该采样点值A1的副本A1′,在该采样点值A1之前进行了播放处理的采样点值的副本也会保留下来,保留的副本的总长度至少为一个音频帧长度。
终端在经过一个音频帧长度之后,正在对采样点值B1进行播放处理,此时也会保留该采样点值B1的副本B1′,此时保留的副本至少包括音频帧A的副本A′。假设此时终端检测到用于插入音频帧的指令,则终端会将副本A1′到当前正在进行播放处理的采样点值B1之间的这一个音频帧长度的采样点值的副本作为第一音频帧A,并将采样点值B1之后的一个音频帧长度的音频帧B作为第二音频帧。
本实施例中,通过在对音频数据流进行实时的播放处理时保留至少一个音频帧的副本,在检测到用于插入音频帧的指令时可以立即做出响应,不需要再等待一个音频帧长度的时间,提高了插入音频帧的效率。
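The copy-retention step described in this embodiment (keeping a copy of at least one frame length of played sample values so an insert command can be answered immediately) can be sketched as a fixed-length history buffer; the class and method names are illustrative assumptions:

```python
from collections import deque

class PlaybackBuffer:
    """Retain copies of the last frame_len sample values played, so that on
    an insert command the first audio frame is already available as the
    retained copy and only the next frame_len samples form the second frame."""
    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.history = deque(maxlen=frame_len)  # oldest copies drop off automatically

    def play(self, sample):
        # ... the sample would be sent to the audio output here ...
        self.history.append(sample)             # retain a copy of the played value

    def first_frame(self):
        """The retained one-frame copy ending at the current playback point."""
        assert len(self.history) == self.frame_len
        return list(self.history)
```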
如图9所示,在一个实施例中,步骤204具体包括如下步骤:
步骤902,获取候选位置,第一音频帧中候选位置处的采样点值与第二音频帧中相应候选位置处的采样点值满足距离接近条件。
其中,候选位置是筛选出的可作为帧分割位置的音频帧中的位置,具体终端可遍历音频帧中的所有位置,在遍历到每个位置时,判断第一音频帧中和第二音频帧中相应位置处的采样点值对是否满足距离接近条件。若满足距离接近条件则将遍历到的位置加入候选位置集合中,并继续遍历;若不满足距离接近条件,则继续遍历。若遍历之后候选位置集合仍未空,则可选择预设位置(比如音频帧的中间位置)或者采样点值对的距离最小的位置加入到候选位置集合中。
距离接近条件是指用来判定两个采样点值的距离是否接近的量化条件。在一个实施例中,距离接近条件可以包括距离等于0的情况,还可以包括两个采样点值的距离不相等但接近的情况,比如距离小于等于阈值,该阈值可以是预先设置的,也可以是根据第一音频帧和/或第二音频帧中的采样点值动 态确定的。
在一个实施例中,终端可计算第一音频帧和第二音频帧中每个采样点值对的距离并升序排序,从而将排序靠前的预设数量的距离所对应的位置加入候选位置集合中,或者可从排序的距离的最小距离起获取占所有计算出的距离中的预设比例的距离所对应的位置加入候选位置集合中,此时距离接近条件便是第一音频帧和第二音频帧中候选位置对应的采样点值对的距离是将所有计算出的距离升序排序后靠前的预设数量的距离,或者是将所有计算出的距离升序排序后靠前的占所有计算出的距离中的预设比例的距离。
在一个实施例中,距离接近条件为:第一差值与第二差值的乘积小于等于0;其中,第一差值为第一音频帧中候选位置处的采样点值与第二音频帧中相应候选位置处的采样点值的差值;第二差值为第一音频帧中候选位置的下一位置的采样点值与第二音频帧中相应位置处的采样点值的差值。
具体地,假设第一音频帧A为[a1,a2,……,am],第二音频帧B为[b1,b2,……,bm],则距离接近条件可用以下公式(1)表示:
(a_i − b_i) × (a_{i+1} − b_{i+1}) ≤ 0，(i ∈ [1, m−1])    公式(1)
其中,i表示第一音频帧A以及第二音频帧B中的候选位置,可称为采样点值序号,m为一个音频帧长度;(ai-bi)为第一差值,表示第一音频帧A中候选位置i处的采样点值ai与第二音频帧B中相应候选位置i处的采样点值bi的差值;(ai+1-bi+1)为第二差值,表示第一音频帧A中候选位置i的下一位置i+1的采样点值ai+1与第二音频帧B中相应位置i+1处的采样点值bi+1的差值;公式(1)表示第一差值(ai-bi)与第二差值(ai+1-bi+1)的乘积小于等于0。
上述公式(1)所表示的距离接近条件,是为了找到第一音频帧的采样点值构成的第一拟合曲线和第二音频帧中的采样点值构成的第二拟合曲线的交点,还可以用其它求取两个曲线交点的方式来确定交点。若该交点正好是一个采样点值的位置,则将该位置加入候选位置集合;若该交点不是任何采样点值的位置,则可将音频帧的所有位置中最靠近该交点的位置加入候选位置 集合。比如图10中的第一拟合曲线和第二拟合曲线存在交点X,则可将最靠近该交点X的两个位置S1或S2加入候选位置集合。其它求取两个曲线交点的方式比如先分别求取两个拟合曲线的数学表达,从而通过函数计算来直接求取交点。上述公式(1)所表示的距离接近条件效率更高。
步骤904,获取第一音频帧和第二音频帧中在覆盖候选位置的预设长度的离散位置范围内的各采样点值对的距离和。
其中,覆盖候选位置的预设长度的离散位置范围,包括某候选位置,该离散位置集合包括的离散位置的数量是固定的即预设长度。优选可以在候选位置前后等量选取一定数量的离散位置与候选位置一同构成离散位置范围,也可以在候选位置前后不等量地选取离散位置与候选位置一同构成离散位置范围。离散位置集合中的各个位置优选可以是顺序相邻的,当然也可以间隔地选取离散位置与候选位置一同构成离散位置范围。
终端具体可逐个从候选位置集合中选择候选位置,并获取第一音频帧和第二音频帧中在覆盖所选择的候选位置的预设长度的离散位置范围内的各采样点值对的距离和。
在一个实施例中,可采用以下公式(2)来获取第一音频帧和第二音频帧中在覆盖候选位置的预设长度的离散位置范围内的各采样点值对的距离和:
R_n = Σ_{j=n}^{2N+n} |a_j − b_j|    公式(2)
其中,n为候选位置减去N,N可取[1,(m-1)/2],优选可取[2,(m-1)/100],更优可取5;候选位置为n+N,此时离散位置范围为以候选位置n+N为中心向左右分别取N个位置与候选位置n+N构成预设长度为2N+1的离散位置范围[n,…,n+N,…2N+n];|aj-bj|是第一音频帧A和第二音频帧B中在离散位置范围内的各采样点值对(aj,bj)的距离,Rn则是第一音频帧A和第二音频帧B中在离散位置范围内的各采样点值对(aj,bj)的距离和。
步骤906,将最小距离和所对应的候选位置确定为帧分割位置。
具体地,为了从候选位置集合中找出最优的候选位置作为帧分割位置, 可对候选位置集合中的所有候选位置分别计算距离和之后,找出最小的距离和所对应的候选位置作为帧分割位置。具体可表示为如下公式(3):
T=Min(Rn)
其中,T为目标函数,通过优化目标函数T,求得最小距离和对应的候选位置n,从而获得帧分割位置n+N。确定的帧分割位置也满足距离接近条件:第一差值与第二差值的乘积小于等于0;其中,第一差值为第一音频帧中帧分割位置处的采样点值与第二音频帧中相应帧分割位置处的采样点值的差值;第二差值为第一音频帧中帧分割位置的下一位置的采样点值与第二音频帧中相应位置处的采样点值的差值。
上述通过步骤904到步骤906找到的帧分割位置,是通过找到在第一拟合曲线和第二拟合曲线的交点附近最相似的交点处的候选位置作为帧分割位置。上述步骤904是获取第一音频帧和第二音频帧中在相应的候选位置处的局部相似度的具体步骤,而步骤906则是根据局部相似度确定帧分割位置的具体步骤。候选位置处的局部相似度是指在候选位置附近固定范围内第一拟合曲线和第二拟合曲线相似的程度,通过上述公式(2)计算出的局部相似度越小表示越相似。若第一拟合曲线和第二拟合曲线在候选位置附近越相似,相应的两种曲线具有越相似的斜率,分割之后再拼接获得的第三音频帧过渡越平缓,对噪声的抑制作用更好。
局部相似度还可以通过互相关函数计算互相关度而获得。设两个函数分别是f(t)和g(t),则互相关函数定义为R(u)=f(t)*g(-t),它反映的是两个函数在不同的相对位置上互相匹配的程度。互相关函数虽然也可以表示两个信号的相似程度,但是如果应用于本方案,在进行少量点的互相关度的计算时,单独的两个同向大采样点值可能会获得一个较大的互相关度,表示两条曲线越相似,但却不是最佳的帧分割位置。但通过上述公式(2)获得的局部相似度克服了利用互相关函数计算互相关度的缺点,公式(2)中每个位置的采样点值在计算互相关度时所起的作用比较平衡,同时利用差值的绝对值作为衡量一个位置的采样点值所起作用的作用值,可以很好地描述交点前后的斜率差 异,可以找到最合适的候选位置作为帧分割位置。
在一个实施例中,该音频数据处理方法还包括:对于在开启音效时从指定声道的音频数据流中获取的相邻的第一音频帧和第二音频帧,执行获取第二音频帧中帧分割位置以前的采样点值以及第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将第三音频帧插入第一音频帧和第二音频帧之间的步骤,并对插入的第三音频帧进行淡入处理,使插入的第三音频帧按时序从无音效状态逐渐过渡到完整音效状态。
具体地,对指定声道的音频数据流执行步骤202、步骤204以及步骤206的前半部分插入音频帧的步骤。开启音效的指令是用于插入音频帧的指令,此时开启的音效是基于声道异步的音效,通过在指定声道插入一帧音频帧,使得指定声道的音频数据流比剩余的其它声道延迟一个音频帧,从而达到因音源到达人两耳的时间相差一个音频帧的时间而产生的环绕音效。
无音效状态是指开启音效之前的状态,完整音效状态是开启音效之后的状态,通过对第三音频帧进行淡入处理,使得插入的第三音频帧按照其中采样点值的时序,按时序从无音效状态逐渐过渡到完整音效状态,从而达到音效平缓过渡的效果。比如若完整音效状态下需要音量提高5倍,则可以逐步提升音量的倍数,直至最高达到5倍时与处于完整音效状态的第二音频帧无缝连接。逐渐过渡可以是线性过渡,也可以是曲线性过渡。
本实施例中,在关闭音效时,可对指定声道的音频数据流执行步骤202、步骤204以及步骤206的后半部分替换音频帧的步骤,并对替换为的第四音频帧进行淡出处理,使替换为的第四音频帧按时序从完整音效状态逐渐过渡到无音效状态。淡出处理与淡入处理相反,是逐渐消除音效的影响的处理过程。
本实施例中,通过将指定声道的两帧音频帧替换为一帧音频帧,删除掉一帧音频帧,使得指定声道恢复到与其它声道同步的状态。可快速开启和/或关闭基于声道异步的音效,提高了切换音效的效率。
在一个实施例中,对于在开启音效时从指定声道的音频数据流中获取的 相邻的第一音频帧和第二音频帧,还可对指定声道执行获取第一音频帧中帧分割位置以前的采样点值以及第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将第一音频帧和第二音频帧一并替换为第四音频帧的步骤,并对替换为的第四音频帧进行淡入处理,使替换为的第四音频帧按时序从无音效状态逐渐过渡到完整音效状态。
本实施例中,在关闭音效时,则可以对指定声道执行步骤202、步骤204,以及步骤206的前半部分:获取第二音频帧中帧分割位置以前的采样点值以及第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将第三音频帧插入第一音频帧和第二音频帧之间。并且对插入的第三音频帧进行淡出处理,使插入的第三音频帧按时序从完整音效状态逐渐过渡到无音效状态。本实施例也可实现快速开启和/或关闭基于声道异步的音效,提高了切换音效的效率。
如图11所示,在一个实施例中,一种音频数据处理方法,包括如下步骤:
步骤1102,在开启音效时,从指定声道的音频数据流中获取相邻的第一音频帧和第二音频帧,第一音频帧在时序上先于第二音频帧。
步骤1104,获取第一候选位置,第一音频帧中第一候选位置处的采样点值与第二音频帧中相应第一候选位置处的采样点值满足距离接近条件。其中,距离接近条件可为:第一差值与第二差值的乘积小于等于0。且第一差值为第一音频帧中候选位置处的采样点值与第二音频帧中相应候选位置处的采样点值的差值。第二差值为第一音频帧中候选位置的下一位置的采样点值与第二音频帧中相应位置处的采样点值的差值。
步骤1106,获取第一音频帧和第二音频帧中在覆盖第一候选位置的预设长度的离散位置范围内的各采样点值对的距离和。
步骤1108,将最小距离和所对应的第一候选位置确定为第一帧分割位置。
步骤1110,获取第二音频帧中帧分割位置以前的采样点值以及第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧。
步骤1112,将第三音频帧插入第一音频帧和第二音频帧之间。
步骤1114,对插入的第三音频帧进行淡入处理,使插入的第三音频帧按时序从无音效状态逐渐过渡到完整音效状态。
步骤1116,在关闭音效时,从指定声道的音频数据流中获取相邻的第五音频帧和第六音频帧,第五音频帧在时序上先于第六音频帧。其中,第五音频帧相当于图2所示的实施例的步骤206中用来生成第四音频帧的第一音频帧,第六音频帧相当于图2所示的实施例的步骤206中用来生成第四音频帧的第二音频帧。
步骤1118,获取第二候选位置,第五音频帧中第二候选位置处的采样点值与第六音频帧中相应第二候选位置处的采样点值满足距离接近条件。其中,距离接近条件可为:第一差值与第二差值的乘积小于等于0。且第一差值为第五音频帧中候选位置处的采样点值与第六音频帧中相应候选位置处的采样点值的差值。第二差值为第五音频帧中候选位置的下一位置的采样点值与第六音频帧中相应位置处的采样点值的差值。
步骤1120,获取第五音频帧和第六音频帧中在覆盖第二候选位置的预设长度的离散位置范围内的各采样点值对的距离和。
步骤1122,将最小距离和所对应的第二候选位置确定为第二帧分割位置。
步骤1124,获取第五音频帧中第二帧分割位置以前的采样点值以及第六音频帧中第二帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧。
步骤1126,将第五音频帧和第六音频帧一并替换为第四音频帧。
步骤1128,对替换为的第四音频帧进行淡出处理,使替换为的第四音频帧按时序从完整音效状态逐渐过渡到无音效状态。
上述音频数据处理方法,在需要插入音频帧时,将第二音频帧的帧分割位置以前的部分与第一音频帧的帧分割位置以后的部分进行拼接后获得第三音频帧,插入第一音频帧和第二音频帧之间。插入之后,第三音频帧的前部分是第二音频帧的前部分,而第三音频帧的后部分则是第一音频帧的后部分。由于第一音频帧和第二音频帧本身是无缝连接的,这样第一音频帧能够与第三音频帧的前部分无缝连接,第三音频帧的后部分与第二音频帧无缝连接, 而且第三音频帧在帧分割位置处满足距离接近条件,这样拼接处也不会产生太大突变,因此可基本克服在插入音频帧时因为音频帧之间的跳跃而产生的噪声问题。
在需要删除音频帧时,将第一音频帧的帧分割位置以前的部分与第二音频帧的帧分割位置以后的部分进行拼接后获得第四音频帧,替换掉第一音频帧和第二音频帧。替换之后,第四音频帧的前部分是第一音频帧的前部分,而第四音频帧的后部分则是第二音频帧的后部分。由于第一音频帧和前一音频帧、第二音频帧和后一音频帧都是无缝连接的,这样替换后第四音频帧能够与第一音频帧的前一音频帧无缝连接,与第二音频帧的后一音频帧无缝连接,而且第四音频帧在帧分割位置处满足距离接近条件,这样拼接处也不会产生太大突变,因此可基本克服在删除音频帧时因为音频帧之间的跳跃而产生的噪声问题。
本申请还提供一种终端,终端的内部结构可对应于如图1所示的结构,下述每个模块可全部或部分通过软件、硬件或其组合来实现。如图12所示,在一个实施例中,终端1200包括音频帧获取模块1201和帧分割位置确定模块1202,还包括音频帧插入模块1203和音频帧替换模块1204中的至少一种。
音频帧获取模块1201,用于从音频数据流中获取相邻的第一音频帧和第二音频帧,第一音频帧在时序上先于第二音频帧。
具体地,音频数据流包括具有时序的一系列的采样点值,采样点值通过将原始的模拟声音信号按照特定的音频采样率采样获得,一系列的采样点值就可以描述声音。音频采样率则是一秒钟内所采集的采样点的数量,单位为赫兹,音频采样率越高所能描述的声波频率就越高。
音频帧包括具有时序的、数量固定的采样点值。按照音频数据流的编码格式,若编码格式本身存在音频帧则直接采用,若不存在音频帧而只是一系列具有时序的采样点值,则可以按照预设帧长度从这一系列具有时序的采样点值中划分出音频帧。预设帧长度是指预设的一帧音频帧中所包括的采样点值的数量。
音频帧获取模块1201从音频数据流中获取的第一音频帧和第二音频帧是相邻的,且第一音频帧在时序上先于第二音频帧,就是说在对音频数据流进行播放处理时,第一音频帧先播放,当第一音频帧播放完毕之后播放第二音频帧。第一音频帧和第二音频帧是需要在两者之间插入音频帧的两个相邻音频帧。
帧分割位置确定模块1202,用于确定帧分割位置,第一音频帧中帧分割位置处的采样点值与第二音频帧中帧分割位置处的采样点值满足距离接近条件。
具体地,帧分割位置是指将第一音频帧和第二音频帧进行分割的位置,是相对于一个音频帧的相对位置。距离是指两个音频帧中对应的位置处的采样点值对的差值的绝对值。举例说明,参照图4所示的第一音频帧A的局部采样点值分布图以及图5所示的第二音频帧B的局部采样点值分布图,第一音频帧A的第一个采样点值与第二音频帧B的第一个采样点值的差值的绝对值,便是第一音频帧A的第一个采样点值与第二音频帧B的第一个采样点值的距离。
距离接近条件是指用来判定两个采样点值的距离是否接近的量化条件。在一个实施例中,距离接近条件可以包括距离等于0的情况,还可以包括两个采样点值的距离不相等但接近的情况,比如距离小于等于阈值,该阈值可以是预先设置的,也可以是根据第一音频帧和/或第二音频帧中的采样点值动态确定的,比如可以是第一音频帧和/或第二音频帧中采样点值的平均值乘以预设百分比。
在一个实施例中,帧分割位置确定模块1202可计算第一音频帧和第二音频帧中每个采样点值对的距离,从而筛选出距离最小的采样点值对,帧分割位置便是筛选出的距离最小的采样点值对所对应的位置,此时距离接近条件便是第一音频帧和第二音频帧中帧分割位置对应的采样点值对的距离最小化。这里的采样点值对是指两个音频帧中相同位置处的两个采样点值,采样点值的位置则是该采样点值相对于所属音频帧的相对位置。
音频帧插入模块1203,用于获取第二音频帧中帧分割位置以前的采样点值以及第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将第三音频帧插入第一音频帧和第二音频帧之间。
具体地,在需要插入音频帧时,音频帧插入模块1203获取第二音频帧中帧分割位置以前的采样点值,并获取第一音频帧中帧分割位置以后的采样点值,获取的采样点值的总数恰好等于一个音频帧长度。将来自于第二音频帧的采样点值在前,来自于第一音频帧中的采样点值在后按顺序进行拼接,生成第三音频帧。而且,来自于第二音频帧的采样点值仍保留所在第二音频帧中的顺序,来自于第一音频帧中的采样点值仍保留所在第一音频帧中的顺序。最后将生成的第三音频帧插入第一音频帧和第二音频帧之间。
音频帧替换模块1204,用于获取第一音频帧中帧分割位置以前的采样点值以及第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将第一音频帧和第二音频帧一并替换为第四音频帧。
在需要删除音频帧时,音频帧替换模块1204获取第一音频帧中帧分割位置以前的采样点值,并获取第二音频帧中帧分割位置以后的采样点值,获取的采样点值的总数恰好等于一个音频帧长度。将来自于第一音频帧的采样点值在前、来自于第二音频帧的采样点值在后按顺序进行拼接,获得第四音频帧。而且,来自于第一音频帧的采样点值仍保留所在第一音频帧中的顺序,来自于第二音频帧中的采样点值仍保留所在第二音频帧中的顺序。最后用生成的第四音频帧替换掉第一音频帧和第二音频帧。
上述终端1200,在需要插入音频帧时,将第二音频帧的帧分割位置以前的部分与第一音频帧的帧分割位置以后的部分进行拼接后获得第三音频帧,插入第一音频帧和第二音频帧之间。插入之后,第三音频帧的前部分是第二音频帧的前部分,而第三音频帧的后部分则是第一音频帧的后部分。由于第一音频帧和第二音频帧本身是无缝连接的,这样第一音频帧能够与第三音频帧的前部分无缝连接,第三音频帧的后部分与第二音频帧无缝连接,而且第三音频帧在帧分割位置处满足距离接近条件,这样拼接处也不会产生太大突 变,因此可基本克服在插入音频帧时因为音频帧之间的跳跃而产生的噪声问题。
在需要删除音频帧时,将第一音频帧的帧分割位置以前的部分与第二音频帧的帧分割位置以后的部分进行拼接后获得第四音频帧,替换掉第一音频帧和第二音频帧。替换之后,第四音频帧的前部分是第一音频帧的前部分,而第四音频帧的后部分则是第二音频帧的后部分。由于第一音频帧和前一音频帧、第二音频帧和后一音频帧都是无缝连接的,这样替换后第四音频帧能够与第一音频帧的前一音频帧无缝连接,与第二音频帧的后一音频帧无缝连接,而且第四音频帧在帧分割位置处满足距离接近条件,这样拼接处也不会产生太大突变,因此可基本克服在删除音频帧时因为音频帧之间的跳跃而产生的噪声问题。
如图13所示,在一个实施例中,终端1200还包括:副本保留模块1205,用于在对音频数据流进行实时的播放处理时,保留至少一个音频帧长度的采样点值的副本。
音频帧获取模块1201还用于在检测到用于插入音频帧的指令时,根据当前正在进行播放处理的采样点值之前保留的副本获得第一音频帧,并根据当前正在进行播放处理的采样点值之后的一个音频帧长度的采样点值获得第二音频帧。
其中,播放处理是指根据采样点值还原出声音信号的处理,保留至少一个音频帧长度的采样点值的副本,也就是保留至少一个音频帧的副本。具体地,参照图8,在对一个采样点值A1进行播放处理时,副本保留模块1205保留该采样点值A1的副本A1′,在该采样点值A1之前进行了播放处理的采样点值的副本也会保留下来,保留的副本的总长度至少为一个音频帧长度。
在经过一个音频帧长度之后,正在对采样点值B1进行播放处理,此时副本保留模块1205也会保留该采样点值B1的副本B1′,此时保留的副本至少包括音频帧A的副本A′。假设此时音频帧获取模块1201检测到用于插入音频帧的指令,则音频帧获取模块1201会将副本A1′到当前正在进行播放处理的采 样点值B1之间的这一个音频帧长度的采样点值的副本作为第一音频帧A,并将采样点值B1之后的一个音频帧长度的音频帧B作为第二音频帧。
本实施例中,通过在对音频数据流进行实时的播放处理时保留至少一个音频帧的副本,在检测到用于插入音频帧的指令时可以立即做出响应,不需要再等待一个音频帧长度的时间,提高了插入音频帧的效率。
如图14所示,在一个实施例中,帧分割位置确定模块1202包括:候选位置获取模块1202a、相似度量模块1202b和确定模块1202c。
候选位置获取模块1202a,用于获取候选位置,所述第一音频帧中所述候选位置处的采样点值与所述第二音频帧中相应候选位置处的采样点值满足距离接近条件。相似度量模块1202b,用于获取第一音频帧和第二音频帧中在相应的候选位置处的局部相似度。确定模块1202c,用于根据所述局部相似度确定帧分割位置。
候选位置获取模块1202a,用于获取候选位置,第一音频帧中候选位置处的采样点值与第二音频帧中相应候选位置处的采样点值满足距离接近条件。
其中,候选位置是筛选出的可作为帧分割位置的音频帧中的位置,位置是离散的,每个采样点值对应一个离散的位置。具体候选位置获取模块1202a可遍历音频帧中的所有位置,在遍历到每个位置时,判断第一音频帧中和第二音频帧中相应位置处的采样点值对是否满足距离接近条件。若满足距离接近条件则候选位置获取模块1202a将遍历到的位置加入候选位置集合中,并继续遍历;若不满足距离接近条件,则继续遍历。若遍历之后候选位置集合仍未空,则候选位置获取模块1202a可选择预设位置(比如音频帧的中间位置)或者采样点值对的距离最小的位置加入到候选位置集合中。
距离接近条件是指用来判定两个采样点值的距离是否接近的量化条件。在一个实施例中,距离接近条件可以包括距离等于0的情况,还可以包括两个采样点值的距离不相等但接近的情况,比如距离小于等于阈值,该阈值可以是预先设置的,也可以是根据第一音频帧和/或第二音频帧中的采样点值动态确定的。
在一个实施例中,候选位置获取模块1202a可计算第一音频帧和第二音频帧中每个采样点值对的距离并升序排序,从而将排序靠前的预设数量的距离所对应的位置加入候选位置集合中,此时距离接近条件便是第一音频帧和第二音频帧中候选位置对应的采样点值对的距离是将所有计算出的距离升序排序后靠前的预设数量的距离。或者可从排序的距离的最小距离起获取占所有计算出的距离中的预设比例的距离所对应的位置加入候选位置集合中,此时距离接近条件便是第一音频帧和第二音频帧中候选位置对应的采样点值对的距离是将所有计算出的距离升序排序后靠前的占所有计算出的距离中的预设比例的距离。
在一个实施例中,距离接近条件为:第一差值与第二差值的乘积小于等于0;其中,第一差值为第一音频帧中候选位置处的采样点值与第二音频帧中相应候选位置处的采样点值的差值;第二差值为第一音频帧中候选位置的下一位置的采样点值与第二音频帧中相应位置处的采样点值的差值。
具体地,假设第一音频帧A为[a1,a2,……,am],第二音频帧B为[b1,b2,……,bm],则距离接近条件可用以下公式(1)表示:
(a_i − b_i) × (a_{i+1} − b_{i+1}) ≤ 0，(i ∈ [1, m−1])   公式(1)
其中,i表示第一音频帧A以及第二音频帧B中的候选位置,可称为采样点值序号,m为一个音频帧长度;(ai-bi)为第一差值,表示第一音频帧A中候选位置i处的采样点值ai与第二音频帧B中相应候选位置i处的采样点值bi的差值;(ai+1-bi+1)为第二差值,表示第一音频帧A中候选位置i的下一位置i+1的采样点值ai+1与第二音频帧B中相应位置i+1处的采样点值bi+1的差值;公式(1)表示第一差值(ai-bi)与第二差值(ai+1-bi+1)的乘积小于等于0。
上述公式(1)所表示的距离接近条件,是为了找到第一音频帧的采样点值构成的第一拟合曲线和第二音频帧中的采样点值构成的第二拟合曲线的交点,还可以用其它求取两个曲线交点的方式来确定交点。若该交点正好是一个采样点值的位置,则将该位置加入候选位置集合;若该交点不是任何采样 点值的位置,则可将音频帧的所有位置中最靠近该交点的位置加入候选位置集合。比如图10中的第一拟合曲线和第二拟合曲线存在交点X,则可将最靠近该交点X的两个位置S1或S2加入候选位置集合。其它求取两个曲线交点的方式比如先分别求取两个拟合曲线的数学表达,从而通过函数计算来直接求取交点。上述公式(1)所表示的距离接近条件效率更高。
相似度量模块1202b,用于获取第一音频帧和第二音频帧中在覆盖候选位置的预设长度的离散位置范围内的各采样点值对的距离和。
其中,覆盖候选位置的预设长度的离散位置范围,包括某候选位置,该离散位置集合包括的离散位置的数量是固定的即预设长度,且该位置集合中的位置是顺序相邻的。相似度量模块1202b具体可逐个从候选位置集合中选择候选位置,并获取第一音频帧和第二音频帧中在覆盖所选择的候选位置的预设长度的离散位置范围内的各采样点值对的距离和。
在一个实施例中,相似度量模块1202b可采用以下公式(2)来获取第一音频帧和第二音频帧中在覆盖候选位置的预设长度的离散位置范围内的各采样点值对的距离和:
R_n = Σ_{j=n}^{2N+n} |a_j − b_j|    公式(2)
其中,n为候选位置减去N,N可取[1,(m-1)/2],优选可取[2,(m-1)/100],更优可取5;候选位置为n+N,此时离散位置范围为以候选位置n+N为中心向左右分别取N个位置与候选位置n+N构成预设长度为2N+1的离散位置范围[n,…,n+N,…2N+n];|aj-bj|是第一音频帧A和第二音频帧B中在离散位置范围内的各采样点值对(aj,bj)的距离,Rn则是第一音频帧A和第二音频帧B中在离散位置范围内的各采样点值对(aj,bj)的距离和。
确定模块1202c,用于将最小距离和所对应的候选位置确定为帧分割位置。
相似度量模块1202b用于获取第一音频帧和第二音频帧中在相应的候选位置处的局部相似度,确定模块1202c则用于根据局部相似度确定帧分割位 置。
具体地,为了从候选位置集合中找出最优的候选位置作为帧分割位置,可对候选位置集合中的所有候选位置分别计算距离和之后,找出最小的距离和所对应的候选位置作为帧分割位置。具体可表示为如下公式(3):
T=Min(Rn)
其中,T为目标函数,通过优化目标函数T,求得最小距离和对应的候选位置n,从而获得帧分割位置n+N。确定的帧分割位置也满足距离接近条件:第一差值与第二差值的乘积小于等于0;其中,第一差值为第一音频帧中帧分割位置处的采样点值与第二音频帧中相应帧分割位置处的采样点值的差值;第二差值为第一音频帧中帧分割位置的下一位置的采样点值与第二音频帧中相应位置处的采样点值的差值。
本实施例中,通过找到在第一拟合曲线和第二拟合曲线的交点附近最相似的交点处的候选位置作为帧分割位置。候选位置处的局部相似度是指在候选位置附近固定范围内第一拟合曲线和第二拟合曲线相似的程度,通过上述公式(2)计算出的局部相似度越小表示越相似。若第一拟合曲线和第二拟合曲线在候选位置附近越相似,相应的两种曲线具有越相似的斜率,分割之后再拼接获得的第三音频帧过渡越平缓,对噪声的抑制作用更好。
局部相似度还可以通过互相关函数计算互相关度而获得,互相关函数虽然也可以表示两个信号的相似程度,但是如果应用于本方案,在进行少量点的互相关度的计算时,单独的两个同向大采样点值可能会获得一个较大的互相关度,表示两条曲线越相似,但却不是最佳的帧分割位置。但通过上述公式(2)获得的局部相似度克服了利用互相关函数计算互相关度的缺点,公式(2)中每个位置的采样点值在计算互相关度时所起的作用比较平衡,同时利用差值的绝对值作为衡量一个位置的采样点值所起作用的作用值,可以很好地描述交点前后的斜率差异,可以找到最合适的候选位置作为帧分割位置。
在一个实施例中,音频帧插入模块1203还用于对于在开启音效时从指定声道的音频数据流中获取的相邻的第一音频帧和第二音频帧,获取第二音频 帧中帧分割位置以前的采样点值以及第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将第三音频帧插入第一音频帧和第二音频帧之间,并对插入的第三音频帧进行淡入处理,使插入的第三音频帧按时序从无音效状态逐渐过渡到完整音效状态。
本实施例中,音频帧替换模块1204还用于在关闭音效时,获取第一音频帧中帧分割位置以前的采样点值以及第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将第一音频帧和第二音频帧一并替换为第四音频帧,并对替换为的第四音频帧进行淡出处理,使替换为的第四音频帧按时序从完整音效状态逐渐过渡到无音效状态。
在一个实施例中,音频帧替换模块1204还用于对于在开启音效时从指定声道的音频数据流中获取的相邻的所述第一音频帧和所述第二音频帧,获取第一音频帧中帧分割位置以前的采样点值以及第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将第一音频帧和第二音频帧一并替换为第四音频帧,并对替换为的第四音频帧进行淡出处理,使替换为的第四音频帧按时序从完整音效状态逐渐过渡到无音效状态。
本实施例中,音频帧插入模块1203还用于对于在关闭音效时从指定声道的音频数据流中获取的相邻的第一音频帧和第二音频帧,获取第二音频帧中帧分割位置以前的采样点值以及第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将第三音频帧插入第一音频帧和第二音频帧之间,并对插入的第三音频帧进行淡出处理,使插入的第三音频帧按时序从完整音效状态逐渐过渡到无音效状态。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等。
以上所述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
以上所述实施例仅表达了本申请的几种实施方式,其描述较为具体和详细,但并不能因此而理解为对发明专利范围的限制。应当指出的是,对于本领域的普通技术人员来说,在不脱离本申请构思的前提下,还可以做出若干变形和改进,这些都属于本申请的保护范围。因此,本申请专利的保护范围应以所附权利要求为准。

Claims (20)

  1. 一种音频数据处理方法,包括:
    从音频数据流中获取相邻的第一音频帧和第二音频帧,所述第一音频帧在时序上先于所述第二音频帧;
    确定帧分割位置,所述第一音频帧中所述帧分割位置处的采样点值与所述第二音频帧中所述帧分割位置处的采样点值满足距离接近条件;及
    获取所述第二音频帧中帧分割位置以前的采样点值以及所述第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将所述第三音频帧插入所述第一音频帧和第二音频帧之间。
  2. 根据权利要求1所述的方法,其特征在于,还包括:
    在对音频数据流进行实时的播放处理时,保留至少一个音频帧长度的采样点值的副本;
    所述从音频数据流中获取相邻的第一音频帧和第二音频帧包括:
    在检测到用于插入音频帧的指令时,根据当前正在进行播放处理的采样点值之前保留的副本获得第一音频帧,并根据当前正在进行播放处理的采样点值之后的一个音频帧长度的采样点值获得第二音频帧。
  3. 根据权利要求1所述的方法,其特征在于,所述确定帧分割位置包括:
    获取候选位置,所述第一音频帧中所述候选位置处的采样点值与所述第二音频帧中相应候选位置处的采样点值满足距离接近条件;
    获取第一音频帧和第二音频帧中在相应的候选位置处的局部相似度;及
    根据所述局部相似度确定帧分割位置。
  4. 根据权利要求1所述的方法,其特征在于,所述确定帧分割位置包括:
    获取候选位置,所述第一音频帧中所述候选位置处的采样点值与所述第二音频帧中相应候选位置处的采样点值满足距离接近条件;
    获取所述第一音频帧和所述第二音频帧中在覆盖所述候选位置的预设长度的离散位置范围内的各采样点值对的距离和;及
    将最小距离和所对应的候选位置确定为帧分割位置。
  5. 根据权利要求4所述的方法,其特征在于,所述距离接近条件为:
    第一差值与第二差值的乘积小于等于0;
    其中,所述第一差值为所述第一音频帧中所述候选位置处的采样点值与所述第二音频帧中相应候选位置处的采样点值的差值;
    所述第二差值为所述第一音频帧中所述候选位置的下一位置的采样点值与所述第二音频帧中相应位置处的采样点值的差值。
  6. 根据权利要求1所述的方法,其特征在于,还包括:
    对于在开启音效时从指定声道的音频数据流中获取的相邻的所述第一音频帧和所述第二音频帧,执行所述获取所述第二音频帧中帧分割位置以前的采样点值以及所述第一音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第三音频帧,并将所述第三音频帧插入所述第一音频帧和第二音频帧之间的步骤,并对插入的第三音频帧进行淡入处理,使插入的第三音频帧按时序从无音效状态逐渐过渡到完整音效状态。
  7. 一种音频数据处理方法,包括:
    从音频数据流中获取相邻的第一音频帧和第二音频帧,所述第一音频帧在时序上先于所述第二音频帧;
    确定帧分割位置,所述第一音频帧中所述帧分割位置处的采样点值与所述第二音频帧中所述帧分割位置处的采样点值满足距离接近条件;及
    获取所述第一音频帧中帧分割位置以前的采样点值以及所述第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将所述第一音频帧和第二音频帧一并替换为所述第四音频帧。
  8. 根据权利要求7所述的方法,其特征在于,还包括:
    在对音频数据流进行实时的播放处理时,保留至少一个音频帧长度的采样点值的副本;
    所述从音频数据流中获取相邻的第一音频帧和第二音频帧包括:
    在检测到用于插入音频帧的指令时,根据当前正在进行播放处理的采样点值之前保留的副本获得第一音频帧,并根据当前正在进行播放处理的采样 点值之后的一个音频帧长度的采样点值获得第二音频帧。
  9. 根据权利要求7所述的方法,其特征在于,所述确定帧分割位置包括:
    获取候选位置,所述第一音频帧中所述候选位置处的采样点值与所述第二音频帧中相应候选位置处的采样点值满足距离接近条件;
    获取第一音频帧和第二音频帧中在相应的候选位置处的局部相似度;及
    根据所述局部相似度确定帧分割位置。
  10. 根据权利要求7所述的方法,其特征在于,所述确定帧分割位置包括:
    获取候选位置,所述第一音频帧中所述候选位置处的采样点值与所述第二音频帧中相应候选位置处的采样点值满足距离接近条件;
    获取所述第一音频帧和所述第二音频帧中在覆盖所述候选位置的预设长度的离散位置范围内的各采样点值对的距离和;及
    将最小距离和所对应的候选位置确定为帧分割位置。
  11. 根据权利要求10所述的方法,其特征在于,所述距离接近条件为:
    第一差值与第二差值的乘积小于等于0;
    其中,所述第一差值为所述第一音频帧中所述候选位置处的采样点值与所述第二音频帧中相应候选位置处的采样点值的差值;
    所述第二差值为所述第一音频帧中所述候选位置的下一位置的采样点值与所述第二音频帧中相应位置处的采样点值的差值。
  12. 根据权利要求7所述的方法,其特征在于,还包括:
    对于在开启音效时从指定声道的音频数据流中获取的相邻的所述第一音频帧和所述第二音频帧,执行所述获取所述第一音频帧中帧分割位置以前的采样点值以及所述第二音频帧中帧分割位置以后的采样点值,按顺序拼接以生成第四音频帧,并将所述第一音频帧和第二音频帧一并替换为所述第四音频帧的步骤,并对替换为的所述第四音频帧进行淡入处理,使替换为的所述第四音频帧按时序从无音效状态逐渐过渡到完整音效状态。
  13. A terminal, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, cause the processor to perform the following steps:
    obtaining adjacent first and second audio frames from an audio data stream, the first audio frame preceding the second audio frame in time sequence;
    determining a frame splitting position, wherein the sample point value at the frame splitting position in the first audio frame and the sample point value at the frame splitting position in the second audio frame satisfy a distance closeness condition; and
    obtaining the sample point values before the frame splitting position in the second audio frame and the sample point values after the frame splitting position in the first audio frame, splicing them in order to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.
  14. The terminal according to claim 13, wherein the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps:
    retaining a copy of at least one audio frame length of sample point values while performing real-time playback processing on the audio data stream;
    wherein the obtaining adjacent first and second audio frames from an audio data stream comprises:
    upon detecting an instruction for inserting an audio frame, obtaining the first audio frame according to the copy retained before the sample point values currently undergoing playback processing, and obtaining the second audio frame according to one audio frame length of sample point values following the sample point values currently undergoing playback processing.
  15. The terminal according to claim 13, wherein the determining a frame splitting position comprises:
    obtaining candidate positions, wherein the sample point value at each candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy a distance closeness condition;
    obtaining local similarities between the first audio frame and the second audio frame at the corresponding candidate positions; and
    determining the frame splitting position according to the local similarities.
  16. The terminal according to claim 13, wherein the determining a frame splitting position comprises:
    obtaining candidate positions, wherein the sample point value at each candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy a distance closeness condition;
    obtaining, for the first audio frame and the second audio frame, a sum of distances between pairs of sample point values within a discrete position range of a preset length covering each candidate position; and
    determining the candidate position corresponding to the minimum distance sum as the frame splitting position.
  17. The terminal according to claim 16, wherein the distance closeness condition is:
    a product of a first difference and a second difference is less than or equal to 0;
    wherein the first difference is the difference between the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame; and
    the second difference is the difference between the sample point value at the position next to the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
  18. The terminal according to claim 13, wherein the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps:
    for the adjacent first audio frame and second audio frame obtained from an audio data stream of a specified sound channel when a sound effect is enabled, performing the steps of obtaining the sample point values before the frame splitting position in the second audio frame and the sample point values after the frame splitting position in the first audio frame, splicing them in order to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame; and performing fade-in processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions over time from a no-sound-effect state to a full-sound-effect state.
  19. A terminal, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, cause the processor to perform the following steps:
    obtaining adjacent first and second audio frames from an audio data stream, the first audio frame preceding the second audio frame in time sequence;
    determining a frame splitting position, wherein the sample point value at the frame splitting position in the first audio frame and the sample point value at the frame splitting position in the second audio frame satisfy a distance closeness condition; and
    obtaining the sample point values before the frame splitting position in the first audio frame and the sample point values after the frame splitting position in the second audio frame, splicing them in order to generate a fourth audio frame, and replacing both the first audio frame and the second audio frame with the fourth audio frame.
  20. The terminal according to claim 19, wherein the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps:
    retaining a copy of at least one audio frame length of sample point values while performing real-time playback processing on the audio data stream;
    wherein the obtaining adjacent first and second audio frames from an audio data stream comprises:
    upon detecting an instruction for inserting an audio frame, obtaining the first audio frame according to the copy retained before the sample point values currently undergoing playback processing, and obtaining the second audio frame according to one audio frame length of sample point values following the sample point values currently undergoing playback processing.
PCT/CN2017/070692 2016-01-14 2017-01-10 Audio data processing method and terminal WO2017121304A1 (zh)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2018529129A JP6765650B2 (ja) 2016-01-14 2017-01-10 Audio data processing method and terminal
EP17738118.3A EP3404652B1 (en) 2016-01-14 2017-01-10 Audio data processing method and terminal
KR1020187016293A KR102099029B1 (ko) 2016-01-14 2017-01-10 Audio data processing method and terminal
MYPI2018701827A MY191125A (en) 2016-01-14 2017-01-10 Audio data processing method and terminal
US15/951,078 US10194200B2 (en) 2016-01-14 2018-04-11 Audio data processing method and terminal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610025708.1A CN106970771B (zh) 2016-01-14 2016-01-14 Audio data processing method and apparatus
CN201610025708.1 2016-01-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/951,078 Continuation-In-Part US10194200B2 (en) 2016-01-14 2018-04-11 Audio data processing method and terminal

Publications (1)

Publication Number Publication Date
WO2017121304A1 true WO2017121304A1 (zh) 2017-07-20

Family

ID=59310835

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/070692 WO2017121304A1 (zh) 2016-01-14 2017-01-10 音频数据处理方法和终端

Country Status (7)

Country Link
US (1) US10194200B2 (zh)
EP (1) EP3404652B1 (zh)
JP (1) JP6765650B2 (zh)
KR (1) KR102099029B1 (zh)
CN (1) CN106970771B (zh)
MY (1) MY191125A (zh)
WO (1) WO2017121304A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108347672B (zh) * 2018-02-09 2021-01-22 Guangzhou Kugou Computer Technology Co., Ltd. Audio playing method and apparatus, and storage medium
CN109086026B (zh) * 2018-07-17 2020-07-03 Alibaba Group Holding Limited Method, apparatus, and device for determining broadcast speech
CN109346111B (zh) * 2018-10-11 2020-09-04 Guangzhou Kugou Computer Technology Co., Ltd. Data processing method and apparatus, terminal, and storage medium
CN111613195B (zh) * 2019-02-22 2022-12-09 Zhejiang University Audio splicing method and apparatus, and storage medium
CN111954027B (zh) * 2020-08-06 2022-07-08 浩联时代(北京)科技有限公司 Streaming media data transcoding method and apparatus, computing device, and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085197A1 (en) * 2000-12-28 2006-04-20 Yamaha Corporation Singing voice-synthesizing method and apparatus and storage medium
CN101425291A (zh) * 2007-10-31 2009-05-06 Toshiba Corporation Speech processing apparatus and speech processing method
CN101640053A (zh) * 2009-07-24 2010-02-03 王祐凡 Audio processing method and apparatus, and audio playing method and apparatus
CN101789240A (zh) * 2009-12-25 2010-07-28 Huawei Technologies Co., Ltd. Speech signal processing method and apparatus, and communication system
CN103905843A (zh) * 2014-04-23 2014-07-02 无锡天脉聚源传媒科技有限公司 Distributed audio/video processing apparatus and processing method for avoiding consecutive I-frames
US20140236584A1 (en) * 2013-02-21 2014-08-21 Qualcomm Incorporated Systems and methods for quantizing and dequantizing phase information

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5490234A (en) * 1993-01-21 1996-02-06 Apple Computer, Inc. Waveform blending technique for text-to-speech system
JP3017715B2 (ja) * 1997-10-31 2000-03-13 Matsushita Electric Industrial Co., Ltd. Speech playback apparatus
JP3744216B2 (ja) * 1998-08-07 2006-02-08 Yamaha Corporation Waveform forming apparatus and method
US7272556B1 (en) * 1998-09-23 2007-09-18 Lucent Technologies Inc. Scalable and embedded codec for speech and audio signals
JP2004109362A (ja) * 2002-09-17 2004-04-08 Pioneer Electronic Corp Frame-structure noise removal apparatus, frame-structure noise removal method, and frame-structure noise removal program
JP4219898B2 (ja) * 2002-10-31 2009-02-04 Fujitsu Limited Speech enhancement apparatus
JP4406440B2 (ja) * 2007-03-29 2010-01-27 Toshiba Corporation Speech synthesis apparatus, speech synthesis method, and program
US20090048827A1 (en) * 2007-08-17 2009-02-19 Manoj Kumar Method and system for audio frame estimation
US8762852B2 (en) * 2010-11-04 2014-06-24 Digimarc Corporation Smartphone-based methods and systems
JP5784939B2 (ja) 2011-03-17 2015-09-24 Stanley Electric Co., Ltd. Light-emitting element, light-emitting element module, and vehicle lamp
US9066121B2 (en) * 2011-08-09 2015-06-23 Google Technology Holdings LLC Addressable advertising switch by decoupling decoding from service acquisitions
US9043201B2 (en) * 2012-01-03 2015-05-26 Google Technology Holdings LLC Method and apparatus for processing audio frames to transition between different codecs
CN104519401B (zh) * 2013-09-30 2018-04-17 贺锦伟 Video splitting point obtaining method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060085197A1 (en) * 2000-12-28 2006-04-20 Yamaha Corporation Singing voice-synthesizing method and apparatus and storage medium
CN101425291A (zh) * 2007-10-31 2009-05-06 Toshiba Corporation Speech processing apparatus and speech processing method
CN101640053A (zh) * 2009-07-24 2010-02-03 王祐凡 Audio processing method and apparatus, and audio playing method and apparatus
CN101789240A (zh) * 2009-12-25 2010-07-28 Huawei Technologies Co., Ltd. Speech signal processing method and apparatus, and communication system
US20140236584A1 (en) * 2013-02-21 2014-08-21 Qualcomm Incorporated Systems and methods for quantizing and dequantizing phase information
CN103905843A (zh) * 2014-04-23 2014-07-02 无锡天脉聚源传媒科技有限公司 Distributed audio/video processing apparatus and processing method for avoiding consecutive I-frames

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3404652A4 *

Also Published As

Publication number Publication date
CN106970771A (zh) 2017-07-21
EP3404652A4 (en) 2018-11-21
CN106970771B (zh) 2020-01-14
MY191125A (en) 2022-05-31
KR102099029B1 (ko) 2020-04-08
US10194200B2 (en) 2019-01-29
KR20180082521A (ko) 2018-07-18
US20180234721A1 (en) 2018-08-16
EP3404652B1 (en) 2019-12-04
JP2019508722A (ja) 2019-03-28
JP6765650B2 (ja) 2020-10-07
EP3404652A1 (en) 2018-11-21

Similar Documents

Publication Publication Date Title
WO2017121304A1 (zh) Audio data processing method and terminal
US12114048B2 (en) Automated voice translation dubbing for prerecorded videos
US11456017B2 (en) Looping audio-visual file generation based on audio and video analysis
CN112235631B (zh) 视频处理方法、装置、电子设备及存储介质
US8767970B2 (en) Audio panning with multi-channel surround sound decoding
US8271872B2 (en) Composite audio waveforms with precision alignment guides
US8744249B2 (en) Picture selection for video skimming
US9613605B2 (en) Method, device and system for automatically adjusting a duration of a song
WO2022143924A1 (zh) Video generation method and apparatus, electronic device, and storage medium
WO2014204997A1 (en) Adaptive audio content generation
EP3929770B1 (en) Methods, systems, and media for modifying the presentation of video content on a user device based on a consumption of the user device
US20060236219A1 (en) Media timeline processing infrastructure
WO2009104402A1 (ja) Music playback device, music playback method, music playback program, and integrated circuit
US20160313970A1 (en) Gapless media generation
US20140371891A1 (en) Local control of digital signal processing
KR101212036B1 (ko) 프로젝트를 분할하여 동영상의 객체정보를 저작하는 방법 및 장치
WO2024131555A1 (zh) Video soundtrack matching method, device, storage medium, and program product
CN118283216A (zh) Data processing method, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17738118

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018529129

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20187016293

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 1020187016293

Country of ref document: KR

NENP Non-entry into the national phase

Ref country code: DE