WO2017121304A1 - Audio data processing method and terminal - Google Patents
Audio data processing method and terminal
- Publication number
- WO2017121304A1 (PCT/CN2017/070692)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio frame
- frame
- audio
- sample point
- point value
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
- H04S1/007—Two-channel systems in which the audio signals are in digital form
Definitions
- the present application relates to the field of audio data processing technologies, and in particular, to an audio data processing method and terminal.
- the application of audio data processing technology enables people to capture sound with a pickup, store the resulting audio data, and later play the stored audio data through an audio player to reproduce the sound when needed.
- the wide application of audio data processing technology makes the recording and reproduction of sound very easy, and has an important impact on people's lives and work.
- when the audio data streams of the left and right channels are out of step by one audio frame, they can be realigned by inserting one frame of audio data between two adjacent frames in one of the two streams.
- likewise, when the audio data streams of the left and right channels are otherwise out of synchronization, the problem can be alleviated by inserting audio data into one of the streams.
- conventionally, when audio data is inserted between two adjacent frames of the stream, one of the two frames is simply duplicated and inserted directly; during playback, however, the inserted audio data produces obvious noise that needs to be overcome. Similarly, noise also arises when one frame of audio data is deleted from the audio stream.
- an audio data processing method and terminal are provided.
- An audio data processing method includes:
- splicing, in order, the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.
- An audio data processing method includes:
- splicing, in order, the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame to generate a fourth audio frame, and replacing both the first audio frame and the second audio frame with the fourth audio frame.
- a terminal comprising a memory and a processor, the memory storing computer readable instructions, wherein the computer readable instructions are executed by the processor such that the processor performs the following steps:
- splicing, in order, the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.
- a terminal comprising a memory and a processor, the memory storing computer readable instructions, wherein the computer readable instructions are executed by the processor such that the processor performs the following steps:
- splicing, in order, the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame to generate a fourth audio frame, and replacing both the first audio frame and the second audio frame with the fourth audio frame.
- FIG. 1 is a schematic structural diagram of a terminal for implementing an audio data processing method in an embodiment
- FIG. 2 is a schematic flow chart of an audio data processing method in an embodiment
- FIG. 3A is a schematic diagram of inserting an audio frame between an adjacent first audio frame and second audio frame in one embodiment
- FIG. 3B is a schematic diagram of deleting one frame of an adjacent first audio frame and second audio frame in one embodiment
- FIG. 4 is a partial sample point value distribution diagram of a first audio frame in an embodiment
- FIG. 5 is a partial sample point value distribution diagram of a second audio frame in an embodiment
- FIG. 6 is a partial sample point value distribution diagram in which a first audio frame and a second audio frame overlap in one embodiment
- FIG. 7A is a schematic diagram of a process of dividing an audio frame, splicing an audio frame, and inserting an audio frame in one embodiment
- FIG. 7B is a schematic diagram of a process of dividing an audio frame, splicing an audio frame, and replacing an audio frame in one embodiment
- FIG. 8 is a schematic diagram of a process of retaining a copy and performing a playback process in one embodiment
- FIG. 9 is a flow chart showing the steps of determining a frame division position in an embodiment
- FIG. 10 is a schematic diagram of a first fitting curve of a first audio frame and a second fitting curve of a second audio frame in the same coordinate system in one embodiment
- FIG. 11 is a schematic flow chart of an audio data processing method in another embodiment
- FIG. 12 is a structural block diagram of a terminal in an embodiment
- Figure 13 is a block diagram showing the structure of a terminal in another embodiment
- Figure 14 is a block diagram showing the structure of the frame division position determining module of Figure 12 or Figure 13 in one embodiment.
- a terminal 100 for implementing an audio data processing method includes a processor, a non-volatile storage medium, an internal memory, an input device, and an audio output interface connected through a system bus.
- the processor has a computing function and a function of controlling the operation of the terminal 100, the processor being configured to perform an audio data processing method.
- the non-volatile storage medium includes at least one of a magnetic storage medium, an optical storage medium, and a flash storage medium, and stores computer readable instructions that, when executed by the processor, cause the processor to perform an audio data processing method.
- the input device includes at least one of a physical button, a trackball, a touchpad, a touch layer overlapping the display screen, and a physical interface for accessing an external control device such as a mouse or a multimedia remote control device.
- the terminal 100 includes various electronic devices capable of audio data processing such as a desktop computer, a portable notebook computer, a mobile phone, a music player, and a smart watch.
- an audio data processing method is provided. This embodiment is illustrated as applied to the terminal 100 in FIG. 1.
- the method specifically includes the following steps:
- Step 202: Acquire an adjacent first audio frame and a second audio frame from the audio data stream, where the first audio frame precedes the second audio frame in time series.
- the audio data stream includes a series of sample point values having timings obtained by sampling the original analog sound signal at a specific audio sample rate, and a series of sample point values can describe the sound.
- the audio sampling rate is the number of sampling points collected in one second, in Hertz (Hz). The higher the audio sampling rate, the higher the frequency of the sound wave that can be described.
- the audio frame includes a fixed number of sample point values with timing. If the encoding format of the audio data stream itself defines audio frames, they are used directly; if there are no audio frames but only a series of sample point values with timing, the series of sample point values can be divided into audio frames according to a preset frame length.
- the preset frame length refers to the number of sample point values included in one preset audio frame.
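Dividing a raw sample stream into frames of a preset frame length can be sketched as follows (a minimal illustration, not taken from the patent; the function name `split_into_frames` and the frame length of 4 are assumptions for this example):

```python
# Hypothetical sketch: group a flat list of timed sample point values
# into fixed-length audio frames; a trailing partial frame is dropped.
def split_into_frames(samples, frame_len):
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

stream = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]   # sample point values with timing
frames = split_into_frames(stream, 4)
print(frames)                              # [[3, 1, 4, 1], [5, 9, 2, 6]]
```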
- the first audio frame and the second audio frame obtained from the audio data stream are adjacent, and the first audio frame precedes the second audio frame in time series, that is, when the audio data stream is played back, first The audio frame is played first, and the second audio frame is played after the first audio frame is played.
- the first audio frame and the second audio frame are two adjacent audio frames between which an audio frame needs to be inserted.
- as shown in FIG. 3A, a piece of audio data stream includes a first audio frame A, a second audio frame B, ... arranged in time series; when an audio frame needs to be inserted, an audio frame F is inserted between the first audio frame A and the second audio frame B.
- as shown in FIG. 3B, when an audio frame needs to be deleted, the sample point values of one audio frame are removed from the sample point values of the two audio frames, the first audio frame A and the second audio frame B, and a single audio frame G is retained.
- Step 204: Determine a frame division position such that the sample point value at the frame division position in the first audio frame and the sample point value at the frame division position in the second audio frame satisfy a distance proximity condition.
- the frame division position refers to a position at which the first audio frame and the second audio frame are divided, and is a relative position with respect to one audio frame.
- the distance refers to the absolute value of the difference between the sample point values of a pair at corresponding positions in the two audio frames. For example, referring to the local sample point value distribution map of the first audio frame A shown in FIG. 4 and that of the second audio frame B shown in FIG. 5, the absolute value of the difference between the first sample point value of the first audio frame A and the first sample point value of the second audio frame B is the distance between those two sample point values.
- the distance approach condition refers to a quantization condition used to determine whether the distances of the two sample point values are close.
- the distance proximity condition may include the case where the distance is equal to 0, and may also include the case where the two sample point values are not equal but close, such as the distance being less than or equal to a threshold. The threshold may be preset, or may be dynamically determined based on the sample point values in the first audio frame and/or the second audio frame, for example as the average of those sample point values multiplied by a preset percentage.
- the terminal may calculate the distance of each sample point value pair in the first audio frame and the second audio frame, and filter out the sample point value pair with the smallest distance; the frame division position is then the position corresponding to that pair, and the distance proximity condition is that the distance of the sample point value pair at the frame division position in the first audio frame and the second audio frame is the minimum.
- the sample point value pair here refers to two sample point values at the same position in two audio frames, and the position of the sample point value is the relative position of the sample point value relative to the associated audio frame.
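The minimum-distance selection described above can be sketched as follows (an illustrative Python fragment; the frame values and the name `best_split_position` are invented for this example):

```python
# Hypothetical sketch of step 204: pick the position whose sample point
# value pair has the smallest distance |a_i - b_i| between the frames.
def best_split_position(frame_a, frame_b):
    distances = [abs(a - b) for a, b in zip(frame_a, frame_b)]
    return min(range(len(distances)), key=distances.__getitem__)

A = [10, 40, -5, 7, 30]   # first audio frame
B = [25, 12, -4, 50, 2]   # second audio frame
print(best_split_position(A, B))  # 2: |-5 - (-4)| = 1 is the smallest distance
```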
- FIG. 4 and FIG. 5 are overlapped to obtain the overlapping partial sample point value distribution map shown in FIG. 6, in order to compare the local sample point value distributions of audio frame A and audio frame B.
- at the frame division position S, the sample point value at S in the audio frame A and the sample point value at S in the audio frame B are very close or even equal; that is, the sample point value at S in the audio frame A and the sample point value at S in the audio frame B satisfy the distance proximity condition.
- Step 206: Acquire the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame, splice them in order to generate a third audio frame, and insert the third audio frame between the first audio frame and the second audio frame; or, acquire the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame, splice them in order to generate a fourth audio frame, and replace both the first audio frame and the second audio frame with the fourth audio frame.
- the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame are acquired; the total number of sample point values acquired is exactly equal to one audio frame length.
- the sample point values from the second audio frame come first and the sample point values from the first audio frame follow, spliced sequentially in this order to generate a third audio frame.
- the sample point values from the second audio frame remain in the order in which they are located in the second audio frame, and the sample point values from the first audio frame remain in the order in which they were located in the first audio frame.
- the generated third audio frame is inserted between the first audio frame and the second audio frame.
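The generation of the third audio frame can be sketched as follows (an illustrative fragment; whether position s itself belongs to the front or rear part is an assumption, since the text only says "before" and "after" the position, and the name `make_third_frame` is invented):

```python
# Hypothetical sketch: splice B's samples before the division position s
# with A's samples from s on; the result has one audio frame length.
def make_third_frame(frame_a, frame_b, s):
    return frame_b[:s] + frame_a[s:]

A = [10, 40, -5, 7, 30]
B = [25, 12, -4, 50, 2]
F = make_third_frame(A, B, 2)
stream = [A, F, B]            # the third frame F is inserted between A and B
print(F)                      # [25, 12, -5, 7, 30]
```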
- the first audio frame A is divided into a front portion and a rear portion according to the frame division position S
- the second audio frame B is also divided into a front portion and a rear portion according to the frame division position S.
- the former part refers to the sample point value before the frame division position S
- the latter part is the sample point value after the frame division position.
- the sample point values before the frame division position in the first audio frame and the sample point values after the frame division position in the second audio frame are acquired; the total number of sample point values acquired is exactly equal to one audio frame length.
- the sample point values from the first audio frame come first and the sample point values from the second audio frame follow, spliced sequentially in this order to obtain a fourth audio frame.
- the sample point values from the first audio frame remain in the order in which they are located in the first audio frame, and the sample point values from the second audio frame remain in the order in the second audio frame.
- the first audio frame and the second audio frame are replaced with the generated fourth audio frame.
- the first audio frame D is divided into a front portion and a rear portion according to the frame division position S
- the second audio frame E is also divided into a front portion and a rear portion according to the frame division position S.
- the former part refers to the sample point value before the frame division position S
- the latter part refers to the sample point values after the frame division position.
- the front portion of the first audio frame D is spliced with the rear portion of the second audio frame E to obtain a fourth audio frame G, and the fourth audio frame G obtained by splicing then replaces the first audio frame D and the second audio frame E.
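The deletion case can be sketched similarly (an illustrative fragment; the boundary handling at s and the name `make_fourth_frame` are assumptions, as above):

```python
# Hypothetical sketch: splice the first frame's samples before the
# division position s with the second frame's samples from s on; the
# result replaces both frames, deleting one frame's worth of samples.
def make_fourth_frame(frame_a, frame_b, s):
    return frame_a[:s] + frame_b[s:]

D = [10, 40, -5, 7, 30]
E = [25, 12, -4, 50, 2]
G = make_fourth_frame(D, E, 2)  # G replaces both D and E in the stream
print(G)                        # [10, 40, -4, 50, 2]
```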
- the part of the second audio frame before the frame division position is spliced with the part of the first audio frame after the frame division position to obtain a third audio frame, which is inserted between the first audio frame and the second audio frame.
- the front portion of the third audio frame is the front portion of the second audio frame
- the rear portion of the third audio frame is the rear portion of the first audio frame. Since the first audio frame and the second audio frame are themselves seamlessly connected, the first audio frame can be seamlessly connected to the front portion of the third audio frame, and the rear portion of the third audio frame is seamlessly connected to the second audio frame.
- the third audio frame satisfies the distance proximity condition at the frame division position, so the splice does not introduce an abrupt change; thus the noise caused by jumps between audio frames when an audio frame is inserted can be substantially overcome.
- the fourth audio frame is obtained by splicing the part of the first audio frame before the frame division position with the part of the second audio frame after the frame division position, and replaces the first audio frame and the second audio frame.
- the front portion of the fourth audio frame is the front portion of the first audio frame and the rear portion of the fourth audio frame is the rear portion of the second audio frame.
- the replacing fourth audio frame can be seamlessly connected with the audio frame preceding the first audio frame and with the audio frame following the second audio frame, and the fourth audio frame satisfies the distance proximity condition at the frame division position, so the splice does not introduce an abrupt change; thus the noise caused by jumps between audio frames when an audio frame is deleted can be substantially overcome.
- the audio data processing method further includes retaining a copy of the sample point value of the at least one audio frame length when the audio data stream is subjected to real-time playback processing.
- step 202 includes: when an instruction for inserting an audio frame is detected, obtaining the first audio frame from the copy retained before the sample point value currently being played, and obtaining the second audio frame from the sample point values of one audio frame length after the sample point value currently being played.
- the playback process refers to the process of restoring a sound signal from the sample point values; retaining a copy of the sample point values of at least one audio frame length means retaining a copy of at least one audio frame.
- when the terminal performs playback processing on a sample point value A1, the terminal retains a copy A1' of the sample point value A1; copies of the sample point values played before A1 are also retained, so that the total length of the retained copies is at least one audio frame length.
- after one audio frame length has passed, the terminal is performing playback processing on a sample point value B1, and a copy B1' of B1 is likewise retained; by this time the retained copies include at least a copy A' of the audio frame A. If the terminal detects an instruction to insert an audio frame at this moment, the terminal takes the copies of the sample point values of one audio frame length between the copy A1' and the sample point value B1 currently being played as the first audio frame A, and takes the audio frame B of one audio frame length after the sample point value B1 as the second audio frame.
- in this way, the terminal can respond immediately when an instruction for inserting an audio frame is detected, without having to wait one audio frame length, which increases the efficiency of inserting audio frames.
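The copy-retention scheme can be sketched with a bounded buffer of recently played samples (an illustrative fragment; `FRAME_LEN`, the stream values, and the `play` helper are invented for this example):

```python
from collections import deque

FRAME_LEN = 4                      # illustrative preset frame length

history = deque(maxlen=FRAME_LEN)  # copies of the most recently played samples

def play(sample):
    history.append(sample)         # retain a copy while the sample is played

stream = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
for s in stream[:6]:               # samples played so far
    play(s)

# An "insert audio frame" instruction arrives now: the first audio frame
# comes from the retained copies and the second from the upcoming samples,
# so there is no need to wait one further frame length before responding.
first_frame = list(history)
second_frame = stream[6:6 + FRAME_LEN]
print(first_frame, second_frame)   # [4, 1, 5, 9] [2, 6, 5, 3]
```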
- step 204 specifically includes the following steps:
- Step 902: Acquire candidate positions, where the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy the distance proximity condition.
- a candidate position is a position in an audio frame that can serve as a frame division position. Specifically, the terminal can traverse all positions in the audio frame and, at each position, determine whether the sample point value pair at the corresponding position in the first audio frame and the second audio frame satisfies the distance proximity condition. If it does, the traversed position is added to the candidate position set and the traversal continues; if not, the traversal simply continues. If the candidate position set is still empty after the traversal, a preset position (such as the middle position of the audio frame) or the position whose sample point value pair has the smallest distance may be added to the candidate position set.
- the distance approach condition refers to a quantization condition used to determine whether the distances of the two sample point values are close.
- the distance proximity condition may include the case where the distance is equal to 0, and may also include the case where the two sample point values are not equal but close, such as the distance being less than or equal to a threshold; the threshold may be preset, or may be dynamically determined based on the sample point values in the first audio frame and/or the second audio frame.
- the terminal may calculate the distance of each sample point value pair in the first audio frame and the second audio frame and sort the distances in ascending order, then add the positions corresponding to a preset number of the smallest distances to the candidate position set.
- alternatively, starting from the smallest of the sorted distances, the positions corresponding to a preset proportion of all the calculated distances may be taken. The distance proximity condition is then that the distance of the sample point value pair at the candidate position in the first audio frame and the second audio frame is among the preset number of smallest distances, or among the preset proportion of smallest distances, after all the calculated distances are sorted in ascending order.
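The ascending-sort variant can be sketched as follows (an illustrative fragment; the value of k, the frame values, and the name `top_k_candidates` are invented for this example):

```python
# Hypothetical sketch: keep the k positions with the smallest sample
# point value pair distances as the candidate position set.
def top_k_candidates(frame_a, frame_b, k):
    dist = [abs(a - b) for a, b in zip(frame_a, frame_b)]
    return sorted(range(len(dist)), key=dist.__getitem__)[:k]

A = [10, 40, -5, 7, 30]
B = [25, 12, -4, 50, 2]
print(top_k_candidates(A, B, 2))  # [2, 0]: distances 1 and 15 are the smallest
```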
- the distance proximity condition may be that the product of a first difference and a second difference is less than or equal to 0, where the first difference is the difference between the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame, and the second difference is the difference between the sample point value at the position following the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
- this distance proximity condition can be expressed as the following formula (1):
- (a_i - b_i) * (a_{i+1} - b_{i+1}) <= 0, 1 <= i < m (1)
- in formula (1), i denotes a candidate position (sample point value sequence number) in the first audio frame A and the second audio frame B, and m is the audio frame length; (a_i - b_i) is the first difference, i.e., the difference between the sample point value a_i at the candidate position i in the first audio frame A and the sample point value b_i at the corresponding candidate position i in the second audio frame B; (a_{i+1} - b_{i+1}) is the second difference, i.e., the difference between the sample point value a_{i+1} at the position i+1 following the candidate position in the first audio frame A and the sample point value b_{i+1} at the corresponding position i+1 in the second audio frame B. Formula (1) states that the product of the first difference and the second difference is less than or equal to zero.
- the distance proximity condition expressed by formula (1) amounts to finding the intersections of a first fitting curve formed by the sample point values of the first audio frame and a second fitting curve formed by the sample point values of the second audio frame. The intersections may also be determined by other means of finding the intersection of two curves. If an intersection falls exactly at the position of a sample point value, that position is added to the candidate position set; if it does not fall at the position of any sample point value, the position closest to the intersection among all positions of the audio frame may be added to the candidate position set. For example, if the first fitting curve and the second fitting curve in FIG. 10 have an intersection X, either of the two positions S1 or S2 closest to the intersection X may be added to the candidate position set. Among the other ways of finding the intersection of two curves, one may, for example, first obtain mathematical expressions for the two fitting curves and then calculate the intersection directly by function calculation; however, the distance proximity condition expressed by formula (1) is more efficient.
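Formula (1) can be sketched as a sign-change scan over the difference of the two frames (an illustrative fragment; the frame values and the name `candidate_positions` are invented for this example):

```python
# Hypothetical sketch of formula (1): candidate positions are where the
# difference curve a - b changes sign or touches zero between
# neighbouring samples, i.e. (a_i - b_i) * (a_{i+1} - b_{i+1}) <= 0.
def candidate_positions(frame_a, frame_b):
    return [i for i in range(len(frame_a) - 1)
            if (frame_a[i] - frame_b[i]) * (frame_a[i + 1] - frame_b[i + 1]) <= 0]

A = [10, 40, -5, 7, 30]
B = [25, 12, -4, 50, 2]
print(candidate_positions(A, B))  # [0, 1, 3]: the sign of a - b flips there
```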
- Step 904 Acquire the distance sum of the sample point value pairs in the first audio frame and the second audio frame within a discrete position range of a preset length covering the candidate position.
- the discrete position range of a preset length covering the candidate position includes the candidate position together with a fixed number of discrete positions, that number being the preset length.
- a certain number of discrete positions may be selected, equally before and after the candidate position, to form the discrete position range together with the candidate position; alternatively, unequal numbers of discrete positions may be selected before and after the candidate position to form the discrete position range together with it.
- the positions in the discrete position range are preferably sequentially adjacent; of course, the discrete positions may also be selected at intervals to form the discrete position range together with the candidate position.
- the terminal may select candidate positions from the candidate position set one by one and, for each selected candidate position, acquire the distance of each sample point value pair in the first audio frame and the second audio frame within the discrete position range of the preset length covering that candidate position.
- the following formula (2) may be employed to obtain the distance sum of the sample point value pairs in the first audio frame and the second audio frame within a discrete position range of a preset length covering the candidate position:
  d(n) = sum_{i=n}^{n+2N} |a_i - b_i| (2)
- where N may take a value in [1, (m-1)/2], preferably in [2, (m-1)/100], for example N = 5; the candidate position is n+N; and the discrete position range consists of the candidate position n+N together with the N positions on each side of it, i.e. the range [n, ..., n+N, ..., n+2N] of preset length 2N+1.
- Step 906 Determine the candidate position corresponding to the minimum distance sum as the frame segmentation position.
- the distance sum for each candidate position in the candidate position set can be calculated separately, and the candidate position with the smallest distance sum is taken as the frame segmentation position. Specifically, this can be expressed as the following formula (3):
  frame segmentation position = N + argmin_n sum_{i=n}^{n+2N} |a_i - b_i|. (3)
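Steps 904 and 906 can be sketched as follows. The clipping of the window at the frame edges is an assumption not stated in the text, and all names are illustrative:

```python
def frame_split_position(a, b, candidates, N=5):
    """Formulas (2) and (3): for each candidate position, sum |a_i - b_i| over
    a window of preset length 2N+1 around it, and return the candidate with
    the smallest sum as the frame segmentation position."""
    best_pos, best_sum = None, float("inf")
    for n in candidates:
        lo = max(0, n - N)            # clip the window at the frame edges
        hi = min(len(a), n + N + 1)   # (an assumption for boundary candidates)
        s = sum(abs(a[i] - b[i]) for i in range(lo, hi))
        if s < best_sum:
            best_sum, best_pos = s, n
    return best_pos
```

Summing absolute differences over the whole window is what gives each position the balanced contribution discussed below, unlike a raw cross-correlation.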
- the determined frame segmentation position also satisfies the distance proximity condition: the product of the first difference and the second difference is less than or equal to 0, where the first difference is the difference between the sample point value at the frame segmentation position in the first audio frame and the sample point value at the corresponding frame segmentation position in the second audio frame, and the second difference is the difference between the sample point value at the position following the frame segmentation position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
- the frame division position found by the above steps 904 to 906 is the candidate position, among the intersections of the first fitting curve and the second fitting curve, near which the two curves are most similar.
- the above step 904 is a specific step of acquiring the local similarity at the corresponding candidate positions in the first audio frame and the second audio frame, and step 906 is a specific step of determining the frame division position according to the local similarity.
- the local similarity at a candidate position refers to the degree to which the first fitting curve and the second fitting curve are similar within a fixed range near the candidate position; the smaller the value calculated by the above formula (2), the more similar the curves. If the first fitting curve and the second fitting curve are similar in the vicinity of the candidate position, the two curves have similar slopes there, the transition of the third audio frame obtained after splitting is gentler, and the noise suppression effect is better.
- local similarity could also be obtained by calculating the cross-correlation with a cross-correlation function.
- a cross-correlation function can likewise indicate the degree of similarity of two signals; however, if applied to this scheme, when the cross-correlation of a small number of points is calculated, two isolated large sample point values may yield a large cross-correlation, suggesting that the two curves are similar even though the position is not the optimal frame segmentation position.
- the local similarity obtained by the above formula (2) overcomes this shortcoming of the cross-correlation function: in formula (2), the sample point value at each position contributes in a relatively balanced way, and the absolute value of the difference, used as the contribution of each position, describes well the slope difference before and after the intersection point, so that the most suitable candidate position can be found as the frame division position.
- in one embodiment, the audio data processing method further includes: when the sound effect is turned on, performing on the audio data stream of the specified channel the audio frame insertion steps of step 202, step 204, and the first half of step 206: acquiring the adjacent first audio frame and second audio frame from the audio data stream of the specified channel; splicing, in order, the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame to generate a third audio frame; inserting the third audio frame between the first audio frame and the second audio frame; and performing fade-in processing on the inserted third audio frame, so that the inserted third audio frame gradually transitions from the no-sound state to the complete sound state in time series.
- here, the instruction to turn on the sound effect serves as the instruction for inserting an audio frame.
- the sound effect that is turned on is a channel-asynchronous sound effect: the audio data stream of the specified channel is delayed by one audio frame relative to the remaining channels, so that the time at which the sound from the source arrives at the listener's ears differs by one audio frame, achieving a surround sound effect.
- the no-sound state refers to the state before the sound effect is turned on, and the complete sound state is the state after the sound effect is turned on. The third audio frame is fade-in processed so that, following the timing of its sample point values, the inserted third audio frame gradually transitions from the no-sound state to the complete sound state, thereby achieving a smooth transition of the sound effect. For example, if the volume needs to be increased 5-fold in the complete sound state, the volume multiple can be increased gradually until the maximum of 5 is reached, seamlessly connecting with the second audio frame in the complete sound state.
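A minimal sketch of such a linear fade-in, assuming the volume factor simply scales each sample in sequence; the function name and the per-sample scaling scheme are illustrative, not taken from the patent:

```python
def fade_in(frame):
    """Scale samples linearly from silence (factor 0) to full sound (factor 1)
    across the frame, following the timing of the sample point values.
    Assumes the frame has at least 2 samples."""
    m = len(frame)
    return [v * i / (m - 1) for i, v in enumerate(frame)]

print(fade_in([10, 10, 10]))  # -> [0.0, 5.0, 10.0]
```

The last sample is at full amplitude, so the faded frame joins the following frame (in the complete sound state) without a jump; a curved transition would replace the linear factor i/(m-1) with a nonlinear one.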
- a gradual transition can be a linear transition or a curved transition.
- when the sound effect is turned off, the audio frame replacement steps of step 202, step 204, and the second half of step 206 may be performed on the audio data stream of the specified channel, and the replaced fourth audio frame is fade-out processed, so that the replaced fourth audio frame gradually transitions from the complete sound state to the no-sound state in time series.
- the fade-out process, as opposed to the fade-in process, gradually eliminates the effect of the sound effect. Since the replacement deletes one audio frame, the specified channel is restored to a state synchronized with the other channels. In this way, the channel-asynchronous sound effect can be quickly turned on and/or off, improving the efficiency of switching sound effects.
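The fade-out is the mirror image of the fade-in; a sketch under the same assumptions (illustrative names, linear scaling):

```python
def fade_out(frame):
    """Scale samples linearly from full sound (factor 1) down to silence
    (factor 0) across the frame; assumes the frame has at least 2 samples."""
    m = len(frame)
    return [v * (m - 1 - i) / (m - 1) for i, v in enumerate(frame)]

print(fade_out([10, 10, 10]))  # -> [10.0, 5.0, 0.0]
```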
- in one embodiment, the step of splicing, in order, the sample point values before the frame segmentation position in the first audio frame and the sample point values after the frame segmentation position in the second audio frame to generate a fourth audio frame, and replacing the first audio frame and the second audio frame with the fourth audio frame, may be performed on the specified channel; the replaced fourth audio frame is then fade-in processed, so that it gradually transitions from the no-sound state to the complete sound state in time series.
- similarly, step 202, step 204, and the first half of step 206 may be performed on the specified channel: the sample point values before the frame splitting position in the second audio frame and the sample point values after the frame splitting position in the first audio frame are spliced in order to generate a third audio frame, and the third audio frame is inserted between the first audio frame and the second audio frame; the inserted third audio frame is then fade-out processed, so that it gradually transitions from the complete sound state to the no-sound state in time series.
- in this way, the channel-asynchronous sound effect can be quickly turned on and/or off, improving the efficiency of switching sound effects.
- an audio data processing method includes the following steps:
- Step 1102 When the sound effect is turned on, the adjacent first audio frame and the second audio frame are acquired from the audio data stream of the specified channel, and the first audio frame precedes the second audio frame in time series.
- Step 1104 Acquire a first candidate location, where the sample point value at the first candidate position in the first audio frame and the sample point value at the corresponding first candidate position in the second audio frame satisfy a distance approach condition.
- the distance proximity condition may be that the product of the first difference value and the second difference value is less than or equal to zero.
- the first difference is a difference between the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame.
- the second difference is the difference between the sample point value of the next position of the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
- Step 1106 Acquire a distance sum of each sample point value pair in a range of discrete positions covering a preset length of the first candidate position in the first audio frame and the second audio frame.
- Step 1108 Determine the minimum distance and the corresponding first candidate position as the first frame division position.
- Step 1110 Acquire the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame, and splice them in order to generate a third audio frame.
- Step 1112 Insert a third audio frame between the first audio frame and the second audio frame.
- Step 1114 Perform a fade-in process on the inserted third audio frame, so that the inserted third audio frame gradually transitions from the no-sound state to the full-sound state in time series.
- Step 1116 when the sound effect is turned off, the adjacent fifth audio frame and the sixth audio frame are acquired from the audio data stream of the specified channel, and the fifth audio frame precedes the sixth audio frame in time series.
- the fifth audio frame is equivalent to the first audio frame used to generate the fourth audio frame in step 206 of the embodiment shown in FIG. 2, and the sixth audio frame is equivalent to the second audio frame used there.
- Step 1118 Acquire a second candidate position, where the sample point value at the second candidate position in the fifth audio frame and the sample point value at the corresponding second candidate position in the sixth audio frame satisfy the distance close condition.
- the distance proximity condition may be that the product of the first difference value and the second difference value is less than or equal to zero.
- the first difference is a difference between the sample point value at the candidate position in the fifth audio frame and the sample point value at the corresponding candidate position in the sixth audio frame.
- the second difference is a difference between a sample point value of a next position of the candidate position in the fifth audio frame and a sample point value at a corresponding position in the sixth audio frame.
- Step 1120 Acquire a distance sum of each sample point value pair in a range of discrete positions covering a preset length of the second candidate position in the fifth audio frame and the sixth audio frame.
- Step 1122 Determine the minimum distance and the corresponding second candidate position as the second frame division position.
- Step 1124 Acquire the sample point values before the second frame split position in the fifth audio frame and the sample point values after the second frame split position in the sixth audio frame, and splice them in order to generate a fourth audio frame.
- Step 1126 Replace both the fifth audio frame and the sixth audio frame with the fourth audio frame.
- Step 1128 Perform a fade-out process on the replaced fourth audio frame, so that the replaced fourth audio frame gradually transitions from the full sound state to the no-sound state in time series.
- in this embodiment, when an audio frame needs to be inserted, the part of the second audio frame before the frame division position is spliced with the part of the first audio frame after the frame division position to obtain a third audio frame, which is inserted between the first audio frame and the second audio frame.
- the front portion of the third audio frame is the front portion of the second audio frame, and the rear portion of the third audio frame is the rear portion of the first audio frame. Since the first audio frame and the second audio frame are themselves seamlessly connected, the first audio frame connects seamlessly to the front portion of the third audio frame, and the rear portion of the third audio frame connects seamlessly to the second audio frame.
- moreover, the third audio frame satisfies the distance proximity condition at the frame division position, so the splice does not introduce an excessive abrupt change; thus the noise problem caused by jumps between audio frames when inserting an audio frame can be substantially overcome.
- when an audio frame needs to be deleted, the part of the first audio frame before the frame division position is spliced with the part of the second audio frame after the frame division position to obtain a fourth audio frame, which replaces the first audio frame and the second audio frame.
- the front portion of the fourth audio frame is the front portion of the first audio frame, and the rear portion of the fourth audio frame is the rear portion of the second audio frame.
- the replaced fourth audio frame can therefore be seamlessly connected with the audio frame preceding the first audio frame and with the audio frame following the second audio frame. The fourth audio frame also satisfies the distance proximity condition at the frame division position, so the splice does not introduce an excessive abrupt change; thus the noise problem caused by jumps between audio frames when deleting an audio frame can be substantially overcome.
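Both splices can be written down directly. Given a frame division position k, the third and fourth frames are as follows (an illustrative sketch only; Python list slicing stands in for sample buffers, and the names are hypothetical):

```python
def make_third_frame(first, second, k):
    """Third frame for insertion: the second frame's samples before the frame
    division position k, followed by the first frame's samples from k onward."""
    return second[:k] + first[k:]

def make_fourth_frame(first, second, k):
    """Fourth frame for replacement: the first frame's samples before k,
    followed by the second frame's samples from k onward."""
    return first[:k] + second[k:]
```

Both results have exactly one frame length, and each half keeps its original internal order, as the text requires.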
- the present application also provides a terminal.
- the internal structure of the terminal may correspond to the structure shown in FIG. 1.
- Each of the following modules may be implemented in whole or in part by software, hardware, or a combination thereof.
- the terminal 1200 includes an audio frame acquisition module 1201 and a frame division location determining module 1202, and further includes at least one of an audio frame insertion module 1203 and an audio frame replacement module 1204.
- the audio frame obtaining module 1201 is configured to obtain an adjacent first audio frame and a second audio frame from the audio data stream, where the first audio frame precedes the second audio frame in time series.
- the audio data stream includes a series of sample point values having timings obtained by sampling the original analog sound signal at a specific audio sample rate, and a series of sample point values can describe the sound.
- the audio sampling rate is the number of sampling points collected in one second, in Hertz. The higher the audio sampling rate, the higher the frequency of the sound wave that can be described.
- the audio frame includes a fixed number of sample point values with timing. Depending on the encoding format of the audio data stream: if the encoding format itself defines audio frames, they are used directly; if there are no audio frames but only a series of sample point values with timing, the series of timed sample point values can be divided into audio frames according to a preset frame length.
- the preset frame length refers to the number of sample point values included in one preset audio frame.
- the first audio frame and the second audio frame acquired by the audio frame obtaining module 1201 from the audio data stream are adjacent, and the first audio frame precedes the second audio frame in time: when the audio data stream is played, the first audio frame is played first, and the second audio frame is played after it. The first audio frame and the second audio frame are two adjacent audio frames between which an audio frame needs to be inserted.
- the frame segmentation position determining module 1202 is configured to determine a frame segmentation position, where the sample point value at the frame segmentation position in the first audio frame and the sample point value at the frame segmentation position in the second audio frame satisfy the distance approach condition.
- the frame division position refers to a position at which the first audio frame and the second audio frame are divided, and is a relative position with respect to one audio frame.
- the distance refers to the absolute value of the difference of a sample point value pair at corresponding positions in the two audio frames. For example, referring to the local sample point value distribution of the first audio frame A shown in FIG. 4 and that of the second audio frame B shown in FIG. 5, the absolute value of the difference between the first sample point value of the first audio frame A and the first sample point value of the second audio frame B is the distance between those two sample point values.
- the distance approach condition refers to a quantization condition used to determine whether the distances of the two sample point values are close.
- the distance proximity condition may include the case where the distance equals 0, and may also include cases where the distances of the two sample point values are not equal but close, such as the distance being less than or equal to a threshold. The threshold may be preset, or may be dynamically determined based on the sample point values in the first audio frame and/or the second audio frame, for example as the average of those sample point values multiplied by a preset percentage.
- the frame splitting position determining module 1202 can calculate the distance of each sample point value pair in the first audio frame and the second audio frame, and filter out the sample point value pair with the smallest distance; the frame splitting position is then the position corresponding to that pair.
- in this case the distance proximity condition is that the distance of the sample point value pair corresponding to the frame division position in the first audio frame and the second audio frame is the minimum.
- the sample point value pair here refers to two sample point values at the same position in two audio frames, and the position of the sample point value is the relative position of the sample point value relative to the associated audio frame.
- the audio frame insertion module 1203 is configured to acquire sample point values before the frame segmentation position in the second audio frame and sample point values after the frame segmentation position in the first audio frame, and sequentially splicing to generate a third audio frame, and A three audio frame is inserted between the first audio frame and the second audio frame.
- the audio frame insertion module 1203 acquires the sample point values before the frame division position in the second audio frame and the sample point values after the frame division position in the first audio frame; the total number of sample point values acquired is exactly equal to the length of one audio frame.
- the sample point values from the second audio frame are placed first, and the sample point values from the first audio frame are spliced after them in order to generate the third audio frame.
- the sample point values from the second audio frame remain in the order in which they are located in the second audio frame, and the sample point values from the first audio frame remain in the order in which they were located in the first audio frame.
- the generated third audio frame is inserted between the first audio frame and the second audio frame.
- the audio frame replacement module 1204 is configured to obtain sample point values before the frame segmentation position in the first audio frame and sample point values after the frame segmentation position in the second audio frame, and sequentially splicing to generate a fourth audio frame, and An audio frame and a second audio frame are replaced with a fourth audio frame.
- the audio frame replacement module 1204 obtains the sampling point value before the frame division position in the first audio frame, and acquires the sampling point value after the frame division position in the second audio frame, and the total number of the sampled point values obtained. It is exactly equal to the length of one audio frame.
- the sample point values from the first audio frame are placed first, and the sample point values from the second audio frame are spliced after them in order to obtain the fourth audio frame.
- the sample point values from the first audio frame remain in the order in which they are located in the first audio frame, and the sample point values from the second audio frame remain in their order in the second audio frame. Finally, the first audio frame and the second audio frame are replaced with the generated fourth audio frame.
- when an audio frame needs to be inserted, the terminal 1200 splices the portion of the second audio frame before the frame division position with the portion of the first audio frame after the frame division position to obtain a third audio frame, and inserts it between the first audio frame and the second audio frame.
- the front portion of the third audio frame is the front portion of the second audio frame, and the rear portion of the third audio frame is the rear portion of the first audio frame. Since the first audio frame and the second audio frame are themselves seamlessly connected, the first audio frame connects seamlessly to the front portion of the third audio frame, and the rear portion of the third audio frame connects seamlessly to the second audio frame.
- moreover, the third audio frame satisfies the distance proximity condition at the frame division position, so the splice does not introduce an excessive abrupt change; thus the noise problem caused by jumps between audio frames when inserting an audio frame can be substantially overcome.
- when an audio frame needs to be deleted, the part of the first audio frame before the frame division position is spliced with the part of the second audio frame after the frame division position to obtain a fourth audio frame, which replaces the first audio frame and the second audio frame.
- the front portion of the fourth audio frame is the front portion of the first audio frame, and the rear portion of the fourth audio frame is the rear portion of the second audio frame.
- the replaced fourth audio frame can therefore be seamlessly connected with the audio frame preceding the first audio frame and with the audio frame following the second audio frame. The fourth audio frame also satisfies the distance proximity condition at the frame division position, so the splice does not introduce an excessive abrupt change; thus the noise problem caused by jumps between audio frames when deleting an audio frame can be substantially overcome.
- the terminal 1200 further includes: a copy retention module 1205, configured to reserve a copy of the sample point value of at least one audio frame length when performing real-time playback processing on the audio data stream.
- the audio frame obtaining module 1201 is further configured to: when an instruction for inserting an audio frame is detected, obtain the first audio frame from the retained copies preceding the sample point value currently being played, and obtain the second audio frame from the sample point values of one audio frame length after the sample point value currently being played.
- the playback process refers to a process of restoring a sound signal according to the sample point value, and retaining a copy of the sample point value of at least one audio frame length, that is, retaining a copy of at least one audio frame.
- for example, when performing playback processing on a sample point value A1, the copy retention module 1205 retains a copy A1' of the sample point value A1; copies of the sample point values played before A1 are also retained, and the total length of the retained copies is at least one audio frame length.
- when playback reaches a sample point value B1, the copy retention module 1205 retains its copy B1', and the retained copies at this time include at least the copy A' of audio frame A.
- the audio frame acquisition module 1201 takes the copies of the sample point values of one audio frame length between the copy A1' and the sample point value B1 currently being played as the first audio frame A, and takes the audio frame of one audio frame length after the sample point value B1 as the second audio frame B.
- in this way, an instruction for inserting an audio frame can be responded to immediately when it is detected, without waiting for the duration of one audio frame, which increases the efficiency of inserting audio frames.
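The copy retention described above behaves like a rolling buffer of one frame length; a minimal sketch with hypothetical class and method names (the patent specifies only the behavior, not this structure):

```python
from collections import deque

class CopyRetainer:
    """Keep copies of the most recently played sample point values, up to one
    audio frame length, so the first audio frame is available the moment an
    insert instruction arrives."""
    def __init__(self, frame_len):
        self.history = deque(maxlen=frame_len)  # oldest copies fall off the front

    def on_play(self, sample):
        self.history.append(sample)  # retain a copy of each played sample

    def first_frame(self):
        # Valid once at least one full frame has been played.
        return list(self.history)
```

With frame_len played samples retained, first_frame() returns the frame ending at the sample currently being played, matching the example with A1' and B1 above.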
- the frame splitting location determining module 1202 includes a candidate location obtaining module 1202a, a similarity metric module 1202b, and a determining module 1202c.
- the candidate location obtaining module 1202a is configured to acquire a candidate location, where the sampling point value at the candidate location in the first audio frame and the sampling point value at the corresponding candidate location in the second audio frame satisfy a distance proximity condition.
- the similarity metric module 1202b is configured to obtain local similarities at the corresponding candidate locations in the first audio frame and the second audio frame.
- the determining module 1202c is configured to determine a frame splitting position according to the local similarity.
- the candidate location obtaining module 1202a is configured to acquire a candidate location, where the sampling point value at the candidate location in the first audio frame and the sampling point value at the corresponding candidate location in the second audio frame satisfy the distance proximity condition.
- the candidate position is a position in the selected audio frame that can be used as a frame division position, and the position is discrete, and each sample point value corresponds to a discrete position.
- specifically, the candidate location obtaining module 1202a may traverse all positions in the audio frame and, at each position, determine whether the sample point value pair at the corresponding positions in the first audio frame and the second audio frame satisfies the distance proximity condition. If the condition is satisfied, the candidate location obtaining module 1202a adds the traversed position to the candidate location set and continues the traversal; if not, the traversal simply continues.
- if no traversed position satisfies the distance proximity condition, the candidate location obtaining module 1202a may add a preset position (such as the middle position of the audio frame) or the position where the distance of the sample point value pair is smallest to the candidate location set.
- the distance approach condition refers to a quantization condition used to determine whether the distances of the two sample point values are close.
- the distance proximity condition may include a case where the distance is equal to 0, and may also include a case where the distances of the two sample point values are not equal but close, such as the distance is less than or equal to the threshold, and the threshold may be preset or may be It is dynamically determined based on sample point values in the first audio frame and/or the second audio frame.
- in one embodiment, the candidate location acquisition module 1202a may calculate the distance of each sample point value pair in the first audio frame and the second audio frame and sort the distances in ascending order, taking the positions corresponding to a preset number of distances at the front of the sorted order as candidate positions. Starting from the minimum of the sorted distances, the positions corresponding to the preset number of smallest distances are obtained; the distance proximity condition is then that the distance of the sample point value pair corresponding to a candidate position ranks within the preset number of smallest distances after all the calculated distances are sorted in ascending order.
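This ascending-sort variant of the condition can be sketched as follows; the function name is illustrative and k stands for the preset number:

```python
def closest_positions(a, b, k):
    """Sort all positions by the distance |a_i - b_i| of their sample point
    value pair, ascending, and keep the k positions with smallest distances."""
    order = sorted(range(len(a)), key=lambda i: abs(a[i] - b[i]))
    return order[:k]
```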
- the distance proximity condition is that a product of the first difference value and the second difference value is less than or equal to 0; wherein the first difference value is a sample point value at the candidate position in the first audio frame and the second audio frame The difference of the sample point values at the corresponding candidate positions; the second difference is the difference between the sample point value of the next position of the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
- specifically, the following formula (1) can be used as the distance proximity condition, in which:
- i denotes a candidate position in the first audio frame A and the second audio frame B, and may be called the sample point value sequence number; m is the audio frame length;
- (a_i - b_i) is the first difference, i.e. the difference between the sample point value a_i at candidate position i in the first audio frame A and the sample point value b_i at the corresponding position i in the second audio frame B;
- (a_{i+1} - b_{i+1}) is the second difference, i.e. the difference between the sample point value a_{i+1} at the position following candidate position i in the first audio frame A and the sample point value b_{i+1} at the corresponding position i+1 in the second audio frame B;
- formula (1) states that the product of the first difference (a_i - b_i) and the second difference (a_{i+1} - b_{i+1}) is less than or equal to zero:
  (a_i - b_i) * (a_{i+1} - b_{i+1}) <= 0, for i = 1, ..., m-1. (1)
- the distance proximity condition expressed by formula (1) amounts to finding the intersections of the first fitted curve formed by the sample point values of the first audio frame and the second fitted curve formed by the sample point values of the second audio frame. The intersections may also be determined by other curve-intersection methods. If an intersection falls exactly on the position of a sample point value, that position is added to the candidate position set; if it does not fall on any sample point position, the position of the audio frame closest to the intersection may be added to the candidate position set. For example, if the first fitted curve and the second fitted curve in FIG. 10 intersect at a point X, the two positions S1 and S2 closest to the intersection X may be added to the candidate position set. Among other ways of finding the intersections of two curves, one may first obtain mathematical expressions for the two fitted curves and then compute the intersections directly by function calculation; the condition expressed by formula (1), however, is more efficient.
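As an illustrative sketch only (the helper name and the toy sine-wave frames are assumptions for the example, not from the patent), the sign-change test of formula (1) can be implemented directly:

```python
import math

def candidate_positions(a, b):
    """Return every index i satisfying formula (1):
    (a[i] - b[i]) * (a[i+1] - b[i+1]) <= 0, i.e. the difference of the
    two frames changes sign (or touches zero), so the fitted curves of
    frames a and b cross between positions i and i+1."""
    d = [x - y for x, y in zip(a, b)]            # per-position differences
    return [i for i in range(len(d) - 1) if d[i] * d[i + 1] <= 0]

# Toy adjacent "frames": two sinusoids with a small phase offset.
m = 64
frame_a = [math.sin(2 * math.pi * i / m) for i in range(m)]
frame_b = [math.sin(2 * math.pi * i / m + 0.5) for i in range(m)]
cands = candidate_positions(frame_a, frame_b)    # candidate position set
```

Every returned index satisfies the product condition of formula (1), so the returned list corresponds to the candidate position set near the curve intersections.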
- the similarity measurement module 1202b is configured to obtain, for the first audio frame and the second audio frame, the distance sum of the sample point value pairs within a discrete position range of a preset length covering the candidate position.
- the discrete position range of the preset length covering the candidate position includes the candidate position itself, contains a fixed number of discrete positions (the preset length), and its positions are consecutive.
- specifically, the similarity measurement module 1202b may select candidate positions from the candidate position set one by one and, for each selected candidate position, obtain the distance sum of the sample point value pairs of the first audio frame and the second audio frame within the discrete position range of the preset length covering that candidate position.
- the similarity measurement module 1202b may use the following formula (2) to obtain the distance sum of the sample point value pairs of the first audio frame and the second audio frame within the discrete position range of the preset length covering the candidate position:

  D(n+N) = Σ_(i=n)^(n+2N) |a_i − b_i|    (2)

- N may take a value in [1, (m−1)/2], preferably in [2, (m−1)/100], for example N = 5; the candidate position is n+N. The discrete position range consists of the candidate position n+N together with the N positions on each of its two sides, forming a range [n, …, n+N, …, n+2N] with a preset length of 2N+1.
- the determining module 1202c is configured to determine the candidate position corresponding to the minimum distance sum as the frame segmentation position.
- in one embodiment, the similarity measurement module 1202b is configured to obtain the local similarity of the first audio frame and the second audio frame at each corresponding candidate position, and the determining module 1202c is configured to determine the frame segmentation position according to the local similarity.
- specifically, the distance sum may be calculated for every candidate position in the candidate position set, and the candidate position corresponding to the minimum distance sum is taken as the frame segmentation position. This can be expressed as the following formula (3):

  P = argmin_(n+N) Σ_(i=n)^(n+2N) |a_i − b_i|    (3)

  where the minimum is taken over all candidate positions n+N in the candidate position set and P denotes the frame segmentation position.
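A minimal sketch of this selection step, assuming the distance sum is the windowed sum of absolute differences and the frame segmentation position is its minimizer over the candidate set (the function name, default N, and toy frames are illustrative assumptions):

```python
import math

def frame_split_position(a, b, N=5):
    """Among the sign-change candidates of formula (1), return the
    position whose window of 2N+1 sample pairs has the smallest sum of
    absolute differences (formulas (2) and (3))."""
    d = [x - y for x, y in zip(a, b)]
    cands = [i for i in range(len(d) - 1) if d[i] * d[i + 1] <= 0]
    cands = [i for i in cands if N <= i < len(d) - N]     # window must fit
    def distance_sum(i):                                  # formula (2)
        return sum(abs(d[j]) for j in range(i - N, i + N + 1))
    return min(cands, key=distance_sum)                   # formula (3)

# Toy adjacent "frames": two sinusoids with a small phase offset.
m = 64
frame_a = [math.sin(2 * math.pi * i / m) for i in range(m)]
frame_b = [math.sin(2 * math.pi * i / m + 0.5) for i in range(m)]
split = frame_split_position(frame_a, frame_b)
```

The chosen position still satisfies the sign-change condition, so splicing there keeps the transition between the two frames gentle.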
- the determined frame segmentation position also satisfies the distance proximity condition: the product of the first difference and the second difference is less than or equal to 0; the first difference is the difference between the sample point value at the frame segmentation position in the first audio frame and the sample point value at the corresponding frame segmentation position in the second audio frame; the second difference is the difference between the sample point value at the position next to the frame segmentation position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
- in this way, the candidate position near an intersection of the first fitted curve and the second fitted curve at which the two curves are most similar is found and used as the frame segmentation position.
- the local similarity at a candidate position refers to the degree to which the first fitted curve and the second fitted curve are similar within a fixed range near the candidate position; the smaller the value calculated by formula (2), the more similar the curves. If the first fitted curve and the second fitted curve are similar near the candidate position, the two curves have similar slopes there, the transition in the third audio frame obtained after splicing is gentler, and the noise suppression effect is better.
- the local similarity could also be obtained by computing a cross-correlation with a cross-correlation function.
- a cross-correlation function can likewise express the degree of similarity of two signals, but when applied to this scheme over a small number of points, two sample point values that are both large and of the same sign may yield a large cross-correlation, suggesting the two curves are similar even though the position is not the optimal frame segmentation position.
- the local similarity obtained by formula (2) overcomes this shortcoming of computing the cross-correlation with a cross-correlation function.
- in formula (2), the sample point value at each position contributes in a relatively balanced way to the similarity measure.
- using the absolute value of the difference as the contribution of each position describes the slope difference before and after the intersection well, so that the most suitable candidate position can be found as the frame segmentation position.
- the audio frame insertion module 1203 is further configured to, for the adjacent first audio frame and second audio frame obtained from the audio data stream of a specified channel when a sound effect is turned on, obtain the sample point values before the frame segmentation position in the second audio frame and the sample point values after the frame segmentation position in the first audio frame, splice them in order to generate a third audio frame, and insert the third audio frame between the first audio frame and the second audio frame.
- the inserted third audio frame is faded in, so that the inserted third audio frame gradually transitions in time sequence from a no-sound-effect state to a full-sound-effect state.
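For illustration only, the splice-and-fade-in step can be sketched as follows; the linear gain ramp is one simple way to realize the gradual transition and is an assumption of this sketch, not mandated by the text:

```python
def insert_third_frame(frame_a, frame_b, split):
    """Splice frame_b's samples before the split position with frame_a's
    samples after it (the third audio frame), then fade the result in
    from silence to full level with a linear gain ramp."""
    third = frame_b[:split] + frame_a[split:]
    m = len(third)
    return [s * i / (m - 1) for i, s in enumerate(third)]

# Toy frames; in practice split comes from the frame segmentation step.
frame_a = [1.0] * 8
frame_b = [-1.0] * 8
third = insert_third_frame(frame_a, frame_b, 4)
# playback order would then be: frame_a, third, frame_b
```

The third frame has the same length as the originals, starts at zero gain, and reaches full level at its last sample.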
- the audio frame replacement module 1204 is further configured to, when the sound effect is turned off, obtain the sample point values before the frame segmentation position in the first audio frame and the sample point values after the frame segmentation position in the second audio frame, splice them in order to generate a fourth audio frame, replace the first audio frame and the second audio frame together with the fourth audio frame, and fade out the fourth audio frame, so that the fourth audio frame gradually transitions in time sequence from a full-sound-effect state to a no-sound-effect state.
- the audio frame replacement module 1204 is further configured to, for the adjacent first audio frame and second audio frame obtained from the audio data stream of a specified channel when the sound effect is turned on, obtain the sample point values before the frame segmentation position in the first audio frame and the sample point values after the frame segmentation position in the second audio frame, splice them in order to generate a fourth audio frame, replace the first audio frame and the second audio frame together with the fourth audio frame, and fade in the fourth audio frame, so that the fourth audio frame gradually transitions in time sequence from a no-sound-effect state to a full-sound-effect state.
- the audio frame insertion module 1203 is further configured to, for the adjacent first audio frame and second audio frame obtained from the audio data stream of a specified channel when the sound effect is turned off, obtain the sample point values before the frame segmentation position in the second audio frame and the sample point values after the frame segmentation position in the first audio frame, splice them in order to generate a third audio frame, insert the third audio frame between the first audio frame and the second audio frame, and fade out the inserted third audio frame, so that the inserted third audio frame gradually transitions in time sequence from a full-sound-effect state to a no-sound-effect state.
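Similarly, a hedged sketch of the replacement path (fourth audio frame plus fade-out); the linear ramp is again an illustrative choice of this sketch:

```python
def replace_with_fourth_frame(frame_a, frame_b, split):
    """Splice frame_a's samples before the split position with frame_b's
    samples after it (the fourth audio frame) and fade it out from full
    level to silence; the result replaces both original frames."""
    fourth = frame_a[:split] + frame_b[split:]
    m = len(fourth)
    return [s * (m - 1 - i) / (m - 1) for i, s in enumerate(fourth)]

# Toy frames; in practice split comes from the frame segmentation step.
frame_a = [1.0] * 8
frame_b = [-1.0] * 8
fourth = replace_with_fourth_frame(frame_a, frame_b, 4)
```

Because two frames are replaced by one, this variant shortens the stream by one frame length, mirroring how the insertion variant lengthens it by one.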
- the storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or may be a random access memory (RAM).
Claims (20)
- 1. An audio data processing method, comprising: obtaining adjacent first and second audio frames from an audio data stream, the first audio frame preceding the second audio frame in time sequence; determining a frame segmentation position, wherein the sample point value at the frame segmentation position in the first audio frame and the sample point value at the frame segmentation position in the second audio frame satisfy a distance proximity condition; and obtaining the sample point values before the frame segmentation position in the second audio frame and the sample point values after the frame segmentation position in the first audio frame, splicing them in order to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.
- 2. The method according to claim 1, further comprising: retaining a copy of the sample point values of at least one audio frame length while performing real-time playback processing on the audio data stream; wherein obtaining the adjacent first and second audio frames from the audio data stream comprises: upon detecting an instruction for inserting an audio frame, obtaining the first audio frame from the copy retained before the sample point value currently being played back, and obtaining the second audio frame from the sample point values of one audio frame length after the sample point value currently being played back.
- 3. The method according to claim 1, wherein determining the frame segmentation position comprises: obtaining candidate positions, wherein the sample point value at each candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy the distance proximity condition; obtaining local similarities of the first audio frame and the second audio frame at the corresponding candidate positions; and determining the frame segmentation position according to the local similarities.
- 4. The method according to claim 1, wherein determining the frame segmentation position comprises: obtaining candidate positions, wherein the sample point value at each candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy the distance proximity condition; obtaining the distance sum of the sample point value pairs of the first audio frame and the second audio frame within a discrete position range of a preset length covering each candidate position; and determining the candidate position corresponding to the minimum distance sum as the frame segmentation position.
- 5. The method according to claim 4, wherein the distance proximity condition is that the product of a first difference and a second difference is less than or equal to 0; the first difference is the difference between the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame; the second difference is the difference between the sample point value at the position next to the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
- 6. The method according to claim 1, further comprising: for the adjacent first and second audio frames obtained from the audio data stream of a specified channel when a sound effect is turned on, performing the steps of obtaining the sample point values before the frame segmentation position in the second audio frame and the sample point values after the frame segmentation position in the first audio frame, splicing them in order to generate the third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame; and fading in the inserted third audio frame, so that the inserted third audio frame gradually transitions in time sequence from a no-sound-effect state to a full-sound-effect state.
- 7. An audio data processing method, comprising: obtaining adjacent first and second audio frames from an audio data stream, the first audio frame preceding the second audio frame in time sequence; determining a frame segmentation position, wherein the sample point value at the frame segmentation position in the first audio frame and the sample point value at the frame segmentation position in the second audio frame satisfy a distance proximity condition; and obtaining the sample point values before the frame segmentation position in the first audio frame and the sample point values after the frame segmentation position in the second audio frame, splicing them in order to generate a fourth audio frame, and replacing the first audio frame and the second audio frame together with the fourth audio frame.
- 8. The method according to claim 7, further comprising: retaining a copy of the sample point values of at least one audio frame length while performing real-time playback processing on the audio data stream; wherein obtaining the adjacent first and second audio frames from the audio data stream comprises: upon detecting an instruction for inserting an audio frame, obtaining the first audio frame from the copy retained before the sample point value currently being played back, and obtaining the second audio frame from the sample point values of one audio frame length after the sample point value currently being played back.
- 9. The method according to claim 7, wherein determining the frame segmentation position comprises: obtaining candidate positions, wherein the sample point value at each candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy the distance proximity condition; obtaining local similarities of the first audio frame and the second audio frame at the corresponding candidate positions; and determining the frame segmentation position according to the local similarities.
- 10. The method according to claim 7, wherein determining the frame segmentation position comprises: obtaining candidate positions, wherein the sample point value at each candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy the distance proximity condition; obtaining the distance sum of the sample point value pairs of the first audio frame and the second audio frame within a discrete position range of a preset length covering each candidate position; and determining the candidate position corresponding to the minimum distance sum as the frame segmentation position.
- 11. The method according to claim 10, wherein the distance proximity condition is that the product of a first difference and a second difference is less than or equal to 0; the first difference is the difference between the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame; the second difference is the difference between the sample point value at the position next to the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
- 12. The method according to claim 7, further comprising: for the adjacent first and second audio frames obtained from the audio data stream of a specified channel when a sound effect is turned on, performing the steps of obtaining the sample point values before the frame segmentation position in the first audio frame and the sample point values after the frame segmentation position in the second audio frame, splicing them in order to generate the fourth audio frame, and replacing the first audio frame and the second audio frame together with the fourth audio frame; and fading in the fourth audio frame, so that the fourth audio frame gradually transitions in time sequence from a no-sound-effect state to a full-sound-effect state.
- 13. A terminal, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps: obtaining adjacent first and second audio frames from an audio data stream, the first audio frame preceding the second audio frame in time sequence; determining a frame segmentation position, wherein the sample point value at the frame segmentation position in the first audio frame and the sample point value at the frame segmentation position in the second audio frame satisfy a distance proximity condition; and obtaining the sample point values before the frame segmentation position in the second audio frame and the sample point values after the frame segmentation position in the first audio frame, splicing them in order to generate a third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame.
- 14. The terminal according to claim 13, wherein the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps: retaining a copy of the sample point values of at least one audio frame length while performing real-time playback processing on the audio data stream; wherein obtaining the adjacent first and second audio frames from the audio data stream comprises: upon detecting an instruction for inserting an audio frame, obtaining the first audio frame from the copy retained before the sample point value currently being played back, and obtaining the second audio frame from the sample point values of one audio frame length after the sample point value currently being played back.
- 15. The terminal according to claim 13, wherein determining the frame segmentation position comprises: obtaining candidate positions, wherein the sample point value at each candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy the distance proximity condition; obtaining local similarities of the first audio frame and the second audio frame at the corresponding candidate positions; and determining the frame segmentation position according to the local similarities.
- 16. The terminal according to claim 13, wherein determining the frame segmentation position comprises: obtaining candidate positions, wherein the sample point value at each candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame satisfy the distance proximity condition; obtaining the distance sum of the sample point value pairs of the first audio frame and the second audio frame within a discrete position range of a preset length covering each candidate position; and determining the candidate position corresponding to the minimum distance sum as the frame segmentation position.
- 17. The terminal according to claim 16, wherein the distance proximity condition is that the product of a first difference and a second difference is less than or equal to 0; the first difference is the difference between the sample point value at the candidate position in the first audio frame and the sample point value at the corresponding candidate position in the second audio frame; the second difference is the difference between the sample point value at the position next to the candidate position in the first audio frame and the sample point value at the corresponding position in the second audio frame.
- 18. The terminal according to claim 13, wherein the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps: for the adjacent first and second audio frames obtained from the audio data stream of a specified channel when a sound effect is turned on, performing the steps of obtaining the sample point values before the frame segmentation position in the second audio frame and the sample point values after the frame segmentation position in the first audio frame, splicing them in order to generate the third audio frame, and inserting the third audio frame between the first audio frame and the second audio frame; and fading in the inserted third audio frame, so that the inserted third audio frame gradually transitions in time sequence from a no-sound-effect state to a full-sound-effect state.
- 19. A terminal, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps: obtaining adjacent first and second audio frames from an audio data stream, the first audio frame preceding the second audio frame in time sequence; determining a frame segmentation position, wherein the sample point value at the frame segmentation position in the first audio frame and the sample point value at the frame segmentation position in the second audio frame satisfy a distance proximity condition; and obtaining the sample point values before the frame segmentation position in the first audio frame and the sample point values after the frame segmentation position in the second audio frame, splicing them in order to generate a fourth audio frame, and replacing the first audio frame and the second audio frame together with the fourth audio frame.
- 20. The terminal according to claim 19, wherein the computer-readable instructions, when executed by the processor, further cause the processor to perform the following steps: retaining a copy of the sample point values of at least one audio frame length while performing real-time playback processing on the audio data stream; wherein obtaining the adjacent first and second audio frames from the audio data stream comprises: upon detecting an instruction for inserting an audio frame, obtaining the first audio frame from the copy retained before the sample point value currently being played back, and obtaining the second audio frame from the sample point values of one audio frame length after the sample point value currently being played back.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2018529129A JP6765650B2 (ja) | 2016-01-14 | 2017-01-10 | Audio data processing method and terminal |
EP17738118.3A EP3404652B1 (en) | 2016-01-14 | 2017-01-10 | Audio data processing method and terminal |
KR1020187016293A KR102099029B1 (ko) | 2016-01-14 | 2017-01-10 | 오디오 데이터 처리 방법 및 단말기 |
MYPI2018701827A MY191125A (en) | 2016-01-14 | 2017-01-10 | Audio data processing method and terminal |
US15/951,078 US10194200B2 (en) | 2016-01-14 | 2018-04-11 | Audio data processing method and terminal |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610025708.1A CN106970771B (zh) | 2016-01-14 | 2016-01-14 | Audio data processing method and apparatus |
CN201610025708.1 | 2016-01-14 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/951,078 Continuation-In-Part US10194200B2 (en) | 2016-01-14 | 2018-04-11 | Audio data processing method and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017121304A1 true WO2017121304A1 (zh) | 2017-07-20 |
Family
ID=59310835
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/070692 WO2017121304A1 (zh) | 2016-01-14 | 2017-01-10 | 音频数据处理方法和终端 |
Country Status (7)
Country | Link |
---|---|
US (1) | US10194200B2 (zh) |
EP (1) | EP3404652B1 (zh) |
JP (1) | JP6765650B2 (zh) |
KR (1) | KR102099029B1 (zh) |
CN (1) | CN106970771B (zh) |
MY (1) | MY191125A (zh) |
WO (1) | WO2017121304A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108347672B (zh) * | 2018-02-09 | 2021-01-22 | Guangzhou Kugou Computer Technology Co., Ltd. | Method, apparatus, and storage medium for playing audio |
CN109086026B (zh) * | 2018-07-17 | 2020-07-03 | Alibaba Group Holding Limited | Method, apparatus, and device for determining broadcast speech |
CN109346111B (zh) * | 2018-10-11 | 2020-09-04 | Guangzhou Kugou Computer Technology Co., Ltd. | Data processing method, apparatus, terminal, and storage medium |
CN111613195B (zh) * | 2019-02-22 | 2022-12-09 | Zhejiang University | Audio splicing method, apparatus, and storage medium |
CN111954027B (zh) * | 2020-08-06 | 2022-07-08 | Haolian Times (Beijing) Technology Co., Ltd. | Streaming media data transcoding method, apparatus, computing device, and readable storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060085197A1 (en) * | 2000-12-28 | 2006-04-20 | Yamaha Corporation | Singing voice-synthesizing method and apparatus and storage medium |
CN101425291A (zh) * | 2007-10-31 | 2009-05-06 | Toshiba Corporation | Speech processing apparatus and speech processing method |
CN101640053A (zh) * | 2009-07-24 | 2010-02-03 | Wang Youfan | Audio processing method and apparatus, and audio playback method and apparatus |
CN101789240A (zh) * | 2009-12-25 | 2010-07-28 | Huawei Technologies Co., Ltd. | Speech signal processing method and apparatus, and communication system |
CN103905843A (zh) * | 2014-04-23 | 2014-07-02 | Wuxi Tianmai Juyuan Media Technology Co., Ltd. | Distributed audio/video processing apparatus and method for avoiding consecutive I-frames |
US20140236584A1 (en) * | 2013-02-21 | 2014-08-21 | Qualcomm Incorporated | Systems and methods for quantizing and dequantizing phase information |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5490234A (en) * | 1993-01-21 | 1996-02-06 | Apple Computer, Inc. | Waveform blending technique for text-to-speech system |
JP3017715B2 (ja) * | 1997-10-31 | 2000-03-13 | Matsushita Electric Industrial Co., Ltd. | Audio reproduction device |
JP3744216B2 (ja) * | 1998-08-07 | 2006-02-08 | Yamaha Corporation | Waveform forming apparatus and method |
US7272556B1 (en) * | 1998-09-23 | 2007-09-18 | Lucent Technologies Inc. | Scalable and embedded codec for speech and audio signals |
JP2004109362A (ja) * | 2002-09-17 | 2004-04-08 | Pioneer Electronic Corp | Frame-structure noise removal apparatus, method, and program |
JP4219898B2 (ja) * | 2002-10-31 | 2009-02-04 | Fujitsu Limited | Speech enhancement apparatus |
JP4406440B2 (ja) * | 2007-03-29 | 2010-01-27 | Toshiba Corporation | Speech synthesis apparatus, speech synthesis method, and program |
US20090048827A1 (en) * | 2007-08-17 | 2009-02-19 | Manoj Kumar | Method and system for audio frame estimation |
US8762852B2 (en) * | 2010-11-04 | 2014-06-24 | Digimarc Corporation | Smartphone-based methods and systems |
JP5784939B2 (ja) | 2011-03-17 | 2015-09-24 | Stanley Electric Co., Ltd. | Light emitting element, light emitting element module, and vehicle lamp |
US9066121B2 (en) * | 2011-08-09 | 2015-06-23 | Google Technology Holdings LLC | Addressable advertising switch by decoupling decoding from service acquisitions |
US9043201B2 (en) * | 2012-01-03 | 2015-05-26 | Google Technology Holdings LLC | Method and apparatus for processing audio frames to transition between different codecs |
CN104519401B (zh) * | 2013-09-30 | 2018-04-17 | He Jinwei | Method and device for obtaining video segmentation points |
2016
- 2016-01-14 CN CN201610025708.1A patent/CN106970771B/zh active Active

2017
- 2017-01-10 MY MYPI2018701827A patent/MY191125A/en unknown
- 2017-01-10 WO PCT/CN2017/070692 patent/WO2017121304A1/zh active Application Filing
- 2017-01-10 KR KR1020187016293A patent/KR102099029B1/ko active IP Right Grant
- 2017-01-10 JP JP2018529129A patent/JP6765650B2/ja active Active
- 2017-01-10 EP EP17738118.3A patent/EP3404652B1/en active Active

2018
- 2018-04-11 US US15/951,078 patent/US10194200B2/en active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3404652A4 * |
Also Published As
Publication number | Publication date |
---|---|
CN106970771A (zh) | 2017-07-21 |
EP3404652A4 (en) | 2018-11-21 |
CN106970771B (zh) | 2020-01-14 |
MY191125A (en) | 2022-05-31 |
KR102099029B1 (ko) | 2020-04-08 |
US10194200B2 (en) | 2019-01-29 |
KR20180082521A (ko) | 2018-07-18 |
US20180234721A1 (en) | 2018-08-16 |
EP3404652B1 (en) | 2019-12-04 |
JP2019508722A (ja) | 2019-03-28 |
JP6765650B2 (ja) | 2020-10-07 |
EP3404652A1 (en) | 2018-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2017121304A1 (zh) | Audio data processing method and terminal | |
US12114048B2 (en) | Automated voice translation dubbing for prerecorded videos | |
US11456017B2 (en) | Looping audio-visual file generation based on audio and video analysis | |
CN112235631B (zh) | Video processing method and apparatus, electronic device, and storage medium | |
US8767970B2 (en) | Audio panning with multi-channel surround sound decoding | |
US8271872B2 (en) | Composite audio waveforms with precision alignment guides | |
US8744249B2 (en) | Picture selection for video skimming | |
US9613605B2 (en) | Method, device and system for automatically adjusting a duration of a song | |
WO2022143924A1 (zh) | Video generation method and apparatus, electronic device, and storage medium | |
WO2014204997A1 (en) | Adaptive audio content generation | |
EP3929770B1 (en) | Methods, systems, and media for modifying the presentation of video content on a user device based on a consumption of the user device | |
US20060236219A1 (en) | Media timeline processing infrastructure | |
WO2009104402A1 (ja) | Music playback device, music playback method, music playback program, and integrated circuit | |
US20160313970A1 (en) | Gapless media generation | |
US20140371891A1 (en) | Local control of digital signal processing | |
KR101212036B1 (ko) | 프로젝트를 분할하여 동영상의 객체정보를 저작하는 방법 및 장치 | |
WO2024131555A1 (zh) | Video scoring method, device, storage medium, and program product | |
CN118283216A (zh) | Data processing method, electronic device, and storage medium | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17738118; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 2018529129; Country of ref document: JP; Kind code of ref document: A |
 | ENP | Entry into the national phase | Ref document number: 20187016293; Country of ref document: KR; Kind code of ref document: A |
 | WWE | Wipo information: entry into national phase | Ref document number: 1020187016293; Country of ref document: KR |
 | NENP | Non-entry into the national phase | Ref country code: DE |