CN102446506A - Classification identifying method and equipment of audio signals - Google Patents

Classification identifying method and equipment of audio signals

Info

Publication number
CN102446506A
CN102446506A CN2010105125058A CN201010512505A
Authority
CN
China
Prior art keywords
audio signal
energy
threshold
audio frame
LSTER
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010105125058A
Other languages
Chinese (zh)
Other versions
CN102446506B (en)
Inventor
金剑 (Jin Jian)
刘贵忠 (Liu Guizhong)
顿玉洁 (Dun Yujie)
杜正中 (Du Zhengzhong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN2010105125058A
Publication of CN102446506A
Application granted
Publication of CN102446506B
Expired - Fee Related
Anticipated expiration

Abstract

The embodiments of the present application disclose a classification identification method and device for audio signals. The method comprises: acquiring a frame of audio signal and preprocessing it; updating the audio segment preceding the frame with the preprocessed result, to generate a current audio segment that contains the frame; extracting the classification features low short-time energy ratio (LSTER) and jump short-time energy ratio (JSTER) from the current audio segment; and identifying the type of the frame according to the classification features, to obtain a preliminary classification result. Because a single frame is far shorter than an audio segment, identifying one frame at a time greatly reduces delay and improves the real-time performance of audio processing; because the signal is identified with the two classification features LSTER and JSTER, the accuracy of identifying audio signals is increased while the complexity of the identification is reduced.

Description

Classification identification method and device for audio signals
Technical field
The present application relates to the field of communication technologies, and in particular to a classification identification method and device for audio signals in speech and audio processing.
Background technology
USAC (Unified Speech and Audio Coding) encodes speech signals and music signals within a single framework. During encoding, the encoder must correctly identify whether the input signal has speech characteristics or music characteristics, so that different coding schemes can be selected to encode speech signals and music signals respectively.
Referring to Fig. 1, which is a schematic diagram of an existing system for recognizing speech and music signals. The system comprises an MLER (modified low-energy ratio) feature extraction module 110, a Bayesian posterior classification module 120 and a posterior decision module 130, and makes its decision based on a single classification feature, namely the MLER. With the system shown in Fig. 1, the MLER of the i-th audio segment is first extracted by the MLER feature extraction module 110; the Bayesian posterior classification module 120 then classifies the type of the i-th audio segment according to a preset threshold; finally, the posterior decision module 130 further verifies the result of the foregoing classification. Specifically, the posterior decision module 130 first buffers the recognition result and sets it as the initial state, then makes a decision on the saved initial state according to subsequently received classification results; the classification results of subsequent audio segments are thus used to verify the classification already made, in order to reduce classification errors.
In the prior art, only the single classification feature MLER is used to identify the type of an audio segment, and each decision is output for a whole audio segment. Since the length of an audio segment is generally about 1 second, a coding delay of about 1 second is introduced, which reduces the real-time performance of audio processing. Moreover, because the type of the current audio segment is decided with a posterior decision method, and the decision must be verified a posteriori against the types of subsequent audio segments, the processing delay of speech and audio coding is increased further.
Summary of the invention
The embodiments of the present invention provide a classification identification method and device for audio signals, to solve the problems that the identification process of existing audio signals has a large delay and the real-time performance of audio processing is poor.
An embodiment of the present invention provides a classification identification method for audio signals, comprising:
acquiring a frame of audio signal, and preprocessing the frame;
updating the audio segment preceding the frame with the preprocessed result, to generate a current audio segment that contains the frame;
extracting classification features from the current audio segment, the classification features comprising a low short-time energy ratio (LSTER) and a jump short-time energy ratio (JSTER);
identifying the type of the frame according to the classification features, to obtain a preliminary classification result.
An embodiment of the present invention provides a classification identification device for audio signals, comprising:
an acquiring unit, configured to acquire a frame of audio signal;
a preprocessing unit, configured to preprocess the frame;
an updating unit, configured to update the audio segment preceding the frame with the preprocessed result, to generate a current audio segment that contains the frame;
an extracting unit, configured to extract classification features from the current audio segment, the classification features comprising a low short-time energy ratio (LSTER) and a jump short-time energy ratio (JSTER);
a recognizing unit, configured to identify the type of the frame according to the classification features, to obtain a preliminary classification result.
As can be seen from the technical solutions provided above, an embodiment of the present application acquires a frame of audio signal, preprocesses it, updates the audio segment preceding the frame with the preprocessed result to generate a current audio segment containing the frame, extracts the classification features LSTER and JSTER from the current audio segment, and identifies the type of the frame according to these features to obtain a preliminary classification result. Unlike the prior art, which must identify the class of a whole audio segment at a time, the embodiments of the present application identify one frame of audio signal at a time; since a frame is far shorter than an audio segment, the delay performance is greatly improved and the real-time performance of audio processing is enhanced. Because the signal is identified with the two classification features LSTER and JSTER, the accuracy of identifying audio signals is increased; and since no complicated computation such as posterior decision is required, the complexity of the identification is reduced.
Description of drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments recorded in the present application, and persons of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
Fig. 1 is a schematic diagram of an existing system for recognizing speech and music signals;
Fig. 2 is a flowchart of the first method embodiment of the classification identification of audio signals of the present application;
Fig. 3 is a flowchart of the second method embodiment of the classification identification of audio signals of the present application;
Fig. 4 is a schematic diagram of updating an audio segment to generate the current audio segment in an embodiment of the present application;
Fig. 5 is a flowchart of the third method embodiment of the classification identification of audio signals of the present application;
Fig. 6 is a block diagram of the first embodiment of the classification identification device for audio signals of the present application;
Fig. 7 is a block diagram of the second embodiment of the classification identification device for audio signals of the present application.
Detailed description of the embodiments
The embodiments of the present application provide a classification identification method and device for audio signals. In the embodiments, classifying an audio signal mainly means identifying whether the audio signal is a speech signal or a music signal.
To make the solutions of the present application better understood by persons skilled in the art, and to make the above objectives, features and advantages of the present application more comprehensible, the present application is described in further detail below with reference to the accompanying drawings and the embodiments.
Referring to Fig. 2, which is a flowchart of the first method embodiment of the classification identification of audio signals of the present application:
Step 201: acquire a frame of audio signal, and preprocess the frame.
Specifically, the frame is divided into several subframes, adjacent subframes overlapping each other, and the short-time energy of each subframe is calculated after applying a Hamming window to it.
Step 202: update the audio segment preceding the frame with the preprocessed result, to generate a current audio segment that contains the frame.
Specifically, the short-time energies of the frame at the start of the preceding audio segment are removed, and the short-time energies of the subframes of the current frame are appended at the end of that segment, generating the current audio segment.
Step 203: extract the classification features low short-time energy ratio (LSTER) and jump short-time energy ratio (JSTER) from the current audio segment.
The LSTER is the ratio of the number of subframes whose energy falls below a threshold to the total number of subframes in the audio segment; the JSTER is the ratio of the number of subframes in which an energy jump occurs to the total number of subframes in the audio segment. Extracting these two features mainly requires computing the short-time energy of each subframe, so the computational load is low.
Specifically, when extracting the LSTER, an average threshold is computed from the short-time energies of all subframes in the current audio segment; the number of subframes whose short-time energy falls below this average threshold is counted, and divided by the total number of subframes in the current audio segment to obtain the LSTER.
Specifically, when extracting the JSTER, the short-time energy and the predicted energy of the subframe preceding the current subframe are obtained; the predicted energy of the current subframe is computed from the previous subframe's short-time energy and predicted energy; whether an energy jump occurs in the current subframe is decided by comparing its predicted energy with its short-time energy; the number of subframes in which an energy jump occurs is counted, and divided by the total number of subframes in the current audio segment to obtain the JSTER.
Step 204: identify the type of the frame according to the classification features, to obtain a preliminary classification result.
Specifically, the energy value of the frame is obtained and compared with a preset silent-frame energy threshold. When the energy value is below the threshold, the type of the frame is decided to be the same as the type of its preceding frame. When the energy value is above the threshold, the extracted LSTER is compared with a preset LSTER threshold: if the extracted LSTER is below the LSTER threshold, the frame is decided to be a music signal; if it is above the LSTER threshold, the extracted JSTER is compared with a preset JSTER threshold: if the extracted JSTER is below the JSTER threshold, the frame is decided to be a music signal, and if it is above the JSTER threshold, the frame is decided to be a speech signal.
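The per-frame decision just described can be sketched as follows. This is a minimal illustration of step 204; the function name and the concrete threshold values are assumptions for demonstration, not values given in the patent.

```python
def classify_frame(frame_energy, lster, jster, prev_type,
                   silence_thresh=0.01, lster_thresh=0.35, jster_thresh=0.2):
    """Return 'music' or 'speech' for the current frame (step 204 sketch)."""
    if frame_energy < silence_thresh:
        # Silent frame: keep the type of the preceding frame.
        return prev_type
    if lster < lster_thresh:
        # Low LSTER is characteristic of music.
        return 'music'
    if jster < jster_thresh:
        # Few energy jumps also points to music.
        return 'music'
    return 'speech'
```

The LSTER check comes first and the JSTER check acts as a tie-breaker, mirroring the order of comparisons in the text.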
Referring to Fig. 3, which is a flowchart of the second method embodiment of the classification identification of audio signals of the present application. This embodiment shows in detail how an audio signal is classified, and how the final classification result is obtained through smoothing:
Step 301: acquire a frame of audio signal.
Step 302: divide the frame into several subframes, adjacent subframes overlapping each other.
During preprocessing, the acquired frame is divided into several subframes that overlap each other by half of the subframe length; each subframe is then processed with a Hamming window, and the short-time energy of each subframe is calculated.
The length of a frame of the audio signal being coded may vary with the specific coding environment. In this embodiment of the application, when dividing subframes, it is assumed that each audio segment comprises 20 frames of audio signal and that each frame comprises four subframes; the duration of each subframe may be 23 ms (milliseconds), with an overlap of 11.5 ms between every two adjacent subframes.
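The half-overlapped subframe division described above can be sketched like this. The function name and the example lengths are illustrative assumptions; with 1024-sample subframes and a 512-sample hop, a 2560-sample frame yields the four subframes used in this embodiment.

```python
def split_into_subframes(frame, subframe_len, hop):
    """Split one audio frame into subframes that start every `hop` samples,
    so consecutive subframes overlap by subframe_len - hop samples
    (half-overlap when hop == subframe_len // 2)."""
    return [frame[i:i + subframe_len]
            for i in range(0, len(frame) - subframe_len + 1, hop)]
```

For instance, `split_into_subframes(frame, 1024, 512)` on a 2560-sample frame returns four 1024-sample subframes with 50% overlap.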
Step 303: calculate the short-time energy of each subframe by applying a Hamming window to it.
In this embodiment of the application, assuming the data length of each subframe is 1024 samples, after the current input frame is divided into four subframes, a Hamming window is applied to each subframe in turn; the Hamming window has the same length as the subframe data, i.e. 1024 samples. The short-time energy of each subframe is then computed from the windowed samples, giving the short-time energies of the four subframes of the current frame.
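A minimal sketch of the Hamming-windowed short-time energy computation of step 303, assuming the standard Hamming coefficients 0.54/0.46 (the patent does not spell out the window formula):

```python
import math

def short_time_energy(subframe):
    """Apply a Hamming window w(i) = 0.54 - 0.46*cos(2*pi*i/(N-1)) to the
    subframe and return the sum of the squared windowed samples."""
    n = len(subframe)
    return sum((x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))) ** 2
               for i, x in enumerate(subframe))
```

Because the energy is a sum of squares, doubling the amplitude of a subframe quadruples its short-time energy.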
Step 304: update the audio segment preceding the frame with the preprocessed result, to generate a current audio segment that contains the frame.
When updating the audio segment, the short-time energies of the four subframes at the start of the audio segment preceding the current frame are removed, and the short-time energies of the four subframes of the current frame are appended at the end of the segment, forming a new audio segment that is taken as the current audio segment. Fig. 4 is a schematic diagram of updating an audio segment to generate the current audio segment. In this embodiment of the application, the information contained in the audio segment that includes the current frame is used as the basis for classifying the current frame.
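The sliding update of step 304 is a FIFO over subframe energies, which a bounded deque captures directly. The constants follow the example in the text (20 frames per segment, 4 subframes per frame); the function name is an assumption.

```python
from collections import deque

SUBFRAMES_PER_FRAME = 4   # per the example in this embodiment
SEGMENT_FRAMES = 20       # 20 frames per audio segment

def update_segment(segment, new_frame_energies):
    """Append the current frame's subframe energies to the segment; with
    maxlen set, the energies of the frame at the segment start fall out
    automatically, exactly as in Fig. 4."""
    segment.extend(new_frame_energies)
    return segment

# A segment holds 20 * 4 = 80 subframe energies:
# segment = deque(maxlen=SEGMENT_FRAMES * SUBFRAMES_PER_FRAME)
```

Each call shifts the segment forward by one frame, so the newest frame always sits at the segment's end.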
Step 305: extract the classification feature LSTER from the current audio segment.
When extracting the LSTER, the average of the subframe energies in the current audio segment is computed, and the number of subframes whose energy falls below a certain percentage of this average is counted; this percentage of the average may be denoted the average threshold. Dividing the count by the total number of subframes gives the LSTER of the segment.
Specifically, the classification results of an audio signal mainly comprise two kinds: speech signals and music signals. In a speech signal, silent, unvoiced and voiced frames alternate, so the short-time energies of its subframes are unequal; a music signal, in contrast, is relatively steady, so the short-time energies of its subframes vary little. Compared with a speech signal, the variance of the short-time energy of a music signal is small, and the proportion of subframes with low short-time energy is also low; that is, the LSTER of a music signal is relatively lower than that of a speech signal. Because the LSTER characteristics of speech and music signals differ in this way, the LSTER can be used as a basis for classifying audio signals.
Taking the audio segment of 20 frames as an example again, the segment comprises 80 subframes in total. First the average of the short-time energies of the 80 subframes is computed (the short-time energy of each subframe may be computed with an existing method); then a certain percentage of this average, which may be set between 5% and 10%, is taken as the average threshold; then the short-time energy of each of the 80 subframes is compared with this average threshold, and a subframe whose energy falls below the threshold is marked as a low short-time energy subframe; finally the number of low short-time energy subframes is divided by 80, i.e. by the total number of subframes in the current audio segment, to obtain the LSTER. Specifically, the LSTER can be calculated with the following formula:

LSTER = (1 / (2N)) · Σ_{n=0}^{N−1} [sgn(δ · avSTE − STE(n)) + 1],  avSTE = (1 / N) · Σ_{n=0}^{N−1} STE(n)

where N is the number of subframes in the audio segment, sgn(·) is the sign function, STE(n) is the short-time energy of the n-th subframe, and δ is the preset coefficient.
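Since sgn(δ·avSTE − STE(n)) + 1 equals 2 when a subframe's energy is below the threshold and 0 when it is above, the formula simply counts low-energy subframes. A sketch (the default δ of 0.08 is one illustrative choice inside the 5%-10% range; equality with the threshold is ignored here for simplicity):

```python
def lster(energies, delta=0.08):
    """Fraction of subframes whose short-time energy falls below
    delta * avSTE, per the LSTER formula above."""
    n = len(energies)
    av_ste = sum(energies) / n            # avSTE
    return sum(1 for e in energies if e < delta * av_ste) / n
```

A segment of equal-energy subframes gives LSTER = 0, while a speech-like segment with many low-energy subframes gives a larger value.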
Step 306: extract the classification feature JSTER from the current audio segment.
When extracting the JSTER, the short-time energy of each subframe in the current audio segment is predicted from the subframes preceding it; whether the energy of the subframe jumps is decided from the prediction; and the number of subframes in the segment in which an energy jump occurs is counted accordingly, yielding the JSTER of the segment.
Specifically, when a strong signal suddenly appears in a music signal, it enlarges the average subframe short-time energy of the audio segment, so the LSTER of the music signal can also become large, and it is then difficult to distinguish speech from music by the LSTER alone. In this case, comparing music segments with speech segments, the low short-time energy subframes of a music segment are usually clustered together in position, whereas those of a speech segment are usually distributed more evenly. Therefore, according to this difference in the positional distribution of low-energy subframes between speech and music signals, this embodiment of the application counts the number of subframes in an audio segment whose short-time energy changes abruptly, so as to capture the positional distribution information. This information can be represented by the JSTER, i.e. the ratio of the number of subframes whose energy changes abruptly to the total number of subframes in the audio segment; the JSTER thus complements the classification of audio signals by the LSTER.
When extracting the JSTER, the predicted energy of each subframe is first calculated. Suppose the short-time energy and the predicted energy of the subframe preceding the current subframe are x1 and y1 respectively, and the predicted energy of the current subframe is y; then y can be calculated with the following formula:

y = 0.7 · y1 + 0.3 · x1

After the predicted energy y of the current subframe is calculated, it is compared with the short-time energy of the current subframe (denoted x). Suppose the variable jump_flag identifies whether the current subframe jumps; then the comparison result can be obtained with the following formula:

jump_flag = 1, if y × jump_thresh > x or x × jump_thresh > y; jump_flag = 0, otherwise

In this formula, jump_flag = 1 indicates that an energy jump occurs in the current subframe, and jump_flag = 0 indicates that no energy jump occurs; jump_thresh characterizes the allowed difference between the predicted energy of the current subframe and its short-time energy, and can usually be set to 0.2; "||" denotes the logical "or".
The number of subframes in which an energy jump occurs in the audio segment is counted in this way, and divided by the total number of subframes in the audio segment; the resulting value is the JSTER.
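Putting the prediction and jump test together, the JSTER extraction of step 306 can be sketched as follows. Seeding the predictor with the first subframe's energy is an assumption, since the text does not specify how the recursion is initialized.

```python
def jster(energies, jump_thresh=0.2):
    """Fraction of subframes flagged as energy jumps: predict each
    subframe's energy as y = 0.7*y1 + 0.3*x1 (y1, x1 = previous subframe's
    predicted and actual energy), and flag a jump when prediction and
    actual energy differ by more than a factor of 1/jump_thresh."""
    y_prev = energies[0]          # initial predicted energy (assumed)
    jumps = 0
    for i in range(1, len(energies)):
        y = 0.7 * y_prev + 0.3 * energies[i - 1]   # predicted energy
        x = energies[i]                            # actual short-time energy
        if y * jump_thresh > x or x * jump_thresh > y:
            jumps += 1                             # jump_flag = 1
        y_prev = y
    return jumps / len(energies)
```

A steady segment yields JSTER = 0; a sudden energy spike is flagged both on its rise and on its fall, since the predictor lags behind the actual energy in both directions.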
Step 307: obtain the energy value of the frame.
The energy value of the current frame is calculated; when a frame comprises four subframes, the energy value of the current frame is the sum of the energy values of its four subframes.
Step 308: decide whether the energy value of the frame is below the preset silent-frame energy threshold; if so, go to step 309; otherwise, go to step 310.
First, the energy value of the current frame is compared with the silent-frame energy threshold. A silent frame is an audio frame whose energy is below a certain threshold; the silent-frame energy threshold can be set in advance from the energies of silent frames identified in test sequences, for example as the average of those energies.
Step 309: decide that the type of the frame is the same as the type of its preceding frame, and end the current flow.
According to the comparison result, if the energy value of the current frame is below the silent-frame energy threshold, the current frame is decided to be a silent frame. Since the type of a silent frame may be either speech or music, and in order to keep the classification result consistent within adjacent audio segments, the type of the silent frame is set to the signal type of its preceding frame.
Step 310: decide whether the extracted LSTER is below the preset LSTER threshold; if so, go to step 312; otherwise, go to step 311.
Step 311: decide whether the extracted JSTER is below the preset JSTER threshold; if so, go to step 312; otherwise, go to step 313.
Step 312: decide that the frame is a music signal, and go to step 314.
Step 313: decide that the frame is a speech signal.
Step 314: smooth the preliminary classification result of the frame to obtain the final classification result, and end the current flow.
When smoothing, the saved preliminary classification results of all audio signals in the two audio segments preceding the current frame are obtained. Taking segments of 20 frames as an example again, these are the preliminary classification results of the 40 frames preceding the current frame. The numbers of music and speech results within the two segments are counted, and the majority signal type is used to smooth the preliminary classification result of the current frame.
For example, suppose the preliminary classification result of the current frame is music, while in the preceding two audio segments the number of speech results exceeds the number of music results; then the final decision for the current frame is speech.
Note that this embodiment uses the preliminary classification results of the audio signals in two audio segments to smooth the preliminary result of the current frame; in practice the smoothing is not limited to two audio segments.
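The majority-vote smoothing of step 314 can be sketched as follows. The tie-breaking rule (keep the preliminary result when the counts are equal) is an assumption, since the text only describes the majority case.

```python
from collections import Counter

def smooth(history, preliminary):
    """Final label for the current frame: the majority preliminary label
    over the preceding two segments (`history`, e.g. 40 labels), falling
    back to the frame's own preliminary result on a tie (assumed)."""
    counts = Counter(history)
    if counts['music'] > counts['speech']:
        return 'music'
    if counts['speech'] > counts['music']:
        return 'speech'
    return preliminary
```

With 25 music and 15 speech labels in the history, a frame preliminarily labelled speech is smoothed to music.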
As can be seen from the above embodiment of the application, the subframe energies of the current frame are calculated and appended at the end of the audio segment preceding the current frame, forming a new audio segment; the classification features of this segment are extracted and the type of the audio signal is decided from them. In effect this decides the type of an audio segment: since the current frame lies within this segment, the type of the current frame is the type of its segment. When the next frame is processed, the preceding audio segment is updated accordingly, the type of the updated segment is decided, and it is taken as the type of the next frame. That is, each decision classifies one audio segment and assigns the segment's type to the last frame in that segment; when the next frame is processed, the segment slides back by one frame, so that the next frame becomes the last frame of the new segment. Because a classification is output for every single frame rather than once per segment, the delay is thereby reduced.
Referring to Fig. 5, which is a flowchart of the third method embodiment of the classification identification of audio signals of the present application. This embodiment shows how the LSTER threshold and the JSTER threshold are adjusted after an audio signal has been classified:
Step 501: update, with the preliminary classification result of the frame, the preliminary classification results of the audio signals in the audio segment preceding the frame.
When adjusting the LSTER and JSTER thresholds, the preliminary classification results of the audio signals in the audio segment preceding the current frame are first updated with the preliminary classification result of the current frame. For example, suppose an audio segment comprises 20 frames, denoted n1 to n20, and the current frame is denoted n21; the preliminary classification results of n1 to n20 have been saved, so the preliminary result of n1 is shifted out of the segment and the result of n21 is shifted in, after which the preliminary classification results of the 20 frames n2 to n21 are counted.
Step 502: count the number of preliminary music results in the new audio segment.
Step 503: decide whether the number of music results exceeds half of the number of audio signals contained in the new audio segment; if so, go to step 504; otherwise, go to step 505.
Step 504: increase the LSTER threshold and the JSTER threshold by one step value each, and end the current flow.
When the number of frames whose preliminary classification result is music exceeds half of the segment length (10 frames), the next frame is predicted to be more likely music than speech; the adjustment at this point increases the LSTER threshold and the JSTER threshold by one step value each.
Step 505: decrease the LSTER threshold and the JSTER threshold by one step value each, and end the current flow.
When the number of frames whose preliminary classification result is music is less than half of the segment length (10 frames), the next frame is predicted to be less likely music than speech; the adjustment at this point decreases the LSTER threshold and the JSTER threshold by one step value each.
In the above embodiment, whether the LSTER and JSTER thresholds are increased or decreased, they remain within a certain range. Tests on a large number of speech sequences show that when the thresholds are increased to the point where the accuracy of classifying the signals in the speech sequences falls below 80%, the increase must stop; the thresholds at that point are the LSTER maximum and the JSTER maximum. Likewise, tests on a large number of music sequences show that when the thresholds are decreased to the point where the accuracy of classifying the signals in the music sequences falls below 80%, the decrease must stop; the thresholds at that point are the LSTER minimum and the JSTER minimum. Therefore, when the thresholds are adjusted, before increasing them by one step value it must further be decided whether the increased LSTER and JSTER thresholds remain below the LSTER maximum and the JSTER maximum respectively, and the increase is performed only if they do; before decreasing them by one step value it must further be decided whether the decreased thresholds remain above the LSTER minimum and the JSTER minimum respectively, and the decrease is performed only if they do.
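The bounded threshold adaptation of steps 501-505 can be sketched as follows. The step value and the minimum/maximum bounds here are illustrative assumptions; the patent states only that they are fixed by offline tests at an 80% accuracy criterion.

```python
def adapt_thresholds(results, lster_thresh, jster_thresh, step=0.01,
                     lster_min=0.1, lster_max=0.5,
                     jster_min=0.05, jster_max=0.4):
    """Raise both thresholds by one step when music results form the
    majority of the segment's preliminary labels, otherwise lower them,
    clamping to the [min, max] bounds. All numeric values are assumed."""
    music = sum(1 for r in results if r == 'music')
    if music > len(results) / 2:
        lster_thresh = min(lster_thresh + step, lster_max)
        jster_thresh = min(jster_thresh + step, jster_max)
    else:
        lster_thresh = max(lster_thresh - step, lster_min)
        jster_thresh = max(jster_thresh - step, jster_min)
    return lster_thresh, jster_thresh
```

Nudging both thresholds toward the recently dominant class biases the decision of the next frame in that class's favour, while the clamps keep the bias from drifting unboundedly.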
As can be seen from the above embodiment, the LSTER and JSTER thresholds change dynamically through the adjustment; when the first frame of the audio signal is classified, initial values of the LSTER and JSTER thresholds can be given in advance.
As the description of the method embodiments of audio-signal classification shows, the embodiments of the present application depend on the setting of several parameters. The initial values of these parameters are optimized values determined by statistical tests on a large number of audio samples, and are not described further here.
In unified speech/audio coding, certain high-frequency extension tools, such as eSBR (enhanced Spectral Band Replication), and the core coding tools need the type information of the preceding frame of the audio signal. The classification results of the embodiments of the present application can therefore be used: on the one hand, the classification result is input to the eSBR module to perform eSBR coding; on the other hand, it can be input to the core coding module to select either music core coding or speech core coding. Besides the above application, the audio-classification embodiments of the present application can also be applied to fields such as content-based audio retrieval and video summarization.
Corresponding to the embodiments of the audio-signal classification method of the present application, the present application also provides embodiments of an audio-signal classification apparatus.
Referring to Fig. 6, a block diagram of a first embodiment of the audio-signal classification apparatus of the present application is shown.
The audio-signal classification apparatus comprises: an acquiring unit 610, a preprocessing unit 620, an updating unit 630, an extracting unit 640, and a recognizing unit 650.
The acquiring unit 610 is configured to acquire one frame of an audio signal.
The preprocessing unit 620 is configured to preprocess the frame.
The updating unit 630 is configured to update, with the preprocessing result, the audio segment preceding the frame, generating a current audio segment that includes the frame.
The extracting unit 640 is configured to extract classification features from the current audio segment, the classification features comprising the low short-time energy ratio (LSTER) and the jump short-time energy ratio (JSTER).
The recognizing unit 650 is configured to recognize the type of the frame according to the classification features, obtaining a preliminary classification result.
Referring to Fig. 7, a block diagram of a second embodiment of the audio-signal classification apparatus of the present application is shown.
The audio-signal classification apparatus comprises: an acquiring unit 710, a preprocessing unit 720, an updating unit 730, an extracting unit 740, a recognizing unit 750, a smoothing unit 760, and an adjusting unit 770.
The acquiring unit 710 is configured to acquire one frame of an audio signal.
The preprocessing unit 720 is configured to preprocess the frame.
Specifically, the preprocessing unit 720 may comprise (not shown in Fig. 7): a subframe dividing unit, configured to divide the frame into a number of subframes, adjacent subframes among which overlap; and an energy computing unit, configured to compute the short-time energy of each subframe by applying a Hamming window to the subframe.
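The subframe division and Hamming-window energy computation performed by the preprocessing unit can be sketched as follows. This is a minimal illustration under assumptions: the subframe length, the 50% overlap, and the function name are placeholders, as the patent states only that adjacent subframes overlap without giving concrete sizes.

```python
import numpy as np

def subframe_energies(frame, sub_len=256, hop=128):
    """Split one frame of samples into overlapping subframes (50% overlap
    assumed), apply a Hamming window to each, and return the short-time
    energy of every subframe. sub_len and hop are illustrative values."""
    win = np.hamming(sub_len)
    energies = []
    for start in range(0, len(frame) - sub_len + 1, hop):
        sub = frame[start:start + sub_len] * win
        energies.append(float(np.sum(sub ** 2)))
    return energies
```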
The updating unit 730 is configured to update, with the preprocessing result, the audio segment preceding the frame, generating a current audio segment that includes the frame.
Specifically, the updating unit 730 may comprise (not shown in Fig. 7): an energy removing unit, configured to remove the short-time energy of the frame at the start of the preceding audio segment; and an energy inserting unit, configured to move the short-time energy of each subframe of the current frame into the end of the preceding audio segment, generating the current audio segment.
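The sliding-window update performed by the energy removing and inserting units can be sketched as follows. A 20-frame segment length follows from the 10-frame half-length mentioned earlier in the description; the deque-based representation and the names are assumptions for illustration only.

```python
from collections import deque

SEGMENT_FRAMES = 20  # segment length in frames (half of it, 10, appears in the description)

def update_segment(segment, new_frame_energies):
    """Slide the analysis window one frame forward. `segment` is a deque of
    per-frame subframe-energy lists; its maxlen drops the oldest frame's
    energies automatically, and the energies of the newly preprocessed
    frame are appended at the end."""
    segment.append(list(new_frame_energies))
    return segment
```

A fresh window is created with `deque(maxlen=SEGMENT_FRAMES)`, so the remove-oldest step needs no explicit code.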
The extracting unit 740 is configured to extract classification features from the current audio segment, the classification features comprising the low short-time energy ratio (LSTER) and the jump short-time energy ratio (JSTER).
Specifically, when the extracted classification feature is the LSTER, the extracting unit 740 may comprise (not shown in Fig. 7): an energy-threshold computing unit, configured to compute the mean of the short-time energies of all subframes in the current audio segment as a threshold; a subframe counting unit, configured to count the subframes in the current audio segment whose short-time energy is below the mean threshold; and an LSTER generating unit, configured to divide the number of subframes below the mean threshold by the total number of subframes in the current audio segment to obtain the LSTER. When the extracted classification feature is the JSTER, the extracting unit 740 may further comprise (not shown in Fig. 7): an energy acquiring unit, configured to obtain the short-time energy and the predicted energy of the subframe preceding the current subframe in the current audio segment; an energy computing unit, configured to compute the predicted energy of the current subframe from the short-time energy and predicted energy of the preceding subframe; an energy-jump judging unit, configured to determine whether an energy jump occurs in the current subframe by comparing the predicted energy of the current subframe with its short-time energy; a jump counting unit, configured to count the subframes in the current audio segment in which an energy jump occurs; and a JSTER generating unit, configured to divide the number of subframes in which an energy jump occurs by the total number of subframes in the current audio segment to obtain the JSTER.
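The two feature computations performed by the extracting unit can be sketched as follows. The LSTER computation mirrors the description directly; for the JSTER, the recursive prediction coefficient `alpha` and the jump factor `jump` are assumed values, since the patent describes the prediction and comparison steps but not their constants.

```python
def lster(energies):
    """Low short-time energy ratio: fraction of subframes whose short-time
    energy lies below the mean energy of the current segment."""
    mean = sum(energies) / len(energies)
    return sum(1 for e in energies if e < mean) / len(energies)

def jster(energies, alpha=0.7, jump=2.0):
    """Jump short-time energy ratio: fraction of subframes whose measured
    energy deviates from a recursively predicted energy by more than the
    factor `jump`. alpha and jump are illustrative, not from the patent."""
    pred = energies[0]          # predicted energy, seeded with subframe 0
    jumps = 0
    for i in range(1, len(energies)):
        # Predict the current subframe's energy from the previous
        # subframe's predicted energy and short-time energy.
        pred = alpha * pred + (1 - alpha) * energies[i - 1]
        # An energy jump occurs when measurement and prediction diverge.
        if energies[i] > jump * pred or energies[i] < pred / jump:
            jumps += 1
    return jumps / len(energies)
```

On speech, pauses keep many subframes below the mean and energy jumps are frequent, so both ratios tend high; on music, both tend low, which is the basis of the decision rule below.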
The recognizing unit 750 is configured to recognize the type of the frame according to the classification features, obtaining a preliminary classification result.
Specifically, the recognizing unit 750 may comprise (not shown in Fig. 7): an energy-value acquiring unit, configured to obtain the energy value of the frame; and a comparing and recognizing unit, configured to compare the energy value of the frame with a preset silent-frame energy threshold and, when the energy value is less than the energy threshold, determine that the type of the frame is the same as that of its preceding frame; when the energy value is greater than the energy threshold, compare the extracted LSTER with a preset LSTER threshold and, when the extracted LSTER is less than the LSTER threshold, determine that the frame is a music signal; when the extracted LSTER is greater than the LSTER threshold, compare the extracted JSTER with a preset JSTER threshold and, when the extracted JSTER is less than the JSTER threshold, determine that the frame is a music signal, and when the extracted JSTER is greater than the JSTER threshold, determine that the frame is a speech signal.
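The decision cascade of the recognizing unit can be sketched as follows. The structure follows the description; all numeric threshold values here are illustrative placeholders, not values from the patent.

```python
def classify_frame(frame_energy, lster_val, jster_val, prev_type,
                   silence_thr=1e-4, lster_thr=0.3, jster_thr=0.2):
    """Cascade decision for one frame. Threshold defaults are placeholders."""
    if frame_energy < silence_thr:
        return prev_type       # silent frame: inherit the previous frame's type
    if lster_val < lster_thr:
        return "music"         # low LSTER -> music
    if jster_val < jster_thr:
        return "music"         # high LSTER but low JSTER -> music
    return "speech"            # high LSTER and high JSTER -> speech
```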
The smoothing unit 760 is configured to smooth the preliminary classification result of the frame to obtain a final classification result.
Specifically, the smoothing unit 760 may comprise (not shown in Fig. 7): a preliminary-result acquiring unit, configured to obtain the saved preliminary classification results of a number of frames preceding the current frame; a preliminary-result counting unit, configured to count the numbers of speech and music results among those preliminary classification results; and a final-result determining unit, configured to take the signal type, speech or music, that occupies the majority as the final classification result of the frame.
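The majority-vote smoothing performed by the smoothing unit can be sketched as follows. The window length and the tie-breaking behavior are assumptions, since the patent specifies only "a number of preceding frames" and the majority rule.

```python
def smooth(history, current):
    """Majority vote over the preliminary results of the current frame and
    the saved preceding frames. Ties fall to music here; the patent does
    not specify tie handling, so this choice is an assumption."""
    window = history + [current]
    speech = sum(1 for r in window if r == "speech")
    music = len(window) - speech
    return "speech" if speech > music else "music"
```

This suppresses isolated misclassifications: a single music frame inside a run of speech frames is overruled by the vote.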
The adjusting unit 770 is configured to adjust the LSTER threshold and the JSTER threshold according to the preliminary classification result of the frame; the adjusted LSTER and JSTER thresholds are used in classifying the frame that follows.
Specifically, the adjusting unit 770 may remove the preliminary classification result of the frame at the start of the preceding audio segment, and move the preliminary classification result of the current frame into the end of that segment, forming a new audio segment. When the number of frames in the new audio segment whose preliminary classification result is music is greater than half the number of frames contained in the new audio segment, the LSTER threshold and the JSTER threshold are each increased by one step value; when it is less than half, the LSTER threshold and the JSTER threshold are each decreased by one step value. Further, the adjusting unit 770 may perform the increase by one step value only when it determines that the increased LSTER and JSTER thresholds are respectively less than the preset LSTER maximum and JSTER maximum, and may perform the decrease by one step value only when it determines that the decreased LSTER and JSTER thresholds are respectively greater than the preset LSTER minimum and JSTER minimum.
As can be seen from the description of the above embodiments, in the embodiments of the present application one frame of an audio signal is acquired and preprocessed; the audio segment preceding the frame is updated with the preprocessing result, generating a current audio segment that includes the frame; the classification features LSTER and JSTER are extracted from the current audio segment; and the type of the frame is recognized according to the classification features, obtaining a preliminary classification result. Unlike the prior art, which must recognize a whole audio segment at a time, the embodiments of the present application recognize one frame at a time. Because a frame is far shorter than an audio segment, the delay performance is greatly improved and the real-time performance of audio processing is enhanced. Recognizing the audio signal through the two classification features LSTER and JSTER increases the accuracy of recognition, and since no complex computation such as a posteriori decision-making is required, the complexity of recognition is reduced.
Those skilled in the art will clearly understand that the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in certain parts of the embodiments, of the present application.
The embodiments described above do not limit the protection scope of the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (20)

1. A method for classifying an audio signal, characterized by comprising:
acquiring one frame of an audio signal, and preprocessing the frame;
updating, with the preprocessing result, the audio segment preceding the frame, generating a current audio segment that includes the frame;
extracting classification features from the current audio segment, the classification features comprising a low short-time energy ratio (LSTER) and a jump short-time energy ratio (JSTER); and
recognizing the type of the frame according to the classification features, obtaining a preliminary classification result.
2. The method according to claim 1, characterized in that preprocessing the frame comprises:
dividing the frame into a number of subframes, adjacent subframes among which overlap; and
computing the short-time energy of each subframe by applying a Hamming window to the subframe.
3. The method according to claim 2, characterized in that updating the audio segment preceding the frame with the preprocessing result comprises:
removing the short-time energy of the frame at the start of the preceding audio segment; and
moving the short-time energy of each subframe of the current frame into the end of the preceding audio segment, generating the current audio segment.
4. The method according to claim 2, characterized in that, when the classification feature is the LSTER, extracting the LSTER from the current audio segment comprises:
computing the mean of the short-time energies of all subframes in the current audio segment as a threshold;
counting the subframes in the current audio segment whose short-time energy is below the mean threshold; and
dividing the number of subframes below the mean threshold by the total number of subframes in the current audio segment to obtain the LSTER.
5. The method according to claim 2, characterized in that, when the classification feature is the JSTER, extracting the JSTER from the current audio segment comprises:
obtaining the short-time energy and the predicted energy of the subframe preceding the current subframe in the current audio segment;
computing the predicted energy of the current subframe from the short-time energy and predicted energy of the preceding subframe;
determining whether an energy jump occurs in the current subframe by comparing the predicted energy of the current subframe with its short-time energy;
counting the subframes in the current audio segment in which an energy jump occurs; and
dividing the number of subframes in which an energy jump occurs by the total number of subframes in the current audio segment to obtain the JSTER.
6. The method according to claim 1, characterized in that recognizing the type of the frame according to the classification features comprises:
obtaining the energy value of the frame;
comparing the energy value of the frame with a preset silent-frame energy threshold and, when the energy value is less than the energy threshold, determining that the type of the frame is the same as that of its preceding frame;
when the energy value is greater than the energy threshold, comparing the extracted LSTER with a preset LSTER threshold and, when the extracted LSTER is less than the LSTER threshold, determining that the frame is a music signal; and
when the extracted LSTER is greater than the LSTER threshold, comparing the extracted JSTER with a preset JSTER threshold; when the extracted JSTER is less than the JSTER threshold, determining that the frame is a music signal, and when the extracted JSTER is greater than the JSTER threshold, determining that the frame is a speech signal.
7. The method according to claim 6, characterized by further comprising:
adjusting the LSTER threshold and the JSTER threshold according to the preliminary classification result of the frame, the adjusted LSTER threshold and JSTER threshold being used in classifying the frame that follows.
8. The method according to claim 7, characterized in that adjusting the LSTER threshold and the JSTER threshold according to the preliminary classification result of the frame comprises:
removing the preliminary classification result of the frame at the start of the preceding audio segment, and moving the preliminary classification result of the current frame into the end of the preceding audio segment, forming a new audio segment; and
when the number of frames in the new audio segment whose preliminary classification result is music is greater than half the number of frames contained in the new audio segment, increasing the LSTER threshold and the JSTER threshold each by one step value; when it is less than half, decreasing the LSTER threshold and the JSTER threshold each by one step value.
9. The method according to claim 8, characterized by further comprising:
performing the increasing of the LSTER threshold and the JSTER threshold by one step value only when it is determined that the increased thresholds are respectively less than a preset LSTER maximum and a preset JSTER maximum; and
performing the decreasing of the LSTER threshold and the JSTER threshold by one step value only when it is determined that the decreased thresholds are respectively greater than a preset LSTER minimum and a preset JSTER minimum.
10. The method according to claim 1, characterized by further comprising: smoothing the preliminary classification result of the frame to obtain a final classification result.
11. The method according to claim 10, characterized in that smoothing the preliminary classification result of the frame comprises:
obtaining the saved preliminary classification results of a number of frames preceding the current frame;
counting the numbers of speech and music results among the preliminary classification results of those frames; and
taking the signal type, speech or music, that occupies the majority as the final classification result of the frame.
12. An apparatus for classifying an audio signal, characterized by comprising:
an acquiring unit, configured to acquire one frame of an audio signal;
a preprocessing unit, configured to preprocess the frame;
an updating unit, configured to update, with the preprocessing result, the audio segment preceding the frame, generating a current audio segment that includes the frame;
an extracting unit, configured to extract classification features from the current audio segment, the classification features comprising a low short-time energy ratio (LSTER) and a jump short-time energy ratio (JSTER); and
a recognizing unit, configured to recognize the type of the frame according to the classification features, obtaining a preliminary classification result.
13. The apparatus according to claim 12, characterized in that the preprocessing unit comprises:
a subframe dividing unit, configured to divide the frame into a number of subframes, adjacent subframes among which overlap; and
an energy computing unit, configured to compute the short-time energy of each subframe by applying a Hamming window to the subframe.
14. The apparatus according to claim 13, characterized in that the updating unit comprises:
an energy removing unit, configured to remove the short-time energy of the frame at the start of the preceding audio segment; and
an energy inserting unit, configured to move the short-time energy of each subframe of the current frame into the end of the preceding audio segment, generating the current audio segment.
15. The apparatus according to claim 13, characterized in that the extracting unit comprises:
an energy-threshold computing unit, configured to compute, when the extracted classification feature is the LSTER, the mean of the short-time energies of all subframes in the current audio segment as a threshold;
a subframe counting unit, configured to count the subframes in the current audio segment whose short-time energy is below the mean threshold; and
an LSTER generating unit, configured to divide the number of subframes below the mean threshold by the total number of subframes in the current audio segment to obtain the LSTER.
16. The apparatus according to claim 13, characterized in that the extracting unit comprises:
an energy acquiring unit, configured to obtain, when the extracted classification feature is the JSTER, the short-time energy and the predicted energy of the subframe preceding the current subframe in the current audio segment;
an energy computing unit, configured to compute the predicted energy of the current subframe from the short-time energy and predicted energy of the preceding subframe;
an energy-jump judging unit, configured to determine whether an energy jump occurs in the current subframe by comparing the predicted energy of the current subframe with its short-time energy;
a jump counting unit, configured to count the subframes in the current audio segment in which an energy jump occurs; and
a JSTER generating unit, configured to divide the number of subframes in which an energy jump occurs by the total number of subframes in the current audio segment to obtain the JSTER.
17. The apparatus according to claim 12, characterized in that the recognizing unit comprises:
an energy-value acquiring unit, configured to obtain the energy value of the frame; and
a comparing and recognizing unit, configured to compare the energy value of the frame with a preset silent-frame energy threshold and, when the energy value is less than the energy threshold, determine that the type of the frame is the same as that of its preceding frame; when the energy value is greater than the energy threshold, compare the extracted LSTER with a preset LSTER threshold and, when the extracted LSTER is less than the LSTER threshold, determine that the frame is a music signal; when the extracted LSTER is greater than the LSTER threshold, compare the extracted JSTER with a preset JSTER threshold and, when the extracted JSTER is less than the JSTER threshold, determine that the frame is a music signal, and when the extracted JSTER is greater than the JSTER threshold, determine that the frame is a speech signal.
18. The apparatus according to claim 17, characterized by further comprising:
an adjusting unit, configured to adjust the LSTER threshold and the JSTER threshold according to the preliminary classification result of the frame, the adjusted LSTER threshold and JSTER threshold being used in classifying the frame that follows.
19. The apparatus according to claim 12, characterized by further comprising:
a smoothing unit, configured to smooth the preliminary classification result of the frame to obtain a final classification result.
20. The apparatus according to claim 19, characterized in that the smoothing unit comprises:
a preliminary-result acquiring unit, configured to obtain the saved preliminary classification results of a number of frames preceding the current frame;
a preliminary-result counting unit, configured to count the numbers of speech and music results among the preliminary classification results of those frames; and
a final-result determining unit, configured to take the signal type, speech or music, that occupies the majority as the final classification result of the frame.
CN2010105125058A 2010-10-11 2010-10-11 Classification identifying method and equipment of audio signals Expired - Fee Related CN102446506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010105125058A CN102446506B (en) 2010-10-11 2010-10-11 Classification identifying method and equipment of audio signals

Publications (2)

Publication Number Publication Date
CN102446506A true CN102446506A (en) 2012-05-09
CN102446506B CN102446506B (en) 2013-06-05

Family

ID=46008958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010105125058A Expired - Fee Related CN102446506B (en) 2010-10-11 2010-10-11 Classification identifying method and equipment of audio signals

Country Status (1)

Country Link
CN (1) CN102446506B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103854646A (en) * 2014-03-27 2014-06-11 成都康赛信息技术有限公司 Method for classifying digital audio automatically
CN104091599A (en) * 2013-07-18 2014-10-08 腾讯科技(深圳)有限公司 Audio file processing method and device
CN104217715A (en) * 2013-08-12 2014-12-17 北京诺亚星云科技有限责任公司 Real-time voice sample detection method and system
CN105609111A (en) * 2015-09-25 2016-05-25 巫立斌 Noise identification method in audio signal and system thereof
CN105872855A (en) * 2016-05-26 2016-08-17 广州酷狗计算机科技有限公司 Labeling method and device for video files
CN106297803A (en) * 2016-10-12 2017-01-04 安徽徽云信息科技有限公司 A kind of Computer Distance Education system
CN106409312A (en) * 2015-07-28 2017-02-15 恩智浦有限公司 Audio classifier
CN106875955A (en) * 2015-12-10 2017-06-20 掌赢信息科技(上海)有限公司 The preparation method and electronic equipment of a kind of sound animation
CN107945816A (en) * 2016-10-13 2018-04-20 汤姆逊许可公司 Apparatus and method for audio frame processing
CN108735230A (en) * 2018-05-10 2018-11-02 佛山市博知盾识科技有限公司 Background music recognition methods, device and equipment based on mixed audio
CN109273016A (en) * 2015-03-13 2019-01-25 杜比国际公司 Decode the audio bit stream in filling element with enhancing frequency spectrum tape copy metadata
CN109545192A (en) * 2018-12-18 2019-03-29 百度在线网络技术(北京)有限公司 Method and apparatus for generating model
CN109616142A (en) * 2013-03-26 2019-04-12 杜比实验室特许公司 Device and method for audio classification and processing
CN111261174A (en) * 2018-11-30 2020-06-09 杭州海康威视数字技术股份有限公司 Audio classification method and device, terminal and computer readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3607450B2 (en) * 1997-03-05 2005-01-05 Kddi株式会社 Audio information classification device
US20060136211A1 (en) * 2000-04-19 2006-06-22 Microsoft Corporation Audio Segmentation and Classification Using Threshold Values
CN101197135A (en) * 2006-12-05 2008-06-11 华为技术有限公司 Aural signal classification method and device
CN101236742A (en) * 2008-03-03 2008-08-06 中兴通讯股份有限公司 Music/ non-music real-time detection method and device
CN101393741A (en) * 2007-09-19 2009-03-25 中兴通讯股份有限公司 Audio signal classification apparatus and method used in wideband audio encoder and decoder
CN101548313A (en) * 2006-11-16 2009-09-30 国际商业机器公司 Voice activity detection system and method
JP4392805B2 (en) * 2008-04-28 2010-01-06 Kddi株式会社 Audio information classification device


Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109616142A (en) * 2013-03-26 2019-04-12 杜比实验室特许公司 Device and method for audio classification and processing
CN109616142B (en) * 2013-03-26 2023-11-07 杜比实验室特许公司 Apparatus and method for audio classification and processing
CN104091599A (en) * 2013-07-18 2014-10-08 腾讯科技(深圳)有限公司 Audio file processing method and device
CN104217715A (en) * 2013-08-12 2014-12-17 北京诺亚星云科技有限责任公司 Real-time voice sample detection method and system
CN104217715B (en) * 2013-08-12 2017-06-16 北京诺亚星云科技有限责任公司 A kind of real-time voice sample testing method and system
CN103854646A (en) * 2014-03-27 2014-06-11 成都康赛信息技术有限公司 Method for classifying digital audio automatically
CN103854646B (en) * 2014-03-27 2018-01-30 成都康赛信息技术有限公司 A kind of method realized DAB and classified automatically
CN109273016A (en) * 2015-03-13 2019-01-25 杜比国际公司 Decode the audio bit stream in filling element with enhancing frequency spectrum tape copy metadata
CN109273016B (en) * 2015-03-13 2023-03-28 杜比国际公司 Decoding an audio bitstream having enhanced spectral band replication metadata in a filler element
US11664038B2 (en) 2015-03-13 2023-05-30 Dolby International Ab Decoding audio bitstreams with enhanced spectral band replication metadata in at least one fill element
CN106409312A (en) * 2015-07-28 2017-02-15 恩智浦有限公司 Audio classifier
CN106409312B (en) * 2015-07-28 2021-12-10 汇顶科技(香港)有限公司 Audio classifier
CN105609111A (en) * 2015-09-25 2016-05-25 Wu Libin Noise identification method in audio signal and system thereof
CN106875955A (en) * 2015-12-10 2017-06-20 Zhangying Information Technology (Shanghai) Co., Ltd. Sound animation generation method and electronic device
CN105872855A (en) * 2016-05-26 2016-08-17 Guangzhou Kugou Computer Technology Co., Ltd. Labeling method and device for video files
CN106297803A (en) * 2016-10-12 2017-01-04 Anhui Huiyun Information Technology Co., Ltd. Computer distance education system
CN107945816A (en) * 2016-10-13 2018-04-20 Thomson Licensing Apparatus and method for audio frame processing
CN108735230B (en) * 2018-05-10 2020-12-04 Shanghai Maikefeng Culture Media Co., Ltd. Background music identification method, device and equipment based on mixed audio
CN108735230A (en) * 2018-05-10 2018-11-02 Foshan Bozhidunshi Technology Co., Ltd. Background music identification method, device and equipment based on mixed audio
CN111261174B (en) * 2018-11-30 2023-02-17 Hangzhou Hikvision Digital Technology Co., Ltd. Audio classification method and device, terminal and computer readable storage medium
CN111261174A (en) * 2018-11-30 2020-06-09 Hangzhou Hikvision Digital Technology Co., Ltd. Audio classification method and device, terminal and computer readable storage medium
CN109545192B (en) * 2018-12-18 2022-03-08 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating a model
CN109545192A (en) * 2018-12-18 2019-03-29 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating a model

Also Published As

Publication number Publication date
CN102446506B (en) 2013-06-05

Similar Documents

Publication Publication Date Title
CN102446506B (en) Classification identifying method and equipment of audio signals
CN101197130B (en) Sound activity detecting method and detector thereof
US9875739B2 (en) Speaker separation in diarization
CN102089803B (en) Method and discriminator for classifying different segments of a signal
CN111243602B (en) Voiceprint recognition method based on gender, nationality and emotion information
JP4568371B2 (en) Computerized method and computer program for distinguishing between at least two event classes
US11004458B2 (en) Coding mode determination method and apparatus, audio encoding method and apparatus, and audio decoding method and apparatus
US20030101050A1 (en) Real-time speech and music classifier
Janicki Spoofing countermeasure based on analysis of linear prediction error.
CN102714034B (en) Signal processing method, device and system
CN101149922A (en) Speech recognition device and speech recognition method
CN103824557A (en) Audio detecting and classifying method with customization function
US20220101859A1 (en) Speaker recognition based on signal segments weighted by quality
KR101618512B1 (en) Gaussian mixture model based speaker recognition system and the selection method of additional training utterance
CN1391211A (en) Training method and system for distinguishing parameters
CN102376306B (en) Method and device for acquiring level of speech frame
CN1218945A (en) Identification of stationary and non-stationary signals
Bonastre et al. Speaker Modeling Using Local Binary Decisions.
KR101862982B1 (en) Voiced/Unvoiced Decision Method Using Deep Neural Network for Linear Predictive Coding-10e Vocoder
Fauve et al. Influence of task duration in text-independent speaker verification.
US8447594B2 (en) Multicodebook source-dependent coding and decoding
Song et al. Analysis and improvement of speech/music classification for 3GPP2 SMV based on GMM
CN103390404A (en) Information processing apparatus, information processing method and information processing program
Shahsavari et al. Speech activity detection using deep neural networks
Yantorno et al. Usable speech detection using a context dependent Gaussian mixture model classifier

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130605

Termination date: 20191011