CN108538311A - Audio classification method and device, and computer-readable storage medium - Google Patents
- Publication number
- CN108538311A CN108538311A CN201810332491.8A CN201810332491A CN108538311A CN 108538311 A CN108538311 A CN 108538311A CN 201810332491 A CN201810332491 A CN 201810332491A CN 108538311 A CN108538311 A CN 108538311A
- Authority
- CN
- China
- Prior art keywords
- audio
- set categories
- network
- classification
- audio signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses an audio classification method, an audio classification device, and a computer-readable storage medium, belonging to the field of electronic technology. The method includes: collecting an audio signal; truncating or padding the audio signal so that the duration of the audio signal is adjusted to a preset duration; converting the audio signal into a target audio according to the frequency information of the audio signal; extracting audio features of the target audio through a convolutional network included in a preset classifier; extracting temporal features of the audio features through a gated recurrent network included in the preset classifier; determining, according to the temporal features and through a fully-connected network included in the preset classifier, the probability that the class of the target audio is the preset class identified by each preset category label among multiple preset category labels; and determining the preset class identified by the preset category label with the largest probability among the multiple preset category labels as the class of the target audio. Because the target audio does not need to be segmented, the invention preserves the integrity of the target audio and achieves higher classification accuracy.
Description
Technical field
The present invention relates to the field of electronic technology, and in particular to an audio classification method, an audio classification device, and a computer-readable storage medium.
Background technology
With the fast development of electronic technology, people often upload audio in music application.It is searched for the ease of user
Rope and audio is used, music application often classifies to the magnanimity audio of upload.For example, music application can be to upload
The quality of audio distinguishes, or judges that the audio uploaded is voice or musical background etc..
In the related technology, classified to audio using support vector machine classifier, due to support vector machine classifier
It is limited that another characteristic can be known, so classified using support vector machine classifier, operation is relatively complicated, classification effectiveness compared with
It is low.In addition, usually only sufficiently long audio can just reflect its real property, and carried out using support vector machine classifier
Audio is first often divided into a series of segment when classification, to which the integrality of audio can be destroyed, cause to classify accuracy compared with
It is low.
Summary of the invention
Embodiments of the present invention provide an audio classification method, an audio classification device, and a computer-readable storage medium, which can solve the problems of low audio classification efficiency and low accuracy in the related art. The technical solution is as follows:
In one aspect, an audio classification method is provided. The method includes:
collecting an audio signal;
truncating or padding the audio signal so that the duration of the audio signal is adjusted to a preset duration;
converting the audio signal into a target audio according to the frequency information of the audio signal;
extracting audio features of the target audio through a convolutional network included in a preset classifier;
extracting temporal features of the audio features through a gated recurrent network included in the preset classifier;
determining, according to the temporal features and through a fully-connected network included in the preset classifier, the probability that the class of the target audio is the preset class identified by each preset category label among multiple preset category labels; and
determining the preset class identified by the preset category label with the largest probability among the multiple preset category labels as the class of the target audio.
Optionally, extracting the audio features of the target audio through the convolutional network included in the preset classifier includes:
dividing the target audio into multiple audio segments through the convolutional network;
extracting, through the convolutional network, the features of each audio segment among the multiple audio segments as one feature; and
composing, through the convolutional network, the extracted features into the audio features of the target audio.
Optionally, extracting the temporal features of the audio features through the gated recurrent network included in the preset classifier includes:
extracting first temporal features of the audio features through the gated recurrent network;
determining, through the fully-connected network, first classification features corresponding to the first temporal features;
substituting each element of the first classification features into a first preset function to obtain a weight for each element of the first classification features, where the elements of the first temporal features correspond one-to-one with the elements of the first classification features;
for any element A of the first temporal features, multiplying A by the weight of the element of the first classification features corresponding to A, to obtain a first element corresponding to A; and
replacing each element of the first temporal features with its corresponding first element to obtain second temporal features as the temporal features of the audio features.
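The element-wise weighting described above resembles an attention mechanism: a fully-connected layer maps the temporal features to classification features, a normalizing function turns those into weights, and each temporal-feature element is rescaled by its weight. A minimal NumPy sketch; the shapes, the linear layer, and the use of softmax as the unnamed "first preset function" are all assumptions for illustration:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reweight_temporal_features(h, w_fc, b_fc):
    """Rescale temporal features h (shape [T, D]) element by element.

    w_fc, b_fc: parameters of the hypothetical fully-connected layer that
    produces the first classification features (same shape as h, so the
    elements correspond one-to-one, as the text requires).
    """
    c = h @ w_fc + b_fc          # first classification features, [T, D]
    weights = softmax(c)         # "first preset function" -> per-element weights
    return h * weights           # second temporal features, [T, D]

rng = np.random.default_rng(0)
h = rng.standard_normal((5, 4))          # 5 time steps, 4 feature dims
w = rng.standard_normal((4, 4)) * 0.1
b = np.zeros(4)
h2 = reweight_temporal_features(h, w, b)
print(h2.shape)  # (5, 4)
```

Because the weights come from the features themselves, informative time steps can be emphasized without segmenting the audio.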
Optionally, determining, according to the temporal features and through the fully-connected network included in the preset classifier, the probability that the class of the target audio is the preset class identified by each preset category label among the multiple preset category labels includes:
determining second classification features of the temporal features through the fully-connected network; and
substituting the elements of the second classification features into a second preset function through the fully-connected network to obtain the probability that the class of the target audio is the preset class identified by each preset category label among the multiple preset category labels.
Optionally, the preset classifier further includes at least one of a batch normalization network and a pooling network.
Optionally, converting the audio signal into the target audio according to the frequency information of the audio signal includes:
determining the Mel-frequency cepstral coefficients (MFCC) of the audio signal and generating the target audio according to the MFCC of the audio signal; or
determining the frequency spectrum of the audio signal and generating the target audio according to the frequency spectrum of the audio signal.
Optionally, before extracting the audio features of the target audio through the convolutional network included in the preset classifier, the method further includes:
obtaining multiple training audio sets, where all training audios included in each training audio set among the multiple training audio sets correspond to the same preset category label; and
training a classification model to be trained using the multiple training audio sets to obtain the preset classifier.
In one aspect, an audio classification device is provided. The device includes:
a collection module, configured to collect an audio signal;
an adjustment module, configured to truncate or pad the audio signal so that the duration of the audio signal is adjusted to a preset duration;
a conversion module, configured to convert the audio signal into a target audio according to the frequency information of the audio signal;
a first extraction module, configured to extract audio features of the target audio through a convolutional network included in a preset classifier;
a second extraction module, configured to extract temporal features of the audio features through a gated recurrent network included in the preset classifier;
a first determining module, configured to determine, according to the temporal features and through a fully-connected network included in the preset classifier, the probability that the class of the target audio is the preset class identified by each preset category label among multiple preset category labels; and
a second determining module, configured to determine the preset class identified by the preset category label with the largest probability among the multiple preset category labels as the class of the target audio.
Optionally, the first extraction module includes:
a splitting submodule, configured to divide the target audio into multiple audio segments through the convolutional network;
a first extracting submodule, configured to extract, through the convolutional network, the features of each audio segment among the multiple audio segments as one feature; and
a composing submodule, configured to compose, through the convolutional network, the extracted features into the audio features of the target audio.
Optionally, the second extraction module includes:
a second extracting submodule, configured to extract first temporal features of the audio features through the gated recurrent network;
a first determining submodule, configured to determine, through the fully-connected network, first classification features corresponding to the first temporal features;
a first substitution submodule, configured to substitute each element of the first classification features into a first preset function to obtain a weight for each element of the first classification features, where the elements of the first temporal features correspond one-to-one with the elements of the first classification features;
a multiplication submodule, configured to, for any element A of the first temporal features, multiply A by the weight of the element of the first classification features corresponding to A, to obtain a first element corresponding to A; and
a replacement submodule, configured to replace each element of the first temporal features with its corresponding first element to obtain second temporal features as the temporal features of the audio features.
Optionally, the first determining module includes:
a second determining submodule, configured to determine second classification features of the temporal features through the fully-connected network; and
a second substitution submodule, configured to substitute the elements of the second classification features into a second preset function through the fully-connected network, to obtain the probability that the class of the target audio is the preset class identified by each preset category label among the multiple preset category labels.
Optionally, the preset classifier further includes at least one of a batch normalization network and a pooling network.
Optionally, the conversion module includes:
a third determining submodule, configured to determine the Mel-frequency cepstral coefficients (MFCC) of the audio signal and generate the target audio according to the MFCC of the audio signal; or
a fourth determining submodule, configured to determine the frequency spectrum of the audio signal and generate the target audio according to the frequency spectrum of the audio signal.
Optionally, the device further includes:
an obtaining module, configured to obtain multiple training audio sets, where all training audios included in each training audio set among the multiple training audio sets correspond to the same preset category label; and
a training module, configured to train a classification model to be trained using the multiple training audio sets to obtain the preset classifier.
In one aspect, an audio classification device is provided. The device includes a processor, a memory, and program code stored on the memory and runnable on the processor, where the processor implements the above audio classification method when executing the program code.
In one aspect, a computer-readable storage medium is provided, on which instructions are stored, where the instructions, when executed by a processor, implement the steps of the above audio classification method.
The beneficial effects of the technical solutions provided by the embodiments of the present invention are as follows:
In the embodiments of the present invention, an audio signal is first collected, and then the audio signal is truncated or padded so that its duration is adjusted to a preset duration; at this point the duration of the audio signal has been normalized to a more suitable range. The audio signal is then converted into a target audio according to the frequency information of the audio signal. Afterwards, the audio features of the target audio are extracted through a convolutional network included in a preset classifier, which reduces the dimensionality of the features of each audio segment, so that the extracted audio features have a relatively low dimensionality. Then, the temporal features of the audio features are extracted through a gated recurrent network included in the preset classifier. According to the temporal features, a fully-connected network included in the preset classifier determines the probability that the class of the target audio is the preset class identified by each preset category label among multiple preset category labels, and the preset class identified by the preset category label with the largest probability among the multiple preset category labels is determined as the class of the target audio. This classification process is simple and practical and has high efficiency; moreover, because the target audio does not need to be segmented, the integrity of the target audio is preserved, so the classification accuracy is also high.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of an audio classification method provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another audio classification method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of a collected audio signal provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a target audio provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of another target audio provided by an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of a preset classifier provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of a first audio classification device provided by an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of a first extraction module provided by an embodiment of the present invention;
Fig. 9 is a structural schematic diagram of a second extraction module provided by an embodiment of the present invention;
Fig. 10 is a structural schematic diagram of a first determining module provided by an embodiment of the present invention;
Fig. 11 is a structural schematic diagram of a conversion module provided by an embodiment of the present invention;
Fig. 12 is a structural schematic diagram of a second audio classification device provided by an embodiment of the present invention;
Fig. 13 is a structural schematic diagram of a third audio classification device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
For ease of understanding, before explaining the embodiments of the present invention in detail, the application scenarios involved in the embodiments are first introduced.
With the rapid development of electronic technology, people frequently upload audio to music applications. To make it easier for users to search for and use audio, a music application often needs to classify the massive amount of uploaded audio. Currently, audio is classified using a support vector machine (SVM) classifier. Because the features an SVM classifier can recognize are limited, classification with an SVM classifier involves relatively complicated operations and has low efficiency. In addition, usually only a sufficiently long audio can reflect its true attributes, yet when classifying with an SVM classifier the audio is often first divided into a series of segments, which destroys the integrity of the audio and leads to low classification accuracy. To this end, the present invention provides an audio classification method to improve the efficiency and accuracy of audio classification.
Next, the audio classification method provided by the embodiments of the present invention will be described in detail with reference to the drawings.
Fig. 1 is a flow chart of an audio classification method provided by an embodiment of the present invention. Referring to Fig. 1, the method includes the following steps:
Step 101: collect an audio signal.
Step 102: truncate or pad the audio signal so that the duration of the audio signal is adjusted to a preset duration.
Step 103: convert the audio signal into a target audio according to the frequency information of the audio signal.
Step 104: extract audio features of the target audio through a convolutional network included in a preset classifier.
Step 105: extract temporal features of the audio features through a gated recurrent network included in the preset classifier.
Step 106: determine, according to the temporal features and through a fully-connected network included in the preset classifier, the probability that the class of the target audio is the preset class identified by each preset category label among multiple preset category labels.
Step 107: determine the preset class identified by the preset category label with the largest probability among the multiple preset category labels as the class of the target audio.
In the embodiments of the present invention, an audio signal is first collected, and then the audio signal is truncated or padded so that its duration is adjusted to a preset duration; at this point the duration of the audio signal has been normalized to a more suitable range. The audio signal is then converted into a target audio according to the frequency information of the audio signal. Afterwards, the audio features of the target audio are extracted through a convolutional network included in a preset classifier, which reduces the dimensionality of the features of each audio segment, so that the extracted audio features have a relatively low dimensionality. Then, the temporal features of the audio features are extracted through a gated recurrent network included in the preset classifier. According to the temporal features, a fully-connected network included in the preset classifier determines the probability that the class of the target audio is the preset class identified by each preset category label among multiple preset category labels, and the preset class identified by the preset category label with the largest probability among the multiple preset category labels is determined as the class of the target audio. This classification process is simple and practical and has high efficiency; moreover, because the target audio does not need to be segmented, the integrity of the target audio is preserved, so the classification accuracy is also high.
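The pipeline of steps 101–107 can be sketched end to end. The NumPy forward pass below is purely illustrative: the layer sizes, the single GRU layer, the random weights, and the use of softmax for the final probabilities are assumptions, not the patent's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
softmax = lambda x: np.exp(x - x.max()) / np.exp(x - x.max()).sum()

def conv_features(target_audio, kernels):
    """Step 104 (crude stand-in for a convolutional network): project each
    time step of the 2-D target audio (time x MFCC) to a feature vector."""
    return np.tanh(target_audio @ kernels)          # [T, n_feat]

def gru_features(x, p):
    """Step 105: one gated recurrent (GRU) layer; return the last hidden state."""
    h = np.zeros(p["Uz"].shape[0])
    for x_t in x:
        z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h)            # update gate
        r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h)            # reset gate
        h_tilde = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h))
        h = (1 - z) * h + z * h_tilde
    return h

n_mfcc, n_feat, n_hidden, n_classes = 13, 8, 6, 3
kernels = rng.standard_normal((n_mfcc, n_feat)) * 0.1
p = {k: rng.standard_normal(s) * 0.1 for k, s in {
    "Wz": (n_hidden, n_feat), "Uz": (n_hidden, n_hidden),
    "Wr": (n_hidden, n_feat), "Ur": (n_hidden, n_hidden),
    "Wh": (n_hidden, n_feat), "Uh": (n_hidden, n_hidden)}.items()}
W_fc = rng.standard_normal((n_classes, n_hidden)) * 0.1

target_audio = rng.standard_normal((20, n_mfcc))    # 20 MFCC frames (step 103)
feats = conv_features(target_audio, kernels)        # step 104
h = gru_features(feats, p)                          # step 105
probs = softmax(W_fc @ h)                           # step 106 (fully-connected + softmax)
label = int(np.argmax(probs))                       # step 107: largest probability wins
print(label, probs.sum())  # class index, and a probability mass of ~1.0
```

Note that the whole target audio is processed in one pass, which is what lets the method avoid the segmentation that hurts SVM-based classification.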
Optionally, extracting the audio features of the target audio through the convolutional network included in the preset classifier includes:
dividing the target audio into multiple audio segments through the convolutional network;
extracting, through the convolutional network, the features of each audio segment among the multiple audio segments as one feature; and
composing, through the convolutional network, the extracted features into the audio features of the target audio.
Optionally, extracting the temporal features of the audio features through the gated recurrent network included in the preset classifier includes:
extracting first temporal features of the audio features through the gated recurrent network;
determining, through the fully-connected network, first classification features corresponding to the first temporal features;
substituting each element of the first classification features into a first preset function to obtain a weight for each element of the first classification features, where the elements of the first temporal features correspond one-to-one with the elements of the first classification features;
for any element A of the first temporal features, multiplying A by the weight of the element of the first classification features corresponding to A, to obtain a first element corresponding to A; and
replacing each element of the first temporal features with its corresponding first element to obtain second temporal features as the temporal features of the audio features.
Optionally, determining, according to the temporal features and through the fully-connected network included in the preset classifier, the probability that the class of the target audio is the preset class identified by each preset category label among the multiple preset category labels includes:
determining second classification features of the temporal features through the fully-connected network; and
substituting the elements of the second classification features into a second preset function through the fully-connected network to obtain the probability that the class of the target audio is the preset class identified by each preset category label among the multiple preset category labels.
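The "second preset function" described above maps one score per preset category label to a probability; a softmax is one natural reading of that step (the text does not name the function). A minimal sketch with invented example scores:

```python
import numpy as np

def class_probabilities(second_classification_features):
    """Map one score per preset category label to probabilities in [0, 1]
    that sum to 1, via a numerically stable softmax (assumed form of the
    'second preset function')."""
    z = np.asarray(second_classification_features, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

scores = [2.0, 1.0, 0.1]                 # hypothetical per-label scores
probs = class_probabilities(scores)
best = int(np.argmax(probs))             # label with the largest probability
print(best)  # 0
```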
Optionally, the preset classifier further includes at least one of a batch normalization network and a pooling network.
Optionally, converting the audio signal into the target audio according to the frequency information of the audio signal includes:
determining the Mel-frequency cepstral coefficients (MFCC) of the audio signal and generating the target audio according to the MFCC of the audio signal; or
determining the frequency spectrum of the audio signal and generating the target audio according to the frequency spectrum of the audio signal.
Optionally, before extracting the audio features of the target audio through the convolutional network included in the preset classifier, the method further includes:
obtaining multiple training audio sets, where all training audios included in each training audio set among the multiple training audio sets correspond to the same preset category label; and
training a classification model to be trained using the multiple training audio sets to obtain the preset classifier.
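The training setup above amounts to: each training audio set carries one preset category label shared by all of its audios. A minimal sketch of flattening such sets into (audio, label) training pairs — the set contents, file names, and label names are invented for illustration:

```python
# Hypothetical training audio sets: one preset category label per set.
training_audio_sets = {
    "music": ["song_a.wav", "song_b.wav"],
    "voice": ["talk_a.wav"],
}

def flatten_training_sets(sets):
    """Pair every audio in a set with that set's shared preset category label."""
    return [(audio, label) for label, audios in sets.items() for audio in audios]

pairs = flatten_training_sets(training_audio_sets)
print(len(pairs))  # 3
```

The resulting pairs would then feed whatever supervised training procedure produces the preset classifier.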
All of the above optional technical solutions can be combined in any manner to form optional embodiments of the present invention, which will not be repeated here one by one.
Fig. 2 is a flow chart of another audio classification method provided by an embodiment of the present invention. This embodiment expands on the embodiment shown in Fig. 1 with reference to Fig. 2. Referring to Fig. 2, the method includes the following steps:
Step 201: collect an audio signal.
In practical applications, sound is generally sampled at a fixed sampling frequency, so the collected audio signal may include multiple sampling points. The position of each sampling point represents its sampling instant, and the value of each sampling point is its amplitude. The multiple sampling points can be connected into an audio curve, which contains only amplitude information; that is, the audio signal is a one-dimensional audio signal. For example, Fig. 3 shows the audio curve of such an audio signal: the value of each point on the curve is the amplitude at that point, and the audio signal is one-dimensional.
Step 202: truncate or pad the audio signal so that the duration of the audio signal is adjusted to a preset duration.
It should be noted that the preset duration can be configured in advance according to different needs, and can be set relatively long; for example, the preset duration may be 2 minutes, 3 minutes, 4 minutes, and so on.
In addition, in the embodiment of the present invention, adjusting the duration of the audio signal to the preset duration normalizes the duration of the audio signal to a more suitable range, which can improve the accuracy of the subsequent classification of the audio signal.
Specifically, when the duration of the audio signal is greater than the preset duration, the audio signal is truncated into an audio signal of the preset duration; when the duration of the audio signal is less than the preset duration, the audio signal is padded into an audio signal of the preset duration.
When the duration of the audio signal is greater than the preset duration, truncating the audio signal into an audio signal of the preset duration can be implemented as follows: select a time point from the time points of the audio signal, and starting from the selected time point, intercept an audio signal of the preset duration.
For example, if the preset duration is 3 minutes, a time point can be selected from the time points of the audio signal. Assuming the selected time point is the 1-second mark, 3 minutes of audio signal can be intercepted starting from the 1-second time point of the audio signal.
When the duration of the audio signal is less than the preset duration, padding the audio signal into an audio signal of the preset duration can be implemented as follows: first convert the audio signal into a digital signal, then keep appending zeros at the end of the digital signal until the digital signal reaches the preset duration, and finally convert the digital signal that has reached the preset duration back into an audio signal.
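The truncate-or-pad adjustment of step 202 can be sketched directly on a sampled (digital) signal. In this sketch the truncation starts from the beginning of the signal; the patent allows any selected time point, so that offset is an assumption:

```python
import numpy as np

def fit_to_duration(signal, preset_len):
    """Truncate or zero-pad a 1-D sampled signal to exactly preset_len samples.

    Truncation here starts at sample 0 (the patent permits an arbitrary
    start point); padding appends zeros at the end, as the text describes.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) >= preset_len:
        return signal[:preset_len]                 # truncate
    pad = np.zeros(preset_len - len(signal))
    return np.concatenate([signal, pad])           # zero-pad at the end

print(len(fit_to_duration(np.ones(10), 6)))  # 6
print(len(fit_to_duration(np.ones(4), 6)))   # 6
```

With a sampling rate `sr`, a preset duration of 3 minutes corresponds to `preset_len = 180 * sr` samples.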
Step 203: convert the audio signal into a target audio according to the frequency information of the audio signal.
Specifically, the implementation of step 203 may include the following two possible implementations.
The first possible implementation: determine the MFCC (Mel-scale Frequency Cepstral Coefficients) of the audio signal, and generate the target audio according to the MFCC of the audio signal.
It should be noted that the Mel frequency is a nonlinear frequency scale based on the human ear's perception of equidistant pitch changes. It has a nonlinear correspondence with the actual frequency, and the MFCC can be calculated based on this relationship.
For example, the nonlinear correspondence between the Mel frequency and the actual frequency can be approximated by the formula Mel(f) = 2595 · log10(1 + f/700), where Mel(f) is the Mel frequency and f is the actual frequency in Hz.
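The Mel mapping just described is commonly written as Mel(f) = 2595 · log10(1 + f/700) (the formula itself was lost in the extraction, so this is the standard form of the approximation, not a verbatim quote of the patent):

```python
import math

def hz_to_mel(f):
    """Mel(f) = 2595 * log10(1 + f / 700): map a frequency in Hz to the
    Mel scale, which is roughly linear below 1 kHz and logarithmic above."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

print(round(hz_to_mel(0.0), 3))   # 0.0
print(round(hz_to_mel(1000.0)))   # about 1000 mel near 1 kHz
```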
Wherein it is determined that the realization process of the MFCC of the audio signal can be:Preemphasis is carried out to the audio signal;To pre-
Audio signal after exacerbation carries out framing;Adding window is carried out to each frame in the audio signal after framing;By the audio after adding window
Signal is converted by time domain to frequency domain, and the frequency spectrum of the audio signal is obtained;The work(of the audio signal is obtained to the frequency spectrum modulus square
Rate is composed;The power spectrum is filtered by the triangle bandpass filter of one group of Meier scale;To the triangle bandpass filter group
Output seek logarithm, obtain logarithmic energy;To the logarithmic energy carry out DCT (discrete cosine transform, from
Dissipate cosine transform) obtain MFCC.
It should be noted that preemphasis can mend the decaying of the high fdrequency component of audio signal in transmission process
It repays, audio signal can be carried out preemphasis by when practical application by a high-pass filter.
In addition, framing refers to that audio signal is divided into multiple short time intervals, each short time interval is a frame.Since audio is believed
Number stationarity is only presented in a relatively short period of time, it is therefore desirable to framing be carried out to audio signal, and to avoid dropped audio signal
Information, can there is one section of overlapping region, overlapping region to be generally the 1/2 or 1/3 of frame length between consecutive frame.
It, can be to the work(furthermore after being filtered to the power spectrum by the triangle bandpass filter of one group of Meier scale
Rate spectrum is smoothed, and harmonic carcellation interference highlights the formant of audio signal.
Applying a window to each frame of the framed audio signal may be implemented by multiplying each frame by a specified window. Multiplying each frame of the audio signal by the specified window eliminates the signal discontinuities that may arise at the two ends of each frame. The specified window may be configured in advance according to different requirements; for example, it may be a Hamming window.
Converting the windowed audio signal from the time domain to the frequency domain to obtain the spectrum of the audio signal may be implemented by applying an FFT (Fast Fourier Transform) to the windowed audio signal. Of course, the windowed audio signal may also be converted from the time domain to the frequency domain in other ways to obtain its spectrum, which is not limited in the embodiments of the present invention.
Generating the target audio according to the MFCC of the audio signal may be implemented as follows: the time points of the audio signal are determined according to its duration, and the target audio is generated with the time points of the audio signal as the horizontal axis and the MFCC of the audio signal as the vertical axis.
It should be noted that the time points of the audio signal indicate the progress of its acquisition. For example, a time point of the audio signal may be 1 second, 2 seconds, and so on; the sample at the 1-second time point is the sample obtained 1 second after acquisition started, and the sample at the 2-second time point is the sample obtained 2 seconds after acquisition started.
In addition, the target audio here contains both the time-point information and the MFCC information of the audio signal, so the target audio is a two-dimensional audio signal. For example, as shown in Fig. 4, the target audio is a two-dimensional audio signal whose horizontal axis is the time points of the audio signal and whose vertical axis is the MFCC of the audio signal.
It is worth noting that, in the first possible implementation described above, the target audio may be generated according to the MFCC of the audio signal. The target audio so generated not only contains the acoustic features of the audio signal but also has a relatively low operational dimensionality, which reduces the amount of computation when the target audio is subsequently classified.
Second possible implementation: the spectrum of the audio signal is determined, and the target audio is generated according to the spectrum of the audio signal.
Specifically, this may be implemented as follows: the audio signal is converted from the time domain to the frequency domain to obtain its spectrum; the frequencies of the audio signal are obtained from the spectrum; the time points of the audio signal are determined according to its duration; and the target audio is generated with the time points of the audio signal as the horizontal axis and the frequencies of the audio signal as the vertical axis.
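Under the same caveat as before, the two-dimensional target audio of this second implementation can be sketched as a frame-wise magnitude spectrum: one column per time point (horizontal axis), one frequency bin per row (vertical axis). The frame length and hop below are illustrative assumptions, and a naive DFT again stands in for an FFT.

```python
import math

def spectrogram(signal, frame_len=32, hop=16):
    """Sketch: a 1-D signal -> a 2-D target audio (time points x frequency bins)."""
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            re = sum(x * math.cos(2 * math.pi * k * n / frame_len)
                     for n, x in enumerate(frame))
            im = sum(-x * math.sin(2 * math.pi * k * n / frame_len)
                     for n, x in enumerate(frame))
            mags.append(math.hypot(re, im))
        columns.append(mags)  # one column per time point
    return columns
```

For a pure sine placed exactly on frequency bin 4, every column of the result peaks at row 4, which is the frequency information the target audio is meant to carry.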
Converting the audio signal from the time domain to the frequency domain to obtain its spectrum may be implemented by applying an FFT to the audio signal. Of course, the audio signal may also be converted from the time domain to the frequency domain in other ways to obtain its spectrum, which is not limited in the embodiments of the present invention.
It should be noted that the target audio here contains both the time-point information and the actual frequency information of the audio signal, so the target audio is a two-dimensional audio signal. For example, as shown in Fig. 5, the target audio is a two-dimensional audio signal whose horizontal axis is the time points of the audio signal and whose vertical axis is the frequencies of the audio signal.
It is worth noting that, in the second possible implementation described above, the target audio may be generated according to the spectrum of the audio signal. The target audio so generated contains the complete characteristics of the audio signal, which improves the accuracy of the subsequent classification of the target audio.
It should be noted that, after the target audio is obtained through steps 201-203 above, the category of the target audio may also be determined through a preset classifier according to the following steps 204-207.
In addition, the preset classifier is used to classify audio. In practice, after an audio is input into the preset classifier, the preset classifier can determine and output, for each of multiple pre-set category identifiers, the probability that the category of the audio is the pre-set category identified by that identifier. The preset classifier may include a convolutional network, a gated recurrent network, and a fully-connected network, and may further include at least one of a batch normalization network, a pooling network, and the like.
Step 204: Extract the audio features of the target audio through the convolutional network included in the preset classifier.
Specifically, the convolutional network divides the target audio into multiple audio segments, then extracts one feature from each of the multiple audio segments, and finally assembles the extracted features into the audio features of the target audio.
It should be noted that, because the convolutional network reduces each audio segment to a single feature, the dimensionality of the features of each audio segment is reduced and the duration of the target audio is in effect shortened, so the extracted audio features have a relatively low dimensionality. This makes it convenient for the other networks in the preset classifier to process these lower-dimensional audio features directly in the subsequent steps.
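As a toy sketch of this segment-wise reduction (the kernel and stride are illustrative assumptions, not the patent's parameters): each kernel-sized segment collapses to a single weighted sum, halving the dimensionality in this example.

```python
def conv_segment_features(target_audio, kernel):
    """Sketch: split the input into kernel-sized segments (stride == kernel size,
    a simplifying assumption) and reduce each segment to one feature."""
    k = len(kernel)
    feats = []
    for start in range(0, len(target_audio) - k + 1, k):
        segment = target_audio[start:start + k]
        # Each segment becomes one feature: a weighted sum (dot product).
        feats.append(sum(w * x for w, x in zip(kernel, segment)))
    return feats

audio = [1, 2, 3, 4, 5, 6, 7, 8]
feats = conv_segment_features(audio, [0.5, 0.5])
print(feats)  # [1.5, 3.5, 5.5, 7.5] -- 8 values reduced to 4
```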
Further, when the preset classifier also includes a batch normalization (Batch Normalization) network, after the convolutional network extracts the audio features of the target audio, the extracted audio features may also be processed by the batch normalization network so that they are distributed within a stable range, thereby improving the processing accuracy of the other networks in the preset classifier on the audio features.
Specifically, the batch normalization network may subtract from each element of the extracted audio features the mean of all elements of those audio features, so that the mean of all elements of the resulting new audio features is 0. Of course, in practice, the batch normalization network may also process the extracted audio features in other ways, which is not limited in the embodiments of the present invention.
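A minimal sketch of the mean-subtraction variant just described (illustrative only; full batch normalization also rescales by the variance, which this sketch omits):

```python
def batch_standardize(features):
    """Subtract the mean of all elements from each element, so the
    mean of the resulting feature vector is 0."""
    mean = sum(features) / len(features)
    return [x - mean for x in features]

out = batch_standardize([2.0, 4.0, 9.0])  # mean is 5.0
print(out)       # [-3.0, -1.0, 4.0]
print(sum(out))  # 0.0
```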
Further, when the preset classifier also includes a pooling network, after the convolutional network extracts the audio features of the target audio, the extracted audio features may also be processed by the pooling network to reduce their size, thereby reducing the amount of computation of the other networks in the preset classifier.
Specifically, the pooling network may divide all elements of the extracted audio features into multiple element groups, take the mean or the maximum of the elements in each group to obtain a second element corresponding to each group, and assemble the second elements corresponding to the groups into new audio features. Of course, in practice, the pooling network may also process the extracted audio features in other ways, which is not limited in the embodiments of the present invention.
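A minimal sketch of this grouping, assuming non-overlapping groups of a fixed size:

```python
def pool(features, group, mode="max"):
    """Sketch: divide elements into groups and keep one value per group,
    either the maximum or the mean."""
    out = []
    for start in range(0, len(features), group):
        g = features[start:start + group]
        out.append(max(g) if mode == "max" else sum(g) / len(g))
    return out

print(pool([1, 5, 2, 8, 3, 3], 2))         # max pooling  -> [5, 8, 3]
print(pool([1, 5, 2, 8, 3, 3], 2, "avg"))  # mean pooling -> [3.0, 5.0, 3.0]
```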
It should be noted that, in practice, the audio features extracted by the convolutional network may be passed directly to the following step 205, or they may first be processed by the batch normalization network and/or the pooling network to obtain new audio features, which are then passed to the following step 205.
Further, before the audio features of the target audio are extracted through the convolutional network included in the preset classifier, the preset classifier may first be generated. Specifically, multiple training audio sets are obtained, where all training audios included in each of the multiple training audio sets correspond to the same pre-set category identifier, and the classification model to be trained is trained using the multiple training audio sets to obtain the preset classifier.
It should be noted that each of the multiple training audio sets is provided with a sample label, which is the pre-set category identifier corresponding to that training audio set. For example, the sample labels may include a positive sample label and a negative sample label, where the positive sample label may be pre-set category identifier 1 and the negative sample label may be pre-set category identifier 0.
Training the classification model to be trained using the multiple training audio sets to obtain the preset classifier may be implemented as follows: one training audio set is selected from the multiple training audio sets, and the following processing is performed on the selected set until every one of the multiple training audio sets has been processed: the selected training audio set is taken as the input of the classification model to be trained; according to the output data of the classification model, a reference category identifier corresponding to each training audio in the selected set is determined from the multiple pre-set category identifiers; the reference category identifier corresponding to each training audio is then compared with the pre-set category identifier corresponding to that training audio, and the parameters of the classification model are adjusted according to the comparison result. The classification model whose parameter adjustment has been completed is determined as the preset classifier.
Determining, according to the output data of the classification model, the reference category identifier corresponding to each training audio in the selected training audio set may be implemented as follows: for any training audio A in the selected set, the output data of the classification model includes, for each of the multiple pre-set category identifiers, the probability that the category of training audio A is the pre-set category identified by that identifier; the pre-set category identifier with the largest probability among the multiple pre-set category identifiers is determined as the reference category identifier corresponding to training audio A.
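The selection of the reference category identifier reduces to an argmax over the model's output probabilities. A sketch, assuming (purely for illustration) that the output is an identifier-to-probability mapping:

```python
def reference_label(output_probs):
    """Sketch: pick the pre-set category identifier with the largest
    probability from a {identifier: probability} mapping."""
    return max(output_probs, key=output_probs.get)

probs = {"id_1": 0.1, "id_2": 0.7, "id_3": 0.2}
print(reference_label(probs))  # id_2
```

During training, this reference identifier would then be compared with the training audio's true pre-set category identifier to drive the parameter adjustment.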
Step 205: Extract the temporal features of the audio features through the gated recurrent network included in the preset classifier.
It should be noted that, in practice, the gated recurrent network may be a GRU (Gated Recurrent Unit), which can extract the features of the input data along the time dimension.
In addition, since an audio signal is usually correlated in time, each element of the extracted audio features may be related not only to the elements before it but also to the elements after it; that is, each element of the audio features may depend on the elements before and after it. The gated recurrent network is a bidirectional recurrent network, so it can extract the feature of each element of the audio features according to the elements before and after it, and the extracted features of all elements form the temporal features of the audio features.
Specifically, the temporal features of the audio features may be extracted directly by the gated recurrent network in the preset classifier, or they may be extracted jointly by the gated recurrent network and the fully-connected network.
When the temporal features of the audio features are extracted jointly by the gated recurrent network and the fully-connected network: the first temporal feature of the audio features is extracted by the gated recurrent network; the first classification feature corresponding to the first temporal feature is determined by the fully-connected network; each element of the first classification feature is substituted into a first preset function to obtain the weight of each element of the first classification feature, where the elements of the first temporal feature correspond one-to-one to the elements of the first classification feature; for any element A of the first temporal feature, the weight of the element corresponding to element A in the first classification feature is multiplied by element A to obtain a first element corresponding to element A; and each element of the first temporal feature is replaced with its corresponding first element to obtain a second temporal feature, which serves as the temporal features of the audio features.
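A sketch of this weighting step, assuming softmax as the first preset function (the function and variable names are illustrative, not the patent's):

```python
import math

def attention_reweight(temporal, classification):
    """Softmax over the classification feature gives one weight per element;
    each temporal element is multiplied by its corresponding weight."""
    exps = [math.exp(c) for c in classification]
    total = sum(exps)
    weights = [e / total for e in exps]  # weights sum to 1
    return [w * t for w, t in zip(weights, temporal)]

second = attention_reweight([1.0, 1.0, 1.0], [0.0, 0.0, math.log(2)])
# The last element gets weight ~0.5, the other two ~0.25 each.
```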
It should be noted that the fully-connected network includes multiple nodes, each of which is connected to all nodes of the preceding network, so the fully-connected network can integrate all elements of the previously extracted first temporal feature to obtain the first classification feature.
In addition, the first preset function may be configured in advance according to different requirements; for example, it may be a softmax function.
Determining the first classification feature corresponding to the first temporal feature by the fully-connected network may be implemented as follows: the fully-connected network multiplies the input first temporal feature by a preset parameter matrix to obtain the first classification feature corresponding to the first temporal feature. Of course, in practice, the fully-connected network may also determine the first classification feature corresponding to the first temporal feature in other ways (for example, by convolution), which is not limited by the present invention.
It should be noted that the preset parameter matrix may be configured in advance according to different requirements. Usually the first temporal feature is a 1×N row vector and the preset parameter matrix is an N×N matrix, in which case the resulting first classification feature is also a 1×N row vector, where N is a positive integer.
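The row-vector-times-matrix step can be sketched directly; the 2×2 identity matrix below is purely illustrative.

```python
def fully_connected(row, matrix):
    """Sketch: multiply a 1xN row vector by an NxN parameter matrix,
    producing another 1xN row vector."""
    n = len(row)
    return [sum(row[i] * matrix[i][j] for i in range(n)) for j in range(n)]

feature = [1.0, 2.0]
params = [[1.0, 0.0],
          [0.0, 1.0]]  # identity matrix, so the output equals the input
print(fully_connected(feature, params))  # [1.0, 2.0]
```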
It is worth noting that the embodiments of the present invention incorporate an attention mechanism, which converts the temporal feature first extracted by the gated recurrent network into a weight for each of its elements. The weight of each element describes the importance of that element within the temporal feature, so subsequent processing of the temporal feature can refer to these weights, increasing the influence of elements with larger weights on the temporal feature and reducing the interference of elements with smaller weights, which makes the extracted temporal features more accurate.
Further, after the second temporal feature is obtained as the temporal features of the audio features, the temporal features may be passed directly to the following step 206. Alternatively, the temporal features may be processed again by the gated recurrent network to obtain new temporal features, which are then passed to the following step 206. Alternatively, when the preset classifier also includes a batch normalization network, the temporal features may be processed by the gated recurrent network and the batch normalization network to obtain new temporal features, which are then passed to the following step 206; these new temporal features are distributed within a stable range, which improves the processing accuracy of the other networks in the preset classifier on the temporal features.
Step 206: According to the temporal features, determine, through the fully-connected network included in the preset classifier, the probability that the category of the target audio is the pre-set category identified by each of the multiple pre-set category identifiers.
It should be noted that the multiple pre-set category identifiers may be configured in advance according to different requirements. For example, the multiple pre-set category identifiers may be identifier 1 and identifier 0, where the category identified by identifier 1 is a high-quality category and the category identified by identifier 0 is a low-quality category; alternatively, the category identified by identifier 1 is a vocal category and the category identified by identifier 0 is an accompaniment category.
Specifically, the fully-connected network determines a second classification feature of the temporal features and substitutes the elements of the second classification feature into a second preset function to obtain, for each of the multiple pre-set category identifiers, the probability that the category of the target audio is the pre-set category identified by that identifier.
It should be noted that the second preset function may be configured in advance according to different requirements; for example, it may be a softmax function. The output of the second preset function is a vector, each element of which represents the probability that the category of the target audio is the pre-set category identified by the pre-set category identifier corresponding to that element. For example, suppose the multiple pre-set category identifiers are identifier 1, identifier 2, and identifier 3, the second preset function is a softmax function, and substituting the elements of the second classification feature into the softmax function yields the output vector (0.02, 0.08, 0.9), where element 0.02 corresponds to identifier 1, element 0.08 corresponds to identifier 2, and element 0.9 corresponds to identifier 3. Then the probability that the category of the target audio is the pre-set category identified by identifier 1 is 0.02, the probability that it is the pre-set category identified by identifier 2 is 0.08, and the probability that it is the pre-set category identified by identifier 3 is 0.9.
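A sketch of the softmax step, with a hypothetical second classification feature (the resulting probabilities will differ from the example vector above):

```python
import math

def softmax(xs):
    """Second-preset-function sketch: one probability per category identifier,
    summing to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([0.5, 1.9, 4.3])  # hypothetical second classification feature
best = max(range(len(probs)), key=lambda i: probs[i])
print(best)  # 2 -- the third identifier has the largest probability
```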
In practice, when the preset classifier is a binary preset classifier, step 206 may be implemented as follows: the fully-connected network determines the second classification feature corresponding to the temporal features and substitutes the elements of the second classification feature into a third preset function to obtain the probability that the category of the target audio is the pre-set category identified by a first pre-set category identifier; subtracting this probability from 1 gives the probability that the category of the target audio is the pre-set category identified by a second pre-set category identifier.
It should be noted that, when the preset classifier is a binary preset classifier, the number of the multiple pre-set category identifiers is two; that is, the multiple pre-set category identifiers are the first pre-set category identifier and the second pre-set category identifier. In addition, the third preset function may be configured in advance according to different requirements; for example, it may be a sigmoid function.
For example, suppose the multiple pre-set category identifiers are the first pre-set category identifier and the second pre-set category identifier, the third preset function is a sigmoid function, and substituting the elements of the second classification feature into the sigmoid function yields the output value 0.8. Then the probability that the category of the target audio is the pre-set category identified by the first pre-set category identifier is 0.8, and the probability that it is the pre-set category identified by the second pre-set category identifier is 0.2.
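A sketch of the binary case, assuming sigmoid as the third preset function; the input ln 4 is chosen so that the two probabilities come out near 0.8 and 0.2, matching the example above.

```python
import math

def binary_probabilities(x):
    """Third-preset-function sketch: sigmoid gives the probability of the
    first pre-set category; 1 minus that gives the second."""
    p_first = 1 / (1 + math.exp(-x))
    return p_first, 1 - p_first

p1, p2 = binary_probabilities(math.log(4))  # sigmoid(ln 4) = 4/5
# p1 is approximately 0.8, p2 approximately 0.2.
```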
Step 207: Determine the pre-set category identified by the pre-set category identifier with the largest probability among the multiple pre-set category identifiers as the category of the target audio.
For example, suppose there are two pre-set category identifiers, identifier 1 and identifier 0, where the category identified by identifier 1 is a high-quality category and the category identified by identifier 0 is a low-quality category. Assuming the probability that the category of the target audio is the high-quality category identified by identifier 1 is 0.8 and the probability that it is the low-quality category identified by identifier 0 is 0.2, the pre-set category identified by identifier 1, which has the larger probability of the two, is determined as the category of the target audio; that is, the category of the target audio is the high-quality category.
For ease of understanding, the audio classification method provided by the embodiments of the present invention is explained in detail below with reference to Fig. 6.
Referring to Fig. 6, the preset classifier includes a convolutional network C, a gated recurrent network G (bi), a fully-connected network D, a batch normalization network B, and a pooling network P.
First, the audio features of the target audio are extracted by the convolutional network C, and the extracted audio features are processed by the batch normalization network B and the pooling network P. Then, the first temporal feature of the audio features is extracted by the gated recurrent network G (bi), the extracted first temporal feature is processed by the batch normalization network B, and the weight of each element of the first classification feature corresponding to the first temporal feature is determined by the fully-connected network D. After that, for any element A of the first temporal feature, an element-wise multiply network multiplies the weight of the element corresponding to element A in the first classification feature by element A to obtain a first element corresponding to element A, and each element of the first temporal feature is replaced with its corresponding first element, yielding a second temporal feature as the temporal features of the audio features. The temporal features of the audio features are then processed by the gated recurrent network G (bi) and the batch normalization network B. Next, the temporal features are processed by the fully-connected network D to obtain, for each of the multiple pre-set category identifiers, the probability that the category of the target audio is the pre-set category identified by that identifier. Finally, the pre-set category identified by the pre-set category identifier with the largest probability among the multiple pre-set category identifiers is determined as the category of the target audio.
In the embodiments of the present invention, an audio signal is first acquired and then truncated or padded so that its duration is adjusted to a preset duration; the duration of the audio signal is thereby normalized to a more suitable range. The audio signal is then converted into a target audio according to its frequency information. After that, the audio features of the target audio are extracted through the convolutional network included in the preset classifier, which reduces the dimensionality of the features of each audio segment so that the extracted audio features have a relatively low dimensionality. The temporal features of the audio features are then extracted through the gated recurrent network included in the preset classifier. According to the temporal features, the probability that the category of the target audio is the pre-set category identified by each of the multiple pre-set category identifiers is determined through the fully-connected network included in the preset classifier, and the pre-set category identified by the pre-set category identifier with the largest probability among the multiple pre-set category identifiers is determined as the category of the target audio. This classification process is simple and practicable and its efficiency is high; and because the target audio does not need to be segmented, its integrity is preserved, so the classification accuracy is also high.
Next, the audio classification apparatus provided by the embodiments of the present invention is introduced.
Fig. 7 is a schematic structural diagram of an audio classification apparatus provided by an embodiment of the present invention. Referring to Fig. 7, the apparatus includes an acquisition module 301, an adjustment module 302, a conversion module 303, a first extraction module 304, a second extraction module 305, a first determination module 306, and a second determination module 307.
The acquisition module 301 is configured to acquire an audio signal.
The adjustment module 302 is configured to truncate or pad the audio signal so that the duration of the audio signal is adjusted to a preset duration.
The conversion module 303 is configured to convert the audio signal into a target audio according to the frequency information of the audio signal.
The first extraction module 304 is configured to extract the audio features of the target audio through the convolutional network included in the preset classifier.
The second extraction module 305 is configured to extract the temporal features of the audio features through the gated recurrent network included in the preset classifier.
The first determination module 306 is configured to determine, according to the temporal features and through the fully-connected network included in the preset classifier, the probability that the category of the target audio is the pre-set category identified by each of the multiple pre-set category identifiers.
The second determination module 307 is configured to determine the pre-set category identified by the pre-set category identifier with the largest probability among the multiple pre-set category identifiers as the category of the target audio.
Optionally, referring to Fig. 8, the first extraction module 304 includes:
a splitting submodule 3041, configured to divide the target audio into multiple audio segments through the convolutional network;
a first extraction submodule 3042, configured to extract one feature from each of the multiple audio segments through the convolutional network; and
a composition submodule 3043, configured to assemble the extracted features into the audio features of the target audio through the convolutional network.
Optionally, referring to Fig. 9, the second extraction module 305 includes:
a second extraction submodule 3051, configured to extract the first temporal feature of the audio features through the gated recurrent network;
a first determination submodule 3052, configured to determine, through the fully-connected network, the first classification feature corresponding to the first temporal feature;
a first substitution submodule 3053, configured to substitute each element of the first classification feature into the first preset function to obtain the weight of each element of the first classification feature, where the elements of the first temporal feature correspond one-to-one to the elements of the first classification feature;
a multiplication submodule 3054, configured to, for any element A of the first temporal feature, multiply the weight of the element corresponding to element A in the first classification feature by element A to obtain a first element corresponding to element A; and
a replacement submodule 3055, configured to replace each element of the first temporal feature with its corresponding first element to obtain a second temporal feature as the temporal features of the audio features.
Optionally, referring to Fig. 10, the first determination module 306 includes:
a second determination submodule 3061, configured to determine the second classification feature of the temporal features through the fully-connected network; and
a second substitution submodule 3062, configured to substitute the elements of the second classification feature into the second preset function through the fully-connected network to obtain, for each of the multiple pre-set category identifiers, the probability that the category of the target audio is the pre-set category identified by that identifier.
Optionally, the preset classifier further includes at least one of a batch normalization network and a pooling network.
Optionally, referring to Fig. 11, the conversion module 303 includes:
a third determination submodule 3031, configured to determine the mel-frequency cepstral coefficients (MFCC) of the audio signal and generate the target audio according to the MFCC of the audio signal; and
a fourth determination submodule 3032, configured to determine the spectrum of the audio signal and generate the target audio according to the spectrum of the audio signal.
Optionally, referring to Fig. 12, the apparatus further includes:
an obtaining module 308, configured to obtain multiple training audio sets, where all training audios included in each of the multiple training audio sets correspond to the same pre-set category identifier; and
a training module 309, configured to train the classification model to be trained using the multiple training audio sets to obtain the preset classifier.
In the embodiments of the present invention, an audio signal is first acquired and then truncated or padded so that its duration is adjusted to a preset duration; at this point the duration of the audio signal has been regularized to a range better suited to the model. The audio signal is then converted into a target audio according to its frequency information. Next, the audio features of the target audio are extracted by the convolutional network included in the preset classifier, which reduces the dimensionality of the features of each audio fragment, so that the extracted audio features are of relatively low dimension. The temporal features of the audio features are then extracted by the gated recurrent network included in the preset classifier. According to the temporal features, the fully-connected network included in the preset classifier determines the probability that the classification of the target audio is each of the pre-set categories identified by the multiple pre-set category identifiers, and the pre-set category identified by the identifier with the highest probability among the multiple identifiers is determined as the classification of the target audio. This classification process is simple and practicable and has high classification efficiency; and because the target audio does not need to be segmented, the integrity of the target audio is preserved, so classification accuracy is also high.
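The duration-regularization step described above (truncate when too long, supplement when too short) can be sketched as follows; the target length and zero-padding value are illustrative assumptions, since the disclosure does not fix a preset duration or a padding scheme.

```python
import numpy as np

def fix_duration(signal, target_len):
    """Truncate or zero-pad a 1-D audio signal to exactly target_len
    samples, mirroring the 'intercept or supplement' step."""
    if len(signal) >= target_len:
        return signal[:target_len]                  # truncate the excess
    pad = target_len - len(signal)                  # supplement with silence
    return np.concatenate([signal, np.zeros(pad, dtype=signal.dtype)])

short = fix_duration(np.ones(5), 8)    # padded up to 8 samples
long = fix_duration(np.ones(12), 8)    # truncated down to 8 samples
```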
It should be noted that when the audio classification device provided by the above embodiments classifies audio, the division into the above function modules is merely illustrative; in practical applications, the above functions may be allocated to different function modules as needed, that is, the internal structure of the device may be divided into different function modules to complete all or part of the functions described above. In addition, the audio classification device provided by the above embodiments belongs to the same concept as the audio classification method embodiments; for its specific implementation process, refer to the method embodiments, which will not be repeated here.
Figure 13 is a structural schematic diagram of an audio classification device provided by an embodiment of the present invention. Referring to Figure 13, the audio classification device may be a terminal 400, which may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. The terminal 400 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 400 includes a processor 401 and a memory 402.
The processor 401 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 401 may be implemented in at least one hardware form among a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 401 may also include a main processor and a coprocessor: the main processor is the processor for handling data in the awake state, also referred to as the CPU (Central Processing Unit); the coprocessor is a low-power processor for handling data in the standby state. In some embodiments, the processor 401 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 401 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 402 may include one or more computer-readable storage media, which may be non-transitory. The memory 402 may also include high-speed random access memory and nonvolatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 402 is used to store at least one instruction, which is executed by the processor 401 to implement the audio classification method provided by the method embodiments of the present application.
In some embodiments, the terminal 400 may optionally further include a peripheral device interface 403 and at least one peripheral device. The processor 401, the memory 402, and the peripheral device interface 403 may be connected by a bus or signal line. Each peripheral device may be connected to the peripheral device interface 403 by a bus, signal line, or circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 404, a touch display screen 405, a camera assembly 406, an audio circuit 407, a positioning component 408, and a power supply 409.
The peripheral device interface 403 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 401 and the memory 402. In some embodiments, the processor 401, the memory 402, and the peripheral device interface 403 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 401, the memory 402, and the peripheral device interface 403 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 404 is used to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio frequency circuit 404 communicates with communication networks and other communication devices through electromagnetic signals. The radio frequency circuit 404 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 404 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 404 may communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes but is not limited to the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 404 may also include circuits related to NFC (Near Field Communication), which is not limited in this application.
The display screen 405 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 405 is a touch display screen, it also has the ability to acquire touch signals on or above its surface. The touch signal may be input to the processor 401 as a control signal for processing. At this point, the display screen 405 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 405, arranged on the front panel of the terminal 400; in other embodiments, there may be at least two display screens 405, arranged respectively on different surfaces of the terminal 400 or in a folding design; in still other embodiments, the display screen 405 may be a flexible display screen arranged on a curved surface or folding surface of the terminal 400. The display screen 405 may even be arranged as a non-rectangular irregular figure, that is, a shaped screen. The display screen 405 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 406 is used to capture images or video. Optionally, the camera assembly 406 includes a front camera and a rear camera. In general, the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background-blurring function by fusing the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting functions by fusing the main camera and the wide-angle camera, or other fused shooting functions. In some embodiments, the camera assembly 406 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and may be used for light compensation under different color temperatures.
The audio circuit 407 may include a microphone and a loudspeaker. The microphone is used to collect sound waves from the user and the environment and convert them into electrical signals that are input to the processor 401 for processing, or input to the radio frequency circuit 404 to realize voice communication. For stereo acquisition or noise reduction, there may be multiple microphones, arranged respectively at different parts of the terminal 400. The microphone may also be an array microphone or an omnidirectional microphone. The loudspeaker is used to convert electrical signals from the processor 401 or the radio frequency circuit 404 into sound waves. The loudspeaker may be a traditional diaphragm loudspeaker or a piezoelectric ceramic loudspeaker. When the loudspeaker is a piezoelectric ceramic loudspeaker, it can not only convert electrical signals into sound waves audible to humans but also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 407 may also include a headphone jack.
The positioning component 408 is used to determine the current geographic location of the terminal 400 to realize navigation or LBS (Location Based Service). The positioning component 408 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the GLONASS system of Russia.
The power supply 409 is used to supply power to the various components in the terminal 400. The power supply 409 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 409 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 400 further includes one or more sensors 410, including but not limited to an acceleration sensor 411, a gyroscope sensor 412, a pressure sensor 413, a fingerprint sensor 414, an optical sensor 415, and a proximity sensor 416.
The acceleration sensor 411 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 400. For example, the acceleration sensor 411 may be used to detect the components of gravitational acceleration on the three coordinate axes. According to the gravitational acceleration signal acquired by the acceleration sensor 411, the processor 401 may control the touch display screen 405 to display the user interface in landscape view or portrait view. The acceleration sensor 411 may also be used to acquire motion data for games or of the user.
The gyroscope sensor 412 can detect the body direction and rotation angle of the terminal 400, and may cooperate with the acceleration sensor 411 to acquire the user's 3D actions on the terminal 400. According to the data acquired by the gyroscope sensor 412, the processor 401 may implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 413 may be arranged on the side frame of the terminal 400 and/or at the lower layer of the touch display screen 405. When the pressure sensor 413 is arranged on the side frame of the terminal 400, the user's grip signal on the terminal 400 can be detected, and the processor 401 performs left/right-hand recognition or shortcut operations according to the grip signal acquired by the pressure sensor 413. When the pressure sensor 413 is arranged at the lower layer of the touch display screen 405, the processor 401 controls the operability controls on the UI interface according to the user's pressure operation on the touch display screen 405. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The fingerprint sensor 414 is used to collect the user's fingerprint, and the processor 401 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 414, or the fingerprint sensor 414 identifies the user's identity according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 401 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and so on. The fingerprint sensor 414 may be arranged on the front, back, or side of the terminal 400. When a physical button or a manufacturer logo is provided on the terminal 400, the fingerprint sensor 414 may be integrated with the physical button or the manufacturer logo.
The optical sensor 415 is used to acquire the ambient light intensity. In one embodiment, the processor 401 may control the display brightness of the touch display screen 405 according to the ambient light intensity acquired by the optical sensor 415: when the ambient light intensity is high, the display brightness of the touch display screen 405 is turned up; when the ambient light intensity is low, the display brightness of the touch display screen 405 is turned down. In another embodiment, the processor 401 may also dynamically adjust the shooting parameters of the camera assembly 406 according to the ambient light intensity acquired by the optical sensor 415.
The proximity sensor 416, also referred to as a distance sensor, is generally arranged on the front panel of the terminal 400. The proximity sensor 416 is used to acquire the distance between the user and the front of the terminal 400. In one embodiment, when the proximity sensor 416 detects that the distance between the user and the front of the terminal 400 gradually decreases, the processor 401 controls the touch display screen 405 to switch from the screen-on state to the screen-off state; when the proximity sensor 416 detects that the distance between the user and the front of the terminal 400 gradually increases, the processor 401 controls the touch display screen 405 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in Figure 13 does not constitute a limitation on the terminal 400, which may include more or fewer components than illustrated, combine certain components, or adopt a different arrangement of components.
In summary, the embodiments of the present invention provide not only an audio classification device as shown in Figure 13 for implementing the audio classification method described in the embodiments of Figure 1 or Figure 2, but also a computer-readable storage medium storing instructions that, when executed by a processor, implement the audio classification method described in the embodiments of Figure 1 or Figure 2 above.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by instructing relevant hardware through a program, and the program may be stored in a computer-readable storage medium; the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (16)
1. An audio classification method, characterized in that the method comprises:
acquiring an audio signal;
truncating or padding the audio signal to adjust the duration of the audio signal to a preset duration;
converting the audio signal into a target audio according to frequency information of the audio signal;
extracting audio features of the target audio through a convolutional network comprised in a preset classifier;
extracting temporal features of the audio features through a gated recurrent network comprised in the preset classifier;
determining, according to the temporal features and through a fully-connected network comprised in the preset classifier, the probability that the classification of the target audio is each of the pre-set categories identified by multiple pre-set category identifiers; and
determining the pre-set category identified by the identifier with the highest probability among the multiple pre-set category identifiers as the classification of the target audio.
2. The method according to claim 1, characterized in that extracting the audio features of the target audio through the convolutional network comprised in the preset classifier comprises:
splitting the target audio into multiple audio fragments through the convolutional network;
extracting, through the convolutional network, the features of each audio fragment in the multiple audio fragments as one feature; and
composing the audio features of the target audio from the extracted features through the convolutional network.
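The split-then-reduce behavior described in claim 2 — one compact feature per fragment — can be imitated with a simple pooling sketch. The fragment count and the hand-picked statistics (mean and energy) are illustrative stand-ins for the learned reduction performed by the convolutional network:

```python
import numpy as np

def fragment_features(target_audio, n_fragments):
    """Split the audio into equal fragments and reduce each fragment
    to one low-dimensional feature vector (here: mean and energy),
    standing in for the convolutional network's learned reduction."""
    fragments = np.array_split(target_audio, n_fragments)
    return np.stack([np.array([f.mean(), np.sum(f ** 2)]) for f in fragments])

feats = fragment_features(np.arange(10, dtype=float), 5)
# feats has one row (feature) per audio fragment
```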
3. The method according to claim 1, characterized in that extracting the temporal features of the audio features through the gated recurrent network comprised in the preset classifier comprises:
extracting first temporal features of the audio features through the gated recurrent network;
determining, through the fully-connected network, first classification features corresponding to the first temporal features;
substituting each element in the first classification features into a first preset function to obtain a weight for each element in the first classification features, the elements in the first temporal features corresponding one-to-one with the elements in the first classification features;
for any element A in the first temporal features, multiplying the weight of the element corresponding to element A in the first classification features by element A to obtain a first element corresponding to element A; and
replacing each element in the first temporal features with its corresponding first element to obtain second temporal features as the temporal features of the audio features.
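Claim 3 describes an attention-like gating: a fully-connected layer maps the recurrent output to per-element weights via a "first preset function", and the temporal features are scaled element-wise by those weights. The disclosure does not name the function; the sketch below assumes a sigmoid, and the FC weights are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gate_temporal_features(first_temporal, fc_weight, fc_bias):
    """Element-wise gating as in claim 3: an FC layer produces the
    first classification features, the first preset function (assumed
    sigmoid) turns them into weights in (0, 1), and each temporal
    element is multiplied by its corresponding weight."""
    first_classification = first_temporal @ fc_weight + fc_bias
    weights = sigmoid(first_classification)     # one weight per element
    return first_temporal * weights             # second temporal features

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 3))   # 4 time steps, 3 features (illustrative)
W = np.eye(3)                     # illustrative FC weights
b = np.zeros(3)
second = gate_temporal_features(h, W, b)
```

Because each weight lies in (0, 1), the gating can only attenuate elements, which matches the re-weighting role the claim assigns to this step.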
4. The method according to claim 1, characterized in that determining, according to the temporal features and through the fully-connected network comprised in the preset classifier, the probability that the classification of the target audio is each of the pre-set categories identified by the multiple pre-set category identifiers comprises:
determining second classification features of the temporal features through the fully-connected network; and
substituting the elements in the second classification features into a second preset function through the fully-connected network to obtain the probability that the classification of the target audio is each of the pre-set categories identified by the multiple pre-set category identifiers.
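The "second preset function" of claim 4, which turns the second classification features into per-category probabilities, is most naturally read as a softmax, though the disclosure does not name it. A minimal sketch under that assumption:

```python
import numpy as np

def softmax(scores):
    """Map raw classification scores to probabilities over the
    pre-set categories (numerically stable form)."""
    shifted = scores - scores.max()   # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative FC outputs, 3 categories
probs = softmax(scores)
predicted = int(np.argmax(probs))    # identifier with the highest probability
```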
5. The method according to claim 1, characterized in that the preset classifier further comprises at least one of a batch normalization network and a pooling network.
6. The method according to claim 1, characterized in that converting the audio signal into the target audio according to the frequency information of the audio signal comprises:
determining the mel-frequency cepstral coefficients (MFCC) of the audio signal and generating the target audio according to the MFCC of the audio signal; or,
determining the frequency spectrum of the audio signal and generating the target audio according to the frequency spectrum of the audio signal.
7. The method according to claim 1, characterized in that before extracting the audio features of the target audio through the convolutional network comprised in the preset classifier, the method further comprises:
obtaining multiple training audio sets, all training audios comprised in each of the multiple training audio sets corresponding to the same pre-set category identifier; and
training a classification model to be trained using the multiple training audio sets to obtain the preset classifier.
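The training step of claim 7 — fitting the classifier on labeled training audio sets — can be illustrated with a toy gradient-descent sketch on a linear softmax classifier. The features, labels, learning rate, and model here are all illustrative assumptions; the disclosure's actual classifier is the convolutional/gated-recurrent/fully-connected network described above.

```python
import numpy as np

def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    """Toy stand-in for 'training the classification model to be
    trained': fit a linear softmax classifier by gradient descent
    on cross-entropy loss."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], n_classes)) * 0.01
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (p - onehot) / len(X)         # gradient step
    return W

# two tiny 'training audio sets', one per pre-set category identifier
X = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
y = np.array([0, 0, 1, 1])
W = train_softmax(X, y, 2)
pred = (X @ W).argmax(axis=1)
```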
8. An audio classification device, characterized in that the device comprises:
a collection module, configured to acquire an audio signal;
an adjustment module, configured to truncate or pad the audio signal to adjust the duration of the audio signal to a preset duration;
a conversion module, configured to convert the audio signal into a target audio according to frequency information of the audio signal;
a first extraction module, configured to extract audio features of the target audio through a convolutional network comprised in a preset classifier;
a second extraction module, configured to extract temporal features of the audio features through a gated recurrent network comprised in the preset classifier;
a first determining module, configured to determine, according to the temporal features and through a fully-connected network comprised in the preset classifier, the probability that the classification of the target audio is each of the pre-set categories identified by multiple pre-set category identifiers; and
a second determining module, configured to determine the pre-set category identified by the identifier with the highest probability among the multiple pre-set category identifiers as the classification of the target audio.
9. The device according to claim 8, characterized in that the first extraction module comprises:
a splitting submodule, configured to split the target audio into multiple audio fragments through the convolutional network;
a first extraction submodule, configured to extract, through the convolutional network, the features of each audio fragment in the multiple audio fragments as one feature; and
a composition submodule, configured to compose the audio features of the target audio from the extracted features through the convolutional network.
10. The device according to claim 8, characterized in that the second extraction module comprises:
a second extraction submodule, configured to extract first temporal features of the audio features through the gated recurrent network;
a first determination submodule, configured to determine, through the fully-connected network, first classification features corresponding to the first temporal features;
a first substitution submodule, configured to substitute each element in the first classification features into a first preset function to obtain a weight for each element in the first classification features, the elements in the first temporal features corresponding one-to-one with the elements in the first classification features;
a multiplication submodule, configured to, for any element A in the first temporal features, multiply the weight of the element corresponding to element A in the first classification features by element A to obtain a first element corresponding to element A; and
a replacement submodule, configured to replace each element in the first temporal features with its corresponding first element to obtain second temporal features as the temporal features of the audio features.
11. The device according to claim 8, characterized in that the first determining module comprises:
a second determination submodule, configured to determine second classification features of the temporal features through the fully-connected network; and
a second substitution submodule, configured to substitute the elements in the second classification features into a second preset function through the fully-connected network to obtain the probability that the classification of the target audio is each of the pre-set categories identified by the multiple pre-set category identifiers.
12. The device according to claim 8, characterized in that the preset classifier further comprises at least one of a batch normalization network and a pooling network.
13. The device according to claim 8, characterized in that the conversion module comprises:
a third determination submodule, configured to determine the mel-frequency cepstral coefficients (MFCC) of the audio signal and generate the target audio according to the MFCC of the audio signal; and
a fourth determination submodule, configured to determine the frequency spectrum of the audio signal and generate the target audio according to the frequency spectrum of the audio signal.
14. The device according to claim 8, characterized in that the device further comprises:
an acquisition module, configured to obtain multiple training audio sets, all training audios comprised in each of the multiple training audio sets corresponding to the same pre-set category identifier; and
a training module, configured to train a classification model to be trained using the multiple training audio sets to obtain the preset classifier.
15. An audio classification device, characterized in that the device comprises:
a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the steps of the method according to any one of claims 1-7.
16. A computer-readable storage medium storing instructions, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810332491.8A CN108538311B (en) | 2018-04-13 | 2018-04-13 | Audio classification method, device and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810332491.8A CN108538311B (en) | 2018-04-13 | 2018-04-13 | Audio classification method, device and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108538311A true CN108538311A (en) | 2018-09-14 |
CN108538311B CN108538311B (en) | 2020-09-15 |
Family
ID=63480527
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810332491.8A Active CN108538311B (en) | 2018-04-13 | 2018-04-13 | Audio classification method, device and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108538311B (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109671425A (en) * | 2018-12-29 | 2019-04-23 | 广州酷狗计算机科技有限公司 | Audio frequency classification method, device and storage medium |
CN110136744A (en) * | 2019-05-24 | 2019-08-16 | 腾讯音乐娱乐科技(深圳)有限公司 | A kind of audio-frequency fingerprint generation method, equipment and storage medium |
CN110136696A (en) * | 2019-05-22 | 2019-08-16 | 上海声构信息科技有限公司 | The monitor processing method and system of audio data |
CN110334240A (en) * | 2019-07-08 | 2019-10-15 | 联想(北京)有限公司 | Information processing method, system and the first equipment, the second equipment |
CN110956980A (en) * | 2019-12-10 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Media data processing method, device and storage medium |
CN111046216A (en) * | 2019-12-06 | 2020-04-21 | 广州国音智能科技有限公司 | Audio information access method, device, equipment and computer readable storage medium |
CN111090758A (en) * | 2019-12-10 | 2020-05-01 | 腾讯科技(深圳)有限公司 | Media data processing method, device and storage medium |
CN111261174A (en) * | 2018-11-30 | 2020-06-09 | 杭州海康威视数字技术股份有限公司 | Audio classification method and device, terminal and computer readable storage medium |
CN111259189A (en) * | 2018-11-30 | 2020-06-09 | 马上消费金融股份有限公司 | Music classification method and device |
CN111613213A (en) * | 2020-04-29 | 2020-09-01 | 广州三人行壹佰教育科技有限公司 | Method, device, equipment and storage medium for audio classification |
WO2020228226A1 (en) * | 2019-05-14 | 2020-11-19 | 腾讯音乐娱乐科技(深圳)有限公司 | Instrumental music detection method and apparatus, and storage medium |
CN112185396A (en) * | 2020-09-10 | 2021-01-05 | 国家海洋局南海调查技术中心(国家海洋局南海浮标中心) | Offshore wind farm biological monitoring method and system based on passive acoustics |
CN112447187A (en) * | 2019-09-02 | 2021-03-05 | 富士通株式会社 | Device and method for recognizing sound event |
CN114827085A (en) * | 2022-06-24 | 2022-07-29 | 鹏城实验室 | Root server correctness monitoring method, device, equipment and storage medium |
CN116761114A (en) * | 2023-07-14 | 2023-09-15 | 润芯微科技(江苏)有限公司 | Method and system for adjusting playing sound of vehicle-mounted sound equipment |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102668378A (en) * | 2009-12-25 | 2012-09-12 | 佳能株式会社 | Information processing apparatus or information processing method |
CN103456301A (en) * | 2012-05-28 | 2013-12-18 | 中兴通讯股份有限公司 | Ambient sound based scene recognition method and device and mobile terminal |
CN104064212A (en) * | 2014-06-25 | 2014-09-24 | 深圳市中兴移动通信有限公司 | Sound recording method and device |
CN104091594A (en) * | 2013-08-16 | 2014-10-08 | 腾讯科技(深圳)有限公司 | Audio classifying method and device |
CN105788592A (en) * | 2016-04-28 | 2016-07-20 | 乐视控股(北京)有限公司 | Audio classification method and apparatus thereof |
CN105895110A (en) * | 2016-06-30 | 2016-08-24 | 北京奇艺世纪科技有限公司 | Method and device for classifying audio files |
CN107408384A (en) * | 2015-11-25 | 2017-11-28 | 百度(美国)有限责任公司 | The end-to-end speech recognition of deployment |
CN107527626A (en) * | 2017-08-30 | 2017-12-29 | 北京嘉楠捷思信息技术有限公司 | Audio identification system |
CN107545890A (en) * | 2017-08-31 | 2018-01-05 | 桂林电子科技大学 | A kind of sound event recognition method |
CN107689223A (en) * | 2017-08-30 | 2018-02-13 | 北京嘉楠捷思信息技术有限公司 | Audio identification method and device |
US20180077387A1 (en) * | 2016-03-23 | 2018-03-15 | Global Tel*Link Corporation | Secure Nonscheduled Video Visitation System |
-
2018
- 2018-04-13 CN CN201810332491.8A patent/CN108538311B/en active Active
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102668378A (en) * | 2009-12-25 | 2012-09-12 | 佳能株式会社 | Information processing apparatus or information processing method |
CN103456301A (en) * | 2012-05-28 | 2013-12-18 | 中兴通讯股份有限公司 | Ambient sound based scene recognition method and device and mobile terminal |
CN104091594A (en) * | 2013-08-16 | 2014-10-08 | 腾讯科技(深圳)有限公司 | Audio classifying method and device |
CN104064212A (en) * | 2014-06-25 | 2014-09-24 | 深圳市中兴移动通信有限公司 | Sound recording method and device |
CN107408384A (en) * | 2015-11-25 | 2017-11-28 | 百度(美国)有限责任公司 | The end-to-end speech recognition of deployment |
US20180077387A1 (en) * | 2016-03-23 | 2018-03-15 | Global Tel*Link Corporation | Secure Nonscheduled Video Visitation System |
CN105788592A (en) * | 2016-04-28 | 2016-07-20 | 乐视控股(北京)有限公司 | Audio classification method and apparatus thereof |
CN105895110A (en) * | 2016-06-30 | 2016-08-24 | 北京奇艺世纪科技有限公司 | Method and device for classifying audio files |
CN107527626A (en) * | 2017-08-30 | 2017-12-29 | 北京嘉楠捷思信息技术有限公司 | Audio identification system |
CN107689223A (en) * | 2017-08-30 | 2018-02-13 | 北京嘉楠捷思信息技术有限公司 | Audio identification method and device |
CN107545890A (en) * | 2017-08-31 | 2018-01-05 | 桂林电子科技大学 | Sound event recognition method |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111261174B (en) * | 2018-11-30 | 2023-02-17 | 杭州海康威视数字技术股份有限公司 | Audio classification method and device, terminal and computer readable storage medium |
CN111261174A (en) * | 2018-11-30 | 2020-06-09 | 杭州海康威视数字技术股份有限公司 | Audio classification method and device, terminal and computer readable storage medium |
CN111259189A (en) * | 2018-11-30 | 2020-06-09 | 马上消费金融股份有限公司 | Music classification method and device |
CN109671425A (en) * | 2018-12-29 | 2019-04-23 | 广州酷狗计算机科技有限公司 | Audio classification method, device and storage medium |
CN109671425B (en) * | 2018-12-29 | 2021-04-06 | 广州酷狗计算机科技有限公司 | Audio classification method, device and storage medium |
WO2020228226A1 (en) * | 2019-05-14 | 2020-11-19 | 腾讯音乐娱乐科技(深圳)有限公司 | Instrumental music detection method and apparatus, and storage medium |
CN110136696A (en) * | 2019-05-22 | 2019-08-16 | 上海声构信息科技有限公司 | The monitor processing method and system of audio data |
CN110136696B (en) * | 2019-05-22 | 2021-05-18 | 上海声构信息科技有限公司 | Audio data monitoring processing method and system |
CN110136744A (en) * | 2019-05-24 | 2019-08-16 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio fingerprint generation method, device and storage medium |
CN110136744B (en) * | 2019-05-24 | 2021-03-26 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio fingerprint generation method, equipment and storage medium |
CN110334240A (en) * | 2019-07-08 | 2019-10-15 | 联想(北京)有限公司 | Information processing method, system, first device and second device |
CN112447187A (en) * | 2019-09-02 | 2021-03-05 | 富士通株式会社 | Device and method for recognizing sound event |
CN111046216A (en) * | 2019-12-06 | 2020-04-21 | 广州国音智能科技有限公司 | Audio information access method, device, equipment and computer readable storage medium |
CN111046216B (en) * | 2019-12-06 | 2024-02-09 | 广州国音智能科技有限公司 | Audio information access method, device, equipment and computer readable storage medium |
CN110956980A (en) * | 2019-12-10 | 2020-04-03 | 腾讯科技(深圳)有限公司 | Media data processing method, device and storage medium |
CN111090758A (en) * | 2019-12-10 | 2020-05-01 | 腾讯科技(深圳)有限公司 | Media data processing method, device and storage medium |
CN111090758B (en) * | 2019-12-10 | 2023-08-18 | 腾讯科技(深圳)有限公司 | Media data processing method, device and storage medium |
CN110956980B (en) * | 2019-12-10 | 2024-04-09 | 腾讯科技(深圳)有限公司 | Media data processing method, device and storage medium |
CN111613213B (en) * | 2020-04-29 | 2023-07-04 | 广州欢聚时代信息科技有限公司 | Audio classification method, device, equipment and storage medium |
CN111613213A (en) * | 2020-04-29 | 2020-09-01 | 广州三人行壹佰教育科技有限公司 | Method, device, equipment and storage medium for audio classification |
CN112185396B (en) * | 2020-09-10 | 2022-03-25 | 国家海洋局南海调查技术中心(国家海洋局南海浮标中心) | Offshore wind farm biological monitoring method and system based on passive acoustics |
CN112185396A (en) * | 2020-09-10 | 2021-01-05 | 国家海洋局南海调查技术中心(国家海洋局南海浮标中心) | Offshore wind farm biological monitoring method and system based on passive acoustics |
CN114827085A (en) * | 2022-06-24 | 2022-07-29 | 鹏城实验室 | Root server correctness monitoring method, device, equipment and storage medium |
CN116761114A (en) * | 2023-07-14 | 2023-09-15 | 润芯微科技(江苏)有限公司 | Method and system for adjusting playing sound of vehicle-mounted sound equipment |
CN116761114B (en) * | 2023-07-14 | 2024-01-26 | 润芯微科技(江苏)有限公司 | Method and system for adjusting playing sound of vehicle-mounted sound equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108538311B (en) | 2020-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108538311A (en) | Audio frequency classification method, device and computer readable storage medium | |
CN110121118B (en) | Video clip positioning method and device, computer equipment and storage medium | |
WO2019214361A1 (en) | Method for detecting key term in speech signal, device, terminal, and storage medium | |
CN109086709A (en) | Feature Selection Model training method, device and storage medium | |
CN110110145A (en) | Description document generation method and device | |
CN110097019A (en) | Character recognition method, device, computer equipment and storage medium | |
CN109299315A (en) | Multimedia resource classification method, device, computer equipment and storage medium | |
CN108829881A (en) | Video title generation method and device | |
CN110222789A (en) | Image recognition method and storage medium | |
CN110047468B (en) | Speech recognition method, apparatus and storage medium | |
CN108304506A (en) | Search method, device and equipment | |
CN110018970A (en) | Cache prefetching method, apparatus, equipment and computer readable storage medium | |
CN109994127A (en) | Audio detection method, device, electronic equipment and storage medium | |
CN111105788B (en) | Sensitive word score detection method and device, electronic equipment and storage medium | |
CN111524501A (en) | Voice playing method and device, computer equipment and computer readable storage medium | |
CN109360222A (en) | Image segmentation method, device and storage medium | |
CN108320756A (en) | Method and apparatus for detecting whether audio is pure-music audio | |
CN108922531A (en) | Slot recognition method, device, electronic equipment and storage medium | |
CN108806670B (en) | Audio recognition method, device and storage medium | |
CN109003621A (en) | Audio processing method, device and storage medium | |
CN109065068A (en) | Audio processing method, device and storage medium | |
CN109961802B (en) | Sound quality comparison method, device, electronic equipment and storage medium | |
CN110166275A (en) | Information processing method, device and storage medium | |
CN115206305B (en) | Semantic text generation method and device, electronic equipment and storage medium | |
CN111341307A (en) | Voice recognition method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||