CN109671425A - Audio frequency classification method, device and storage medium - Google Patents
- Publication number: CN109671425A
- Application number: CN201811632676.7A
- Authority: CN (China)
- Prior art keywords: audio, information, frequency, target, fragment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
- G10L15/063 — Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/08 — Speech classification or search
- G10L15/16 — Speech classification or search using artificial neural networks
- G10L21/0208 — Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
- G10L21/0216 — Noise filtering characterised by the method used for estimating noise
- G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters
- G10L25/24 — Speech or voice analysis techniques in which the extracted parameters are the cepstrum
Abstract
The invention discloses an audio classification method, an audio classification apparatus, and a storage medium, belonging to the field of Internet technologies. The method includes: obtaining at least one target audio segment of target audio information; performing high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment; determining a class identifier of the at least one target audio segment based on an audio classification model and the at least one audio feature; and determining a class identifier of the target audio information according to the class identifier of the at least one target audio segment. A first identifier indicates that the corresponding audio information is normal audio information, and a second identifier indicates that the corresponding audio information is sensitive audio information. Because high-pass filtering is performed before the class identifier of the target audio information is determined, low-frequency noise in the target audio information can be filtered out, so low-frequency noise is not mistaken for sensitive audio information, which improves the accuracy of audio classification.
Description
Technical field
The present invention relates to the field of Internet technologies, and in particular, to an audio classification method, apparatus, and storage medium.
Background technique
With the rapid development of Internet technologies, the amount of information on the Internet has grown steadily, allowing many kinds of sensitive information, such as objectionable videos and sensitive audio, to spread widely. Such content harms people's mental health, pollutes the network environment, and easily causes network information security problems. How to identify sensitive information has therefore become an urgent problem to be solved.
The related art provides an audio classification method that classifies audio information into normal audio information and sensitive audio information. First, multiple pieces of sensitive audio information are obtained, feature extraction is performed on each piece to obtain its audio features, and model training is performed on the multiple audio features to obtain a Gaussian mixture model. Then, target audio information to be identified is obtained, feature extraction is performed on it to obtain a target audio feature, and the Mahalanobis distance between the target audio feature and the Gaussian mixture model is computed. If the Mahalanobis distance is greater than a preset threshold, the target audio information is determined to be normal audio information; otherwise, it is determined to be sensitive audio information.
When the target audio information contains low-frequency noise, the features of that noise are similar to those of sensitive audio information. When classification is performed based on the Gaussian mixture model, the low-frequency noise may therefore be mistaken for sensitive audio information, causing misclassification and low accuracy.
Summary of the invention
Embodiments of the present invention provide an audio classification method, apparatus, and storage medium, which can solve the foregoing problem in the related art. The technical solutions are as follows:
In a first aspect, an audio classification method is provided. The method includes:
obtaining at least one target audio segment of target audio information;
performing high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment; and
determining a class identifier of the at least one target audio segment based on an audio classification model and the at least one audio feature, and determining a class identifier of the target audio information according to the class identifier of the at least one target audio segment,
where the class identifier includes a first identifier and a second identifier, the first identifier indicates that the corresponding audio information is normal audio information, and the second identifier indicates that the corresponding audio information is sensitive audio information.
Optionally, the obtaining at least one target audio segment of target audio information includes:
dividing the target audio information according to a first preset length to obtain multiple audio segments whose length equals the first preset length;
for each of the multiple audio segments, obtaining multiple fundamental frequencies in the audio segment, and obtaining the proportion of those fundamental frequencies that are greater than a first preset frequency; and
obtaining, from the multiple audio segments, the audio segments whose proportion is less than a first preset ratio as target audio segments.
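Assuming per-segment fundamental-frequency estimates are already available from some external pitch tracker, this selection rule might be sketched as follows (all function names are hypothetical, not from the patent):

```python
def split_fixed(samples, seg_len):
    # Divide the audio into segments of the first preset length (a short tail is dropped).
    return [samples[i:i + seg_len] for i in range(0, len(samples) - seg_len + 1, seg_len)]

def high_pitch_ratio(pitches, freq_threshold):
    # Fraction of fundamental-frequency estimates above the first preset frequency.
    return sum(1 for f in pitches if f > freq_threshold) / len(pitches)

def select_targets_by_pitch(segment_pitches, freq_threshold, max_ratio):
    # Keep the indices of segments whose high-pitch ratio is below the first preset ratio.
    return [i for i, pitches in enumerate(segment_pitches)
            if high_pitch_ratio(pitches, freq_threshold) < max_ratio]
```

Here `segment_pitches` is a list of per-segment pitch lists; pitch extraction itself is outside this sketch.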
Optionally, the obtaining at least one target audio segment of target audio information includes:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments share audio information of the third preset length, and the third preset length is less than the second preset length;
for each of the multiple audio segments, dividing the audio segment according to a fourth preset length to obtain multiple audio sub-segments whose length equals the fourth preset length, and obtaining an amplitude statistic of each audio sub-segment, where the fourth preset length is less than the second preset length; and
obtaining, from the multiple audio segments, the audio segments in which any statistic is greater than a preset value as target audio segments.
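A minimal sketch of this overlapping-window, amplitude-based selection, under the assumption that the statistic is the mean absolute amplitude (the text leaves the exact statistic open) and with hypothetical names:

```python
def split_overlapping(samples, seg_len, overlap):
    # Segments of the second preset length; adjacent segments share `overlap`
    # samples, standing in for the third preset length of shared audio.
    step = seg_len - overlap
    return [samples[i:i + seg_len] for i in range(0, len(samples) - seg_len + 1, step)]

def subsegment_amplitude_stats(segment, sub_len):
    # Mean absolute amplitude of each sub-segment of the fourth preset length.
    subs = [segment[i:i + sub_len] for i in range(0, len(segment) - sub_len + 1, sub_len)]
    return [sum(abs(s) for s in sub) / len(sub) for sub in subs]

def select_targets_by_amplitude(segments, sub_len, threshold):
    # Keep segments in which any sub-segment statistic exceeds the preset value.
    return [i for i, seg in enumerate(segments)
            if any(v > threshold for v in subsegment_amplitude_stats(seg, sub_len))]
```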
Optionally, the obtaining at least one target audio segment of target audio information includes:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments share audio information of the third preset length, and the third preset length is less than the second preset length;
for each of the multiple audio segments, obtaining multiple fundamental frequencies in the audio segment, and obtaining the proportion of those fundamental frequencies that are greater than the first preset frequency; and
obtaining, from the multiple audio segments, the audio segments whose proportion is greater than a second preset ratio and less than a third preset ratio as target audio segments.
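This third variant differs from the first only in keeping segments whose high-pitch ratio falls inside a band rather than below a single bound. Again assuming precomputed per-segment pitch lists and hypothetical names:

```python
def select_targets_by_pitch_band(segment_pitches, freq_threshold, lo_ratio, hi_ratio):
    # Keep segments whose high-pitch ratio lies strictly between the second
    # and third preset ratios.
    def ratio(pitches):
        return sum(1 for f in pitches if f > freq_threshold) / len(pitches)
    return [i for i, pitches in enumerate(segment_pitches)
            if lo_ratio < ratio(pitches) < hi_ratio]
```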
Optionally, the performing high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment includes:
performing high-pass filtering on the at least one target audio segment to obtain at least one filtered audio segment;
dividing each filtered audio segment according to a fifth preset length to obtain multiple audio sub-segments whose length equals the fifth preset length; and
performing feature extraction on each audio sub-segment to obtain the audio feature of each audio sub-segment.
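The filter-then-split-then-extract step can be sketched as below. The patent does not specify the filter design, so a first-order pre-emphasis filter serves as a stand-in, and the feature extractor is left as a pluggable function (e.g. an MFCC routine, consistent with the G10L25/24 cepstrum classification):

```python
def high_pass(samples, alpha=0.97):
    # First-order high-pass (pre-emphasis) filter: y[n] = x[n] - alpha * x[n-1].
    # A stand-in only; any high-pass design that attenuates low-frequency noise fits.
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def extract_features(segment, sub_len, feature_fn):
    # Split a filtered segment into sub-segments of the fifth preset length and
    # apply a per-sub-segment feature extractor.
    subs = [segment[i:i + sub_len] for i in range(0, len(segment) - sub_len + 1, sub_len)]
    return [feature_fn(sub) for sub in subs]
```

On a constant (pure-DC) signal, `high_pass` drives every sample after the first toward zero, which is exactly the low-frequency suppression the method relies on.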
Optionally, the determining a class identifier of the target audio information according to the class identifier of the at least one target audio segment includes at least one of the following:
when, among the at least one target audio segment, the class identifiers of a first preset quantity of consecutive target audio segments are the second identifier, determining that the class identifier of the target audio information is the second identifier; and
when, among the at least one target audio segment, the proportion of target audio segments whose class identifier is the second identifier reaches a fourth preset ratio, determining that the class identifier of the target audio information is the second identifier.
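The two decision conditions just described combine into a simple aggregation rule over per-segment labels. A sketch, with `"id1"`/`"id2"` as placeholder values for the first and second identifiers:

```python
def classify_audio(segment_labels, run_len, ratio_threshold,
                   normal="id1", sensitive="id2"):
    # The whole audio gets the second identifier if either condition holds:
    # (a) `run_len` consecutive segments carry the second identifier, or
    # (b) the fraction of such segments reaches the fourth preset ratio.
    run = best = 0
    for label in segment_labels:
        run = run + 1 if label == sensitive else 0
        best = max(best, run)
    ratio = segment_labels.count(sensitive) / len(segment_labels)
    return sensitive if best >= run_len or ratio >= ratio_threshold else normal
```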
Optionally, the method further includes:
obtaining multiple pieces of sample audio information and the class identifiers of the multiple pieces of sample audio information;
performing high-pass filtering and feature extraction on the multiple pieces of sample audio information to obtain multiple corresponding audio features; and
performing model training according to the multiple audio features and the class identifier corresponding to each audio feature to obtain the audio classification model.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model, and the performing model training according to the multiple audio features and the class identifier corresponding to each audio feature to obtain the audio classification model includes:
performing model training according to the audio features, among the multiple audio features, whose class identifier is the first identifier to obtain the first audio classification model; and
performing model training according to the audio features, among the multiple audio features, whose class identifier is the second identifier to obtain the second audio classification model.
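The two-model scheme (one model per class, classify by whichever scores a feature higher) can be illustrated with a drastically simplified stand-in model. The patent leaves the model family open; here a per-dimension mean plays that role purely for illustration, and all names are hypothetical:

```python
def train_class_model(features):
    # Stand-in "model": the per-dimension mean of one class's training features.
    # A real system would fit something richer (e.g. a GMM or neural network).
    dims = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dims)]

def score(model, feature):
    # Higher when the feature is closer to the class model.
    return -sum((x - m) ** 2 for x, m in zip(feature, model))

def classify_with_two_models(model_normal, model_sensitive, feature):
    # Pick the identifier whose model scores the feature higher.
    return "id1" if score(model_normal, feature) >= score(model_sensitive, feature) else "id2"
```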
In a second aspect, an audio classification apparatus is provided. The apparatus includes:
an obtaining module, configured to obtain at least one target audio segment of target audio information;
an extraction module, configured to perform high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment; and
a determining module, configured to determine a class identifier of the at least one target audio segment based on an audio classification model and the at least one audio feature, and determine a class identifier of the target audio information according to the class identifier of the at least one target audio segment,
where the class identifier includes a first identifier and a second identifier, the first identifier indicates that the corresponding audio information is normal audio information, and the second identifier indicates that the corresponding audio information is sensitive audio information.
Optionally, the obtaining module includes:
a first division unit, configured to divide the target audio information according to a first preset length to obtain multiple audio segments whose length equals the first preset length;
a fundamental frequency obtaining unit, configured to: for each of the multiple audio segments, obtain multiple fundamental frequencies in the audio segment, and obtain the proportion of those fundamental frequencies that are greater than a first preset frequency; and
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments whose proportion is less than a first preset ratio as target audio segments.
Optionally, the obtaining module includes:
a second division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments share audio information of the third preset length, and the third preset length is less than the second preset length,
the second division unit being further configured to: for each of the multiple audio segments, divide the audio segment according to a fourth preset length to obtain multiple audio sub-segments whose length equals the fourth preset length, and obtain an amplitude statistic of each audio sub-segment, where the fourth preset length is less than the second preset length; and
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments in which any statistic is greater than a preset value as target audio segments.
Optionally, the obtaining module includes:
a third division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments share audio information of the third preset length, and the third preset length is less than the second preset length;
a fundamental frequency obtaining unit, configured to: for each of the multiple audio segments, obtain multiple fundamental frequencies in the audio segment, and obtain the proportion of those fundamental frequencies that are greater than the first preset frequency; and
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments whose proportion is greater than a second preset ratio and less than a third preset ratio as target audio segments.
Optionally, the extraction module includes:
a filtering unit, configured to perform high-pass filtering on the at least one target audio segment to obtain at least one filtered audio segment;
a division unit, configured to divide each filtered audio segment according to a fifth preset length to obtain multiple audio sub-segments whose length equals the fifth preset length; and
an extraction unit, configured to perform feature extraction on each audio sub-segment to obtain the audio feature of each audio sub-segment.
Optionally, the determining module is configured to perform at least one of the following:
when, among the at least one target audio segment, the class identifiers of a first preset quantity of consecutive target audio segments are the second identifier, determining that the class identifier of the target audio information is the second identifier; and
when, among the at least one target audio segment, the proportion of target audio segments whose class identifier is the second identifier reaches a fourth preset ratio, determining that the class identifier of the target audio information is the second identifier.
Optionally, the obtaining module is further configured to obtain multiple pieces of sample audio information and the class identifiers of the multiple pieces of sample audio information;
the extraction module is further configured to perform high-pass filtering and feature extraction on the multiple pieces of sample audio information to obtain multiple corresponding audio features; and
the apparatus further includes a training module, configured to perform model training according to the multiple audio features and the class identifier corresponding to each audio feature to obtain the audio classification model.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model;
the training module is further configured to perform model training according to the audio features, among the multiple audio features, whose class identifier is the first identifier to obtain the first audio classification model; and
the training module is further configured to perform model training according to the audio features, among the multiple audio features, whose class identifier is the second identifier to obtain the second audio classification model.
In a third aspect, an audio classification apparatus is provided. The apparatus includes a processor and a memory, the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the operations performed in the audio classification method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement the operations performed in the audio classification method according to the first aspect.
The technical solutions provided in the embodiments of the present invention have the following beneficial effects:
According to the method, apparatus, and storage medium provided in the embodiments of the present invention, at least one target audio segment of target audio information is obtained, and high-pass filtering and feature extraction are performed on the at least one target audio segment; the high-pass filtering can filter out low-frequency noise, and the feature extraction yields at least one audio feature corresponding to the at least one target audio segment. Based on an audio classification model and the at least one audio feature, the class identifier of the at least one target audio segment is determined, and the class identifier of the target audio information is determined according to the class identifier of the at least one target audio segment, so that the target audio information can be determined to be normal audio information or sensitive audio information. Because high-pass filtering is performed before the class identifier of the target audio information is determined, the low-frequency noise in the target audio information can be filtered out, so low-frequency noise is not mistaken for sensitive audio information, which improves the accuracy of audio classification.
Moreover, when the at least one target audio segment of the target audio information is obtained, the target audio information is divided into multiple audio segments, the multiple audio segments are screened against a preset condition, and the audio segments that satisfy the preset condition are taken as target audio segments. In this way, interference from the other audio segments is reduced, misclassification is reduced, and the accuracy of audio classification is further improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative efforts.
Fig. 1 is a flowchart of an audio classification method according to an embodiment of the present invention;
Fig. 2 is a flowchart of an audio classification method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an audio classification apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a server according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Fig. 1 is a kind of flow chart of audio frequency classification method provided in an embodiment of the present invention.The execution master of the inventive embodiments
Body is sorter, referring to Fig. 1, this method comprises:
101, at least one target audio segment in target audio information is obtained.
102, high-pass filtering and feature extraction are carried out at least one target audio segment, obtains at least one target audio
At least one corresponding audio frequency characteristics of segment.
103, audio classification model and at least one audio frequency characteristics are based on, determine the classification of at least one target audio segment
Mark, according to the class indication of at least one target audio segment, determines the class indication of target audio information.
Wherein, class indication includes first identifier and second identifier, and first identifier is used to indicate corresponding audio-frequency information and is
Normal audio information, second identifier are used to indicate corresponding audio-frequency information as sensitive audio-frequency information.
According to the method provided in this embodiment of the present invention, at least one target audio segment of target audio information is obtained, and high-pass filtering and feature extraction are performed on the at least one target audio segment; the high-pass filtering can filter out low-frequency noise, and the feature extraction yields at least one audio feature corresponding to the at least one target audio segment. Based on an audio classification model and the at least one audio feature, the class identifier of the at least one target audio segment is determined, and the class identifier of the target audio information is determined according to the class identifier of the at least one target audio segment, so that the target audio information can be determined to be normal audio information or sensitive audio information. Because high-pass filtering is performed before the class identifier of the target audio information is determined, the low-frequency noise in the target audio information can be filtered out, so low-frequency noise is not mistaken for sensitive audio information, which improves the accuracy of audio classification.
Optionally, the obtaining at least one target audio segment of target audio information includes:
dividing the target audio information according to a first preset length to obtain multiple audio segments whose length equals the first preset length;
for each of the multiple audio segments, obtaining multiple fundamental frequencies in the audio segment, and obtaining the proportion of those fundamental frequencies that are greater than a first preset frequency; and
obtaining, from the multiple audio segments, the audio segments whose proportion is less than a first preset ratio as target audio segments.
Optionally, the obtaining at least one target audio segment of target audio information includes:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments share audio information of the third preset length, and the third preset length is less than the second preset length;
for each of the multiple audio segments, dividing the audio segment according to a fourth preset length to obtain multiple audio sub-segments whose length equals the fourth preset length, and obtaining an amplitude statistic of each audio sub-segment, where the fourth preset length is less than the second preset length; and
obtaining, from the multiple audio segments, the audio segments in which any statistic is greater than a preset value as target audio segments.
Optionally, obtaining at least one target audio segment from the target audio information includes:
dividing the target audio information according to the second preset length and the third preset length to obtain multiple audio segments whose length is equal to the second preset length, where any two adjacent audio segments share identical audio information of the third preset length, and the third preset length is less than the second preset length;
for each of the multiple audio segments, obtaining multiple fundamental frequencies in the audio segment, and obtaining the proportion of fundamental frequencies greater than the first preset frequency among the multiple fundamental frequencies;
selecting, from the multiple audio segments, the audio segments whose proportion is greater than a second preset ratio and less than a third preset ratio as target audio segments.
Optionally, performing high-pass filtering and feature extraction on the at least one target audio segment to obtain the at least one audio feature corresponding to the at least one target audio segment includes:
performing high-pass filtering on the at least one target audio segment to obtain at least one filtered audio segment;
dividing each filtered audio segment according to a fifth preset length to obtain multiple audio sub-segments whose length is equal to the fifth preset length;
performing feature extraction on each audio sub-segment to obtain the audio feature of each audio sub-segment.
Optionally, determining the classification identifier of the target audio information according to the classification identifier of the at least one target audio segment includes at least one of the following:
when, among the at least one target audio segment, the classification identifiers of a consecutive first preset number of target audio segments are all the second identifier, determining that the classification identifier of the target audio information is the second identifier;
when, among the at least one target audio segment, the proportion of target audio segments whose classification identifier is the second identifier reaches a fourth preset ratio, determining that the classification identifier of the target audio information is the second identifier.
Optionally, the method further includes:
obtaining multiple sample audio information and the classification identifiers of the multiple sample audio information;
performing high-pass filtering and feature extraction on the multiple sample audio information to obtain the multiple audio features corresponding to the multiple sample audio information;
performing model training according to the multiple audio features and the classification identifier corresponding to each audio feature to obtain the audio classification model.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model, and performing model training according to the multiple audio features and the classification identifier corresponding to each audio feature to obtain the audio classification model includes:
performing model training on the audio features, among the multiple audio features, whose classification identifier is the first identifier to obtain the first audio classification model;
performing model training on the audio features, among the multiple audio features, whose classification identifier is the second identifier to obtain the second audio classification model.
All of the above optional solutions may be combined in any manner to form optional embodiments of the present invention, which are not described in detail here.
Fig. 2 is a flowchart of an audio classification method provided by an embodiment of the present invention. The method is executed by a classification device, which may be a terminal such as a mobile phone, computer, or tablet, or may be a server. Referring to Fig. 2, the method includes the following steps:
201. Obtain an audio classification model.
In this embodiment of the present invention, any audio information can be classified based on the audio classification model to determine whether the audio information is normal audio information or sensitive audio information. The audio classification model is used to determine the classification identifier of audio information; the classification identifier includes a first identifier and a second identifier, where the first identifier indicates that the corresponding audio information is normal audio information, and the second identifier indicates that the corresponding audio information is sensitive audio information.
The first identifier and the second identifier are two different identifiers; for example, the first identifier is 1 and the second identifier is 0, or the first identifier is 0 and the second identifier is 1.
The audio classification model may be trained and stored by the classification device, or may be trained by another device, sent to the classification device, and stored by the classification device.
When training the audio classification model, multiple sample audio information and their classification identifiers are obtained. For each sample audio information, high-pass filtering is performed to filter out the low-frequency noise in the sample audio information, and feature extraction is then performed on the filtered sample audio information to obtain its corresponding audio feature. In this way, the multiple audio features corresponding to the multiple sample audio information are obtained, the classification identifiers of the multiple sample audio information are used as the classification identifiers corresponding to the multiple audio features, and model training is performed according to the multiple audio features and the classification identifier corresponding to each audio feature to obtain the audio classification model.
For example, each audio feature describes the sample audio information and may be a mel-frequency cepstral coefficient, a linear prediction cepstral coefficient, or any other feature capable of describing the audio. Correspondingly, when performing feature extraction on the sample audio information, a mel-frequency cepstral coefficient algorithm, a linear prediction cepstral coefficient algorithm, or another feature extraction algorithm may be used.
In addition, various training algorithms may be used to train the audio classification model, which may be a Gaussian mixture model, a neural network model, a decision tree model, or another model.
By performing high-pass filtering on the sample audio information, the low-frequency noise in the sample audio information is filtered out, so the extracted audio features describe the sample audio information more accurately, interference from low-frequency noise is avoided, and the accuracy of the trained audio classification model is improved.
Optionally, when training the audio classification model, an initial audio classification model may first be constructed, and a training data set and a test data set may be obtained, each containing multiple sample audio information.
High-pass filtering is performed on each sample audio information in the training data set, and feature extraction is performed on the filtered sample audio information to obtain the corresponding multiple audio features. The multiple audio features are used as the input of the audio classification model to train it, so that the audio classification model learns the difference between normal audio information and sensitive audio information and acquires the ability to distinguish them.
Then, high-pass filtering is performed on each sample audio information in the test data set, feature extraction is performed on the filtered sample audio information to obtain the corresponding multiple audio features, and the multiple audio features are input into the audio classification model. Based on the audio classification model, the classification identifier of each sample audio information is determined and compared with its actual classification identifier, and the audio classification model is updated according to the comparison result.
In subsequent processing, new sample audio information and its classification identifier may also be obtained to continue training the audio classification model, thereby improving the accuracy of the audio classification model.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model. During model training, multiple sample audio information and their classification identifiers are obtained, and high-pass filtering and feature extraction are performed on them to obtain the corresponding multiple audio features. Model training is performed on the audio features whose classification identifier is the first identifier to obtain the first audio classification model, and model training is performed on the audio features whose classification identifier is the second identifier to obtain the second audio classification model.
The first audio classification model learns the features of normal audio information and has the ability to recognize normal audio information; based on the first audio classification model, the probability that any audio information is normal audio information can be determined. The second audio classification model learns the features of sensitive audio information and has the ability to recognize sensitive audio information; based on the second audio classification model, the probability that any audio information is sensitive audio information can be determined. The target audio information can subsequently be classified based on the first audio classification model and the second audio classification model.
By training on the two kinds of sample audio information separately to obtain the first audio classification model and the second audio classification model, each model is more specialized, which improves the accuracy of the audio classification model.
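As a minimal illustration of this two-model scheme, the sketch below fits one model per class and labels a feature by comparing the two models' likelihoods. Everything here is an assumption for illustration: the scalar "features", the training values, and the 1-D Gaussian stand-in (the patent allows Gaussian mixture models, neural networks, or decision trees over real audio features).

```python
import math
import statistics

def fit_gauss(values):
    # Fit a 1-D Gaussian as a stand-in for one per-class model.
    return statistics.mean(values), statistics.pstdev(values) or 1.0

def log_likelihood(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Hypothetical scalar audio features for each class of training samples.
normal_model = fit_gauss([0.10, 0.20, 0.15, 0.12])     # first identifier class
sensitive_model = fit_gauss([0.80, 0.90, 0.85, 0.88])  # second identifier class

def label(feature, first_id=1, second_id=0):
    # Compare the two models, as step 205 later does with the two probabilities.
    normal_ll = log_likelihood(feature, *normal_model)
    sensitive_ll = log_likelihood(feature, *sensitive_model)
    return first_id if normal_ll > sensitive_ll else second_id
```

A feature near the "normal" training values would receive the first identifier, and one near the "sensitive" values the second identifier.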
202. Obtain at least one target audio segment from the target audio information.
In this embodiment of the present invention, the target audio information is the audio information to be classified; it needs to be classified to determine whether it is normal audio information or sensitive audio information.
In terms of form, the target audio information may be the audio information of a single audio file, audio information extracted from a video file, or audio information in another form.
In terms of source, the target audio information may be recorded by the classification device, downloaded from the Internet by the classification device, or sent to the classification device by another device. For example, while playing a live video, the classification device may obtain the audio information in the live video as the target audio information.
In terms of content, the target audio information may include singing audio information, chat audio information, sensitive audio information, noise audio information, and so on.
Optionally, the classification device may use the complete target audio information as the target audio segment to be classified.
Alternatively, a preset condition may be set. The preset condition specifies the condition satisfied by audio information that may be sensitive audio information; that is, when an audio segment satisfies the preset condition, the audio segment may contain sensitive audio information, and when the audio segment does not satisfy the preset condition, the audio segment does not contain sensitive audio information.
To improve accuracy, when the classification device obtains the target audio information, instead of directly classifying the entire target audio information, it divides the target audio information into at least one audio segment, judges whether each audio segment satisfies the preset condition, and determines the audio segments that satisfy the preset condition as target audio segments, so that subsequently only the target audio segments are classified and the other audio segments are not.
Optionally, this step includes at least one of the following steps 2021-2023:
2021. Obtain the audio segments in the target audio information other than singing audio segments as target audio segments.
The target audio information is divided according to the first preset length to obtain multiple audio segments whose length is equal to the first preset length. For each of the multiple audio segments, multiple fundamental frequencies in the audio segment are obtained, as well as the proportion of fundamental frequencies greater than the first preset frequency among them.
The fundamental frequency is the frequency of the base tone of the audio segment. When obtaining the proportion of fundamental frequencies greater than the first preset frequency, the number of fundamental frequencies greater than the first preset frequency is obtained, and the ratio of that number to the total number of fundamental frequencies obtained in the audio segment is calculated, yielding the proportion of fundamental frequencies greater than the first preset frequency.
When the proportion of fundamental frequencies greater than the first preset frequency in an audio segment is not less than the first preset ratio, the audio segment is determined to be a singing segment and does not contain sensitive audio information. When the proportion is less than the first preset ratio, the audio segment is determined not to be a singing segment and may contain sensitive audio information.
Therefore, the classification device selects, from the multiple audio segments, the audio segments in which the proportion of fundamental frequencies greater than the first preset frequency is less than the first preset ratio as target audio segments.
The first preset length may be set to 10 seconds, 20 seconds, or another duration. The first preset frequency may be set to 150 Hz, 160 Hz, or another frequency. The first preset ratio may be set to 60%, 70%, or another percentage.
For example, suppose the first preset length is 20 seconds, the first preset ratio is 60%, and the first preset frequency is 150 Hz. The target audio information is divided into multiple 20-second audio segments, and 100 fundamental frequencies are extracted from each audio segment. If the number of fundamental frequencies greater than 150 Hz in an audio segment is 50, the proportion is 50%, which is less than the first preset ratio of 60%, so the audio segment is used as a target audio segment.
By dividing the target audio information into multiple audio segments and judging, for each audio segment, whether it is a singing segment according to its fundamental frequencies, the singing segments in the target audio information can be excluded, which reduces the amount of calculation and improves the accuracy of the subsequent classification of the target audio information.
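Step 2021 can be sketched as follows. This is a simplified illustration under assumptions: the fundamental-frequency lists are given directly (a real system would estimate them with a pitch tracker), and the function names and thresholds are illustrative.

```python
def high_pitch_ratio(fundamentals, first_preset_frequency=150.0):
    # Proportion of fundamental frequencies (Hz) above the first preset frequency.
    if not fundamentals:
        return 0.0
    return sum(1 for f in fundamentals if f > first_preset_frequency) / len(fundamentals)

def non_singing_segments(segments, first_preset_ratio=0.6):
    # Keep the segments whose high-pitch proportion is below the first
    # preset ratio; these become the target audio segments.
    return [s for s in segments if high_pitch_ratio(s) < first_preset_ratio]

# Hypothetical fundamental-frequency lists for two 20-second segments.
singing = [200.0] * 70 + [100.0] * 30   # 70% above 150 Hz -> singing, excluded
speech = [200.0] * 50 + [100.0] * 50    # 50% above 150 Hz -> target segment
targets = non_singing_segments([singing, speech])
```

With the numbers from the example above, only the second segment survives the filter.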
2022. Obtain the audio segments in the target audio information other than silent audio segments as target audio segments.
The target audio information is divided according to the second preset length and the third preset length to obtain multiple audio segments whose length is equal to the second preset length, where any two adjacent audio segments share identical audio information of the third preset length. Each of the multiple audio segments is divided according to the fourth preset length to obtain multiple audio sub-segments whose length is equal to the fourth preset length, and a statistic of the amplitude of each audio sub-segment is obtained.
The amplitude indicates the energy of the audio segment; the amplitude statistic may be the mean of the absolute amplitude values of each audio sub-segment, the mean of their squares, or another statistic.
When the amplitude statistic of every audio sub-segment is less than the preset value, the audio segment is determined to be a silent segment and does not contain sensitive audio information. When any amplitude statistic is not less than the preset value, the audio segment is determined to be a non-silent segment and may contain sensitive audio information.
Therefore, the classification device selects, from the multiple audio segments, the audio segments in which any statistic is greater than the preset value as target audio segments.
The second preset length may be set to 1 second, 2 seconds, or another duration. The third preset length may be set to 0.4 seconds, 0.5 seconds, or another duration, and is less than the second preset length. The fourth preset length may be set to 0.1 seconds, 0.2 seconds, or another duration, is less than the second preset length, and allows an audio segment of the second preset length to be divided into audio sub-segments. The preset value may be set to 0.2, 0.3, or another value.
For example, suppose the second preset length is 1 second, the third preset length is 0.5 seconds, the fourth preset length is 0.2 seconds, and the preset value is 0.3. The target audio information is divided according to the second preset length and the third preset length into multiple audio segments of 0-1 s, 0.5-1.5 s, 1-2 s, and so on. Each audio segment is then divided into multiple 0.2-second audio sub-segments, and the mean amplitude of each audio sub-segment in each audio segment is calculated. If the mean of any audio sub-segment in an audio segment is 0.4, which is greater than the preset value of 0.3, the audio segment is used as a target audio segment.
By dividing the target audio information according to the second preset length and the third preset length, any two adjacent audio segments share identical audio information of the third preset length, so each audio segment contains the audio information at the end of the previous audio segment, which reduces discontinuities between audio segments and preserves the integrity of the audio information.
Moreover, by dividing the target audio information into multiple audio segments, dividing each audio segment into multiple audio sub-segments, and determining whether each audio segment is a silent segment according to the statistic of each of its audio sub-segments, the silent segments in the target audio information can be excluded, which reduces the amount of calculation and improves the accuracy of the subsequent classification of the target audio information.
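The overlapping split and the silence test of step 2022 can be sketched as follows. This is an illustrative sketch: sample values stand in for real audio amplitudes, a toy rate of 10 samples per second is assumed for readability, and the mean absolute amplitude is used as the statistic.

```python
def split_overlapping(samples, second_len, third_len):
    # Segments of second_len samples whose starts are (second_len - third_len)
    # apart, so adjacent segments share third_len samples (e.g. 0-1 s,
    # 0.5-1.5 s, 1-2 s for 1-second segments with 0.5 s of overlap).
    step = second_len - third_len
    return [samples[i:i + second_len]
            for i in range(0, len(samples) - second_len + 1, step)]

def is_silent(segment, fourth_len, preset_value):
    # Silent when the mean absolute amplitude of every fourth_len-sample
    # sub-segment stays below the preset value.
    subs = [segment[i:i + fourth_len]
            for i in range(0, len(segment) - fourth_len + 1, fourth_len)]
    return all(sum(abs(x) for x in sub) / len(sub) < preset_value for sub in subs)

# Hypothetical amplitudes: 1 s of quiet, 1 s of speech-level sound, 1 s of quiet.
quiet, loud = [0.1] * 10, [0.4] * 10
segments = split_overlapping(quiet + loud + quiet, second_len=10, third_len=5)
targets = [s for s in segments if not is_silent(s, fourth_len=2, preset_value=0.3)]
```

The overlapping split yields five segments here; the two purely quiet ones are excluded and the three touching the loud region remain as target segments.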
2023. Obtain the audio segments in the target audio information other than noise segments and singing segments as target audio segments.
The target audio information is divided according to the second preset length and the third preset length to obtain multiple audio segments whose length is equal to the second preset length, where any two adjacent audio segments share identical audio information of the third preset length. For each of the multiple audio segments, multiple fundamental frequencies in the audio segment are obtained, as well as the proportion of fundamental frequencies greater than the first preset frequency among them.
When the proportion of fundamental frequencies greater than the first preset frequency in an audio segment is not greater than the second preset ratio, the audio segment is determined to be a noise segment and does not contain sensitive audio information. When the proportion is not less than the third preset ratio, the audio segment is determined to be a singing segment and does not contain sensitive audio information. When the proportion is greater than the second preset ratio and less than the third preset ratio, the audio segment is determined to be neither a noise segment nor a singing segment and may contain sensitive audio information.
Therefore, the classification device selects, from the multiple audio segments, the audio segments in which the proportion of fundamental frequencies greater than the first preset frequency is greater than the second preset ratio and less than the third preset ratio as target audio segments.
Here, for example, the second preset length is 1 second and the third preset length is 0.5 seconds. The second preset ratio may be set to 10%, 20%, or another percentage; the third preset ratio may be set to 60%, 70%, or another percentage, and may be the same as or different from the first preset ratio in step 2021.
By dividing the target audio information according to the second preset length and the third preset length, any two adjacent audio segments share identical audio information of the third preset length, so each audio segment contains the audio information at the end of the previous audio segment, which reduces discontinuities between audio segments and preserves the integrity of the audio information.
By dividing the target audio information into multiple audio segments and judging, for each audio segment, whether it is a noise segment, a singing segment, or a segment other than these according to its fundamental frequencies, the noise segments and singing segments in the target audio information can be excluded, which reduces the amount of calculation and improves the accuracy of the subsequent classification of the target audio information.
It should be noted that steps 2021-2023 may be combined with each other to distinguish the noise segments, silent segments, and singing segments in the target audio information from the other segments, so that the noise segments, silent segments, and singing segments are excluded and the target audio segments to be classified are determined; whether the target audio information is normal audio information or sensitive audio information is then determined according to the target audio segments.
203. Perform high-pass filtering on the at least one target audio segment to obtain at least one filtered audio segment.
The classification device may set a preset cutoff frequency for high-pass filtering: when high-pass filtering is performed on audio information, components whose frequency is below the preset cutoff frequency are filtered out, and components whose frequency is not below the preset cutoff frequency are retained.
After the at least one target audio segment is determined in step 202, high-pass filtering is performed on the at least one target audio segment according to the preset cutoff frequency: components below the preset cutoff frequency are filtered out and components not below it are retained, so the low-frequency noise is filtered out and the at least one filtered audio segment is obtained.
The preset cutoff frequency may be set to 100 Hz, 120 Hz, or another frequency, and may be set according to the maximum frequency that common low-frequency noise in daily life is likely to reach.
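A minimal sketch of such a filter is shown below. A first-order recursive high-pass filter is assumed here for illustration; the patent does not prescribe a particular filter design, and the sample rate and cutoff are example values.

```python
import math

def high_pass(samples, sample_rate=16000, cutoff=100.0):
    # First-order high-pass filter: attenuates components below the preset
    # cutoff frequency and passes components above it.
    rc = 1.0 / (2.0 * math.pi * cutoff)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

# A constant signal (pure 0 Hz content, i.e. below any cutoff) decays toward 0.
filtered = high_pass([1.0] * 1000)
```

Applying this to each target audio segment before feature extraction removes the low-frequency noise floor, as step 203 requires.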
204. Perform feature extraction on the at least one filtered target audio segment to obtain the at least one audio feature corresponding to the at least one target audio segment.
Each audio feature describes a target audio segment and may be a mel-frequency cepstral coefficient, a linear prediction cepstral coefficient, or any other feature capable of describing the target audio segment. Correspondingly, when performing feature extraction on a target audio segment, a mel-frequency cepstral coefficient algorithm, a linear prediction cepstral coefficient algorithm, or another feature extraction algorithm may be used.
For example, when extracting with a mel-frequency cepstral coefficient algorithm, the number of dimensions is set to 40, and the features of dimensions 1-13 are used as the features of the target audio segment.
Optionally, each filtered audio segment is divided according to the fifth preset length to obtain multiple audio sub-segments whose length is equal to the fifth preset length, and feature extraction is performed on each audio sub-segment to obtain the audio feature of each audio sub-segment. Each audio feature of an audio sub-segment may be used as an audio feature of the target audio segment, or the audio features of the multiple audio sub-segments may be combined into one audio feature of the target audio segment.
For example, the fifth preset length may be 20 ms, 25 ms, or another duration.
By dividing each audio segment according to the fifth preset length to obtain multiple audio sub-segments, the audio segment is divided more finely, so more accurate features are extracted and the accuracy is improved.
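The sub-segment division above can be sketched as a simple framing function (an illustrative sketch; the 16 kHz sample rate is an assumption, and a real feature extractor such as an MFCC algorithm would then be applied to each sub-segment):

```python
def sub_segments(samples, sample_rate=16000, fifth_preset_ms=20):
    # Divide a filtered segment into sub-segments of the fifth preset length;
    # one audio feature would then be extracted from each sub-segment.
    n = sample_rate * fifth_preset_ms // 1000
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]

# A hypothetical 1-second segment at 16 kHz.
frames = sub_segments([0.0] * 16000)
```

With these assumed values, a 1-second segment yields 50 sub-segments of 320 samples each, each of which contributes one audio feature.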
205. Determine the classification identifier of the at least one target audio segment based on the audio classification model and the at least one audio feature, and determine the classification identifier of the target audio information according to the classification identifier of the at least one target audio segment.
For the audio feature of each target audio segment, the audio feature is processed based on the audio classification model to obtain a classification identifier, which is the classification identifier of that target audio segment. In this way, the classification identifier of the at least one target audio segment is obtained, and it is thus determined whether each target audio segment contains sensitive audio information.
Optionally, when the audio classification model includes the first audio classification model and the second audio classification model, the audio feature of a target audio segment is input into both the first audio classification model and the second audio classification model. The first audio classification model outputs a first probability, and the second audio classification model outputs a second probability, where the first probability indicates the probability that the target audio segment is normal audio information and the second probability indicates the probability that the target audio segment is sensitive audio information.
When the first probability is greater than the second probability, the target audio segment is determined to be normal audio information and its classification identifier is the first identifier. When the first probability is less than the second probability, the target audio segment is determined to be sensitive audio information and its classification identifier is the second identifier. When the first probability is equal to the second probability, the classification identifier of the target audio segment may be determined to be either the first identifier or the second identifier, or the target audio segment may be classified again.
Then, the classification identifier of the target audio information is determined according to the classification identifier of the at least one target audio segment. Optionally, this process includes at least one of the following steps 2051-2052:
2051. When, among the at least one target audio segment, the classification identifiers of a consecutive first preset number of target audio segments are all the second identifier, determine that the classification identifier of the target audio information is the second identifier.
When the at least one target audio segment is multiple target audio segments, the multiple target audio segments are traversed in order, and the number of consecutive target audio segments whose classification identifier is the second identifier is counted: when the classification identifier of the traversed target audio segment is the second identifier, the count is incremented by 1, and when the classification identifier of the traversed target audio segment is the first identifier, the count is reset to zero.
When the count reaches the first preset number, the classification identifier of the target audio information is determined to be the second identifier; that is, the target audio information is determined to be sensitive audio information.
The first preset number may be set to 3, 4, or another number. The order of the multiple target audio segments may be their chronological order in the target audio information.
2052. When, among the at least one target audio segment, the proportion of target audio segments whose classification identifier is the second identifier reaches the fourth preset ratio, determine that the classification identifier of the target audio information is the second identifier.
When the at least one target audio segment is multiple target audio segments, the number of target audio segments whose classification identifier is the second identifier is counted; when the proportion of such target audio segments reaches the fourth preset ratio, the classification identifier of the target audio information is determined to be the second identifier; that is, the target audio information is determined to be sensitive audio information.
The fourth preset ratio may be set to 70%, 75%, or another percentage.
It should be noted that steps 2051 and 2052 may be combined: the classification identifier of the target audio information is determined to be the second identifier only when, among the at least one target audio segment, the classification identifiers of a first preset quantity of consecutive target audio segments are all the second identifier, and the proportion of target audio segments whose classification identifier is the second identifier reaches the fourth preset ratio. Requiring both conditions to hold when classifying the target audio information improves classification accuracy.
The embodiments of the present invention can be applied in scenarios such as live streaming, voice interaction, and video playback. For example, in a live-streaming scenario, audio information is extracted from the live video and classified; when the audio information is determined to be sensitive audio information, the live video is determined to be an objectionable video and the live stream is shut down. In a voice-interaction scenario, the audio information in the speech is extracted and classified; when the audio information is determined to be sensitive audio information, the speech is deleted. In a video-playback scenario, audio information is extracted from the video and classified; when the audio information is determined to be sensitive audio information, the video is determined to be an objectionable video and playback is stopped.
In the method provided by the embodiments of the present invention, at least one target audio segment in the target audio information is obtained, and high-pass filtering and feature extraction are performed on the at least one target audio segment. The high-pass filtering removes low-frequency noise, and the feature extraction yields at least one audio feature corresponding to the at least one target audio segment. Based on an audio classification model and the at least one audio feature, the classification identifier of the at least one target audio segment is determined, and the classification identifier of the target audio information is then determined from the classification identifiers of the at least one target audio segment, so that the target audio information can be determined to be normal audio information or sensitive audio information. Because high-pass filtering is performed before the classification identifier of the target audio information is determined, the low-frequency noise of the target audio information is filtered out, so low-frequency noise is not mistaken for sensitive audio information, which improves the accuracy of audio classification.
Moreover, when obtaining the at least one target audio segment, the target audio information is divided into multiple audio segments, which are then screened against a preset condition; the audio segments that satisfy the preset condition are taken as target audio segments. In this way, interference from the other audio segments is reduced, misclassification is reduced, and the accuracy of audio classification is improved.
Fig. 3 is a schematic structural diagram of an audio classification apparatus provided by an embodiment of the present invention. Referring to Fig. 3, the apparatus includes: an obtaining module 301, an extraction module 302, and a determining module 303.
The obtaining module 301 is configured to obtain at least one target audio segment in the target audio information.
The extraction module 302 is configured to perform high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment.
The determining module 303 is configured to determine the classification identifier of the at least one target audio segment based on an audio classification model and the at least one audio feature, and to determine the classification identifier of the target audio information according to the classification identifier of the at least one target audio segment.
The classification identifier includes a first identifier and a second identifier; the first identifier indicates that the corresponding audio information is normal audio information, and the second identifier indicates that the corresponding audio information is sensitive audio information.
Optionally, the obtaining module 301 includes:
a first division unit, configured to divide the target audio information according to a first preset length to obtain multiple audio segments whose length equals the first preset length;
a fundamental-frequency obtaining unit, configured to, for each of the multiple audio segments, obtain multiple fundamental frequencies in the audio segment and obtain the proportion of the multiple fundamental frequencies that are greater than a first preset frequency;
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments whose proportion is less than a first preset ratio, as target audio segments.
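The fundamental-frequency screening performed by these units can be sketched as below. This is a hedged illustration: the patent does not specify how the per-frame fundamental frequencies are estimated (a pitch tracker such as pYIN would typically supply them), and the 200 Hz threshold and 0.5 ratio are placeholder values for the first preset frequency and first preset ratio.

```python
def select_target_segments(segments, first_preset_frequency=200.0,
                           first_preset_ratio=0.5):
    """Keep segments in which fewer than `first_preset_ratio` of the
    estimated fundamental frequencies exceed the preset frequency.
    `segments` is a list of (segment, f0_values) pairs, where
    f0_values are per-frame fundamental-frequency estimates in Hz."""
    targets = []
    for seg, f0_values in segments:
        if not f0_values:
            continue  # no pitch estimates: nothing to screen on
        high = sum(1 for f0 in f0_values if f0 > first_preset_frequency)
        if high / len(f0_values) < first_preset_ratio:
            targets.append(seg)
    return targets
```

A segment dominated by high fundamental frequencies is rejected; one whose pitch mostly stays below the preset frequency is kept as a target audio segment.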
Optionally, the obtaining module 301 includes:
a second division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments among the multiple audio segments share audio information of the third preset length; the third preset length is less than the second preset length;
the second division unit is further configured to, for each of the multiple audio segments, divide the audio segment according to a fourth preset length to obtain multiple audio sub-segments whose length equals the fourth preset length, and to obtain a statistic of the amplitude of each audio sub-segment; the fourth preset length is less than the second preset length;
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments in which any amplitude statistic is greater than a preset value, as target audio segments.
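The overlapped division and the amplitude screening can be sketched together. Lengths here are in samples, and the mean absolute amplitude is one possible choice of statistic; the patent leaves the exact statistic and units open, so both functions are illustrative assumptions.

```python
def split_with_overlap(samples, second_preset_length, third_preset_length):
    """Split into windows of `second_preset_length` samples, where
    adjacent windows share `third_preset_length` samples
    (third preset length < second preset length)."""
    hop = second_preset_length - third_preset_length
    return [samples[i:i + second_preset_length]
            for i in range(0, len(samples) - second_preset_length + 1, hop)]

def has_loud_subsegment(segment, fourth_preset_length, preset_value):
    """True when any sub-segment's amplitude statistic (mean absolute
    amplitude here) exceeds the preset value."""
    for i in range(0, len(segment), fourth_preset_length):
        sub = segment[i:i + fourth_preset_length]
        if sub and sum(abs(x) for x in sub) / len(sub) > preset_value:
            return True
    return False
```

A segment passes the screen as soon as one of its sub-segments is loud enough, which corresponds to "any statistic greater than the preset value" above.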
Optionally, the obtaining module 301 includes:
a third division unit, configured to divide the target audio information according to the second preset length and the third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments among the multiple audio segments share audio information of the third preset length; the third preset length is less than the second preset length;
a fundamental-frequency obtaining unit, configured to, for each of the multiple audio segments, obtain multiple fundamental frequencies in the audio segment and obtain the proportion of the multiple fundamental frequencies that are greater than the first preset frequency;
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments whose proportion is greater than a second preset ratio and less than a third preset ratio, as target audio segments.
Optionally, the extraction module 302 includes:
a filtering unit, configured to perform high-pass filtering on the at least one target audio segment to obtain at least one high-pass-filtered audio segment;
a division unit, configured to divide each high-pass-filtered audio segment according to a fifth preset length to obtain multiple audio sub-segments whose length equals the fifth preset length;
an extraction unit, configured to perform feature extraction on each audio sub-segment to obtain the audio feature of each audio sub-segment.
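The filter-then-frame pipeline of these units can be sketched as follows. The patent specifies neither the filter nor the feature, so this sketch substitutes a first-order pre-emphasis filter (a common lightweight high-pass) and per-frame log energy; a real system would more likely use a designed Butterworth high-pass and MFCC features.

```python
import numpy as np

def highpass_preemphasis(x, coeff=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coeff * x[n-1].
    Attenuates the low-frequency content of the signal."""
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - coeff * x[:-1])

def frame_features(x, fifth_preset_length):
    """Cut the filtered segment into sub-segments of the fifth preset
    length (in samples) and compute one toy feature per sub-segment
    (log energy); trailing samples that do not fill a frame are dropped."""
    n = len(x) // fifth_preset_length
    frames = x[:n * fifth_preset_length].reshape(n, fifth_preset_length)
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)
```

For a constant (purely low-frequency) input, the filter output collapses toward zero after the first sample, which is the noise-suppression behaviour the high-pass filtering step relies on.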
Optionally, the determining module 303 is configured to perform at least one of the following:
when, among the at least one target audio segment, the classification identifiers of a first preset quantity of consecutive target audio segments are all the second identifier, determining that the classification identifier of the target audio information is the second identifier;
when, among the at least one target audio segment, the proportion of target audio segments whose classification identifier is the second identifier reaches a fourth preset ratio, determining that the classification identifier of the target audio information is the second identifier.
Optionally, the apparatus further includes:
the obtaining module 301, further configured to obtain multiple pieces of sample audio information and the classification identifiers of the multiple pieces of sample audio information;
the extraction module 302, further configured to perform high-pass filtering and feature extraction on the multiple pieces of sample audio information to obtain multiple audio features corresponding to the multiple pieces of sample audio information;
a training module, configured to perform model training according to the multiple audio features and the classification identifier corresponding to each audio feature, to obtain the audio classification model.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model.
The training module is further configured to perform model training on the audio features, among the multiple audio features, whose classification identifier is the first classification identifier, to obtain the first audio classification model; and to perform model training on the audio features, among the multiple audio features, whose classification identifier is the second classification identifier, to obtain the second audio classification model.
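One way to read this two-model scheme is that each model is fit only on the features of one class, and a new feature is assigned the identifier of the model that scores it higher. The sketch below uses a diagonal Gaussian per class purely as an illustrative stand-in; the patent does not specify the model family, and the class names and scoring rule are assumptions.

```python
import numpy as np

class OneClassGaussian:
    """Diagonal-Gaussian density model fit on one class's features."""
    def fit(self, feats):
        feats = np.asarray(feats, dtype=float)
        self.mean = feats.mean(axis=0)
        self.var = feats.var(axis=0) + 1e-6  # floor to avoid divide-by-zero
        return self

    def score(self, feat):
        # Unnormalized log-likelihood under the diagonal Gaussian.
        d = np.asarray(feat, dtype=float) - self.mean
        return float(-0.5 * np.sum(d * d / self.var + np.log(self.var)))

def train_two_models(features, identifiers, first_identifier=0):
    """Split the training features by classification identifier and fit
    one model per class, mirroring the first/second audio classification
    models described above."""
    normal = [f for f, y in zip(features, identifiers) if y == first_identifier]
    sensitive = [f for f, y in zip(features, identifiers) if y != first_identifier]
    return OneClassGaussian().fit(normal), OneClassGaussian().fit(sensitive)
```

At inference, comparing the two models' scores on a segment's feature yields the segment's classification identifier.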
With the apparatus provided by the embodiments of the present invention, at least one target audio segment in the target audio information is obtained, and high-pass filtering and feature extraction are performed on the at least one target audio segment. The high-pass filtering removes low-frequency noise, and the feature extraction yields at least one audio feature corresponding to the at least one target audio segment. Based on an audio classification model and the at least one audio feature, the classification identifier of the at least one target audio segment is determined, and the classification identifier of the target audio information is then determined from the classification identifiers of the at least one target audio segment, so that the target audio information can be determined to be normal audio information or sensitive audio information. Because high-pass filtering is performed before the classification identifier of the target audio information is determined, the low-frequency noise of the target audio information is filtered out, so low-frequency noise is not mistaken for sensitive audio information, which improves the accuracy of audio classification.
Moreover, when obtaining the at least one target audio segment, the target audio information is divided into multiple audio segments, which are then screened against a preset condition; the audio segments that satisfy the preset condition are taken as target audio segments. In this way, interference from the other audio segments is reduced, misclassification is reduced, and the accuracy of audio classification is improved.
All of the above optional technical solutions may be combined in any manner to form optional embodiments of the present invention, which are not described one by one here.
It should be noted that when the audio classification apparatus provided by the above embodiments classifies audio information, the division into the above functional modules is merely illustrative. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the audio classification apparatus provided by the above embodiments belongs to the same concept as the audio classification method embodiments; for its specific implementation, refer to the method embodiments, which are not repeated here.
Fig. 4 is a schematic structural diagram of a server provided by an embodiment of the present invention. The server 400 may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPUs) 401 and one or more memories 402, where the memory 402 stores at least one instruction that is loaded and executed by the processor 401 to implement the methods provided by the above method embodiments. Of course, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.
The server 400 may be configured to perform the steps performed by the classification apparatus in the above audio classification method.
Fig. 5 is a schematic structural diagram of a terminal provided by an embodiment of the present invention. The terminal 500 may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop, a desktop computer, a head-mounted device, or any other smart terminal. The terminal 500 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 500 includes a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 501 may be implemented in at least one of the following hardware forms: DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor. The main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 502 may include one or more computer-readable storage media, which may be non-transitory. The memory 502 may also include high-speed random-access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 is configured to store at least one instruction, which is executed by the processor 501 to implement the audio classification method provided by the method embodiments of the present application.
In some embodiments, the terminal 500 may optionally further include a peripheral interface 503 and at least one peripheral. The processor 501, the memory 502, and the peripheral interface 503 may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripherals include at least one of: a radio-frequency circuit 504, a touch display screen 505, a camera 506, an audio circuit 507, a positioning component 508, and a power supply 509.
The peripheral interface 503 may be used to connect at least one I/O (Input/Output)-related peripheral to the processor 501 and the memory 502. In some embodiments, the processor 501, the memory 502, and the peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio-frequency circuit 504 is configured to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio-frequency circuit 504 communicates with communication networks and other communication devices via electromagnetic signals. The radio-frequency circuit 504 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio-frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio-frequency circuit 504 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes but is not limited to: metropolitan area networks, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio-frequency circuit 504 may further include circuitry related to NFC (Near Field Communication), which is not limited in the present application.
The display screen 505 is configured to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, the display screen 505 also has the ability to capture touch signals on or above its surface. The touch signal may be input to the processor 501 as a control signal for processing. At this point, the display screen 505 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, arranged on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, respectively arranged on different surfaces of the terminal 500 or in a folded design; in still other embodiments, the display screen 505 may be a flexible display screen arranged on a curved or folded surface of the terminal 500. The display screen 505 may even be arranged in a non-rectangular irregular shape, that is, a shaped screen. The display screen 505 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 506 is configured to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. In general, the front camera is arranged on the front panel of the terminal and the rear camera on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background-blur function, the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting, or other fused shooting functions are realized. In some embodiments, the camera assembly 506 may further include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and may be used for light compensation at different color temperatures.
The audio circuit 507 may include a microphone and a loudspeaker. The microphone is configured to capture sound waves of the user and the environment and convert them into electrical signals that are input to the processor 501 for processing, or input to the radio-frequency circuit 504 to realize voice communication. For stereo capture or noise reduction, there may be multiple microphones, arranged at different parts of the terminal 500. The microphone may also be an array microphone or an omnidirectional microphone. The loudspeaker is configured to convert electrical signals from the processor 501 or the radio-frequency circuit 504 into sound waves. The loudspeaker may be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the loudspeaker is a piezoelectric ceramic loudspeaker, it can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans, for purposes such as ranging. In some embodiments, the audio circuit 507 may further include a headphone jack.
The positioning component 508 is configured to locate the current geographic position of the terminal 500 to implement navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 509 is configured to supply power to the components in the terminal 500. The power supply 509 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also support fast-charging technology.
In some embodiments, the terminal 500 further includes one or more sensors 510. The one or more sensors 510 include but are not limited to: an acceleration sensor 511, a gyroscope sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
The acceleration sensor 511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 501 may, according to the gravitational-acceleration signal captured by the acceleration sensor 511, control the touch display screen 505 to display the user interface in landscape or portrait view. The acceleration sensor 511 may also be used to capture motion data for games or for the user.
The gyroscope sensor 512 can detect the body orientation and rotation angle of the terminal 500, and may cooperate with the acceleration sensor 511 to capture the user's 3D actions on the terminal 500. Based on the data captured by the gyroscope sensor 512, the processor 501 may implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 513 may be arranged on the side frame of the terminal 500 and/or on the lower layer of the touch display screen 505. When the pressure sensor 513 is arranged on the side frame of the terminal 500, it can detect the user's grip signal on the terminal 500, and the processor 501 performs left/right-hand recognition or shortcut operations according to the grip signal captured by the pressure sensor 513. When the pressure sensor 513 is arranged on the lower layer of the touch display screen 505, the processor 501 controls the operability controls on the UI according to the user's pressure operation on the touch display screen 505. The operability controls include at least one of a button control, a scrollbar control, an icon control, and a menu control.
The fingerprint sensor 514 is configured to capture the user's fingerprint. The processor 501 identifies the user's identity from the fingerprint captured by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the user's identity from the captured fingerprint. When the user's identity is recognized as a trusted identity, the processor 501 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 514 may be arranged on the front, back, or side of the terminal 500. When a physical button or a manufacturer logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or manufacturer logo.
The optical sensor 515 is configured to capture the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the touch display screen 505 according to the ambient light intensity captured by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 505 is decreased. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 according to the ambient light intensity captured by the optical sensor 515.
The proximity sensor 516, also called a distance sensor, is generally arranged on the front panel of the terminal 500. The proximity sensor 516 is configured to capture the distance between the user and the front of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually decreases, the processor 501 controls the touch display screen 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually increases, the processor 501 controls the touch display screen 505 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in Fig. 5 does not constitute a limitation on the terminal 500, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
An embodiment of the present invention further provides an audio classification apparatus, which includes a processor and a memory; the memory stores at least one instruction, which is loaded and executed by the processor to implement the operations performed in the audio classification method of the above embodiments.
An embodiment of the present invention further provides a computer-readable storage medium, in which at least one instruction is stored; the instruction is loaded and executed by a processor to implement the operations performed in the audio classification method of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (18)
1. An audio classification method, characterized in that the method comprises:
obtaining at least one target audio segment in target audio information;
performing high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment;
determining a classification identifier of the at least one target audio segment based on an audio classification model and the at least one audio feature, and determining a classification identifier of the target audio information according to the classification identifier of the at least one target audio segment;
wherein the classification identifier comprises a first identifier and a second identifier, the first identifier indicating that the corresponding audio information is normal audio information, and the second identifier indicating that the corresponding audio information is sensitive audio information.
2. The method according to claim 1, characterized in that the obtaining at least one target audio segment in target audio information comprises:
dividing the target audio information according to a first preset length to obtain multiple audio segments whose length equals the first preset length;
for each of the multiple audio segments, obtaining multiple fundamental frequencies in the audio segment, and obtaining the proportion of the multiple fundamental frequencies that are greater than a first preset frequency;
obtaining, from the multiple audio segments, the audio segments whose proportion is less than a first preset ratio, as target audio segments.
3. The method according to claim 1, characterized in that the obtaining at least one target audio segment in target audio information comprises:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, wherein any two adjacent audio segments among the multiple audio segments comprise shared audio information of the third preset length; the third preset length is less than the second preset length;
for each of the multiple audio segments, dividing the audio segment according to a fourth preset length to obtain multiple audio sub-segments whose length equals the fourth preset length, and obtaining a statistic of the amplitude of each audio sub-segment; the fourth preset length is less than the second preset length;
obtaining, from the multiple audio segments, the audio segments in which any statistic is greater than a preset value, as target audio segments.
4. The method according to claim 1, wherein the obtaining at least one target audio fragment from target audio information comprises:
dividing the target audio information according to a second preset length and a third preset length, to obtain multiple audio fragments whose length is equal to the second preset length, any two adjacent audio fragments among the multiple audio fragments sharing identical audio information of the third preset length, the third preset length being less than the second preset length;
for each audio fragment among the multiple audio fragments, obtaining multiple fundamental frequencies in the audio fragment, and obtaining the ratio occupied by fundamental frequencies greater than a first preset frequency among the multiple fundamental frequencies;
obtaining, from the multiple audio fragments, an audio fragment whose ratio is greater than a second preset ratio and less than a third preset ratio, as a target audio fragment.
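Unlike claim 2, claim 4 keeps only fragments whose voiced-frequency ratio falls strictly between two preset ratios. A minimal sketch of just that band-selection rule, assuming the per-fragment ratios (the share of fundamental-frequency estimates above the first preset frequency) have already been computed by a pitch tracker; the 0.2 and 0.8 bounds are illustrative.

```python
def band_ratio_fragments(ratios, low=0.2, high=0.8):
    """Return the indices of fragments whose voiced-frequency ratio falls strictly
    between the second (low) and third (high) preset ratios, per claim 4."""
    return [i for i, r in enumerate(ratios) if low < r < high]
```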
5. The method according to claim 1, wherein the performing high-pass filtering and feature extraction on the at least one target audio fragment to obtain at least one audio feature corresponding to the at least one target audio fragment comprises:
performing high-pass filtering on the at least one target audio fragment, to obtain at least one high-pass-filtered audio fragment;
dividing each high-pass-filtered audio fragment according to a fifth preset length, to obtain multiple audio sub-segments whose length is equal to the fifth preset length;
performing feature extraction on each audio sub-segment, to obtain an audio feature of each audio sub-segment.
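The filter-then-frame-then-extract pipeline of claim 5 might look like the sketch below. The patent fixes neither the filter nor the features, so a first-order pre-emphasis filter and a two-value feature (log energy, spectral centroid) are used here as stand-ins; MFCCs would be a typical real choice, and the 256-sample sub-segment length is an assumption.

```python
import numpy as np

def high_pass(x, alpha=0.97):
    """First-order high-pass (pre-emphasis) filter: y[n] = x[n] - alpha * x[n-1]."""
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def extract_features(fragment, sr, sub_len=256):
    """Split a filtered fragment into fixed-length sub-segments and compute a small
    feature vector (log energy, spectral centroid) per sub-segment."""
    feats = []
    for i in range(0, len(fragment) - sub_len + 1, sub_len):
        sub = fragment[i:i + sub_len] * np.hamming(sub_len)
        spectrum = np.abs(np.fft.rfft(sub))
        freqs = np.fft.rfftfreq(sub_len, d=1.0 / sr)
        energy = float(np.sum(sub ** 2))
        centroid = float(np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12))
        feats.append((np.log(energy + 1e-12), centroid))
    return feats

def filter_and_extract(fragments, sr):
    """Pipeline as in claim 5: filter first, then frame and extract per sub-segment."""
    return [extract_features(high_pass(f), sr) for f in fragments]
```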
6. The method according to any one of claims 1-5, wherein the determining the class identifier of the target audio information according to the class identifier of the at least one target audio fragment comprises at least one of the following:
when, among the at least one target audio fragment, the class identifiers of a continuous first preset quantity of target audio fragments are the second identifier, determining that the class identifier of the target audio information is the second identifier;
when, among the at least one target audio fragment, the proportion occupied by target audio fragments whose class identifier is the second identifier reaches a fourth preset ratio, determining that the class identifier of the target audio information is the second identifier.
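Both aggregation rules of claim 6 fit in a few lines of Python. The run length of 3 and the 0.3 proportion are illustrative values for the first preset quantity and fourth preset ratio.

```python
def classify_audio(fragment_labels, n_consecutive=3, min_ratio=0.3,
                   normal="first", sensitive="second"):
    """Aggregate per-fragment labels into one label for the whole audio, applying
    either rule from claim 6: a run of n_consecutive sensitive fragments, or a
    sensitive proportion reaching min_ratio."""
    run = best_run = 0
    for label in fragment_labels:
        run = run + 1 if label == sensitive else 0
        best_run = max(best_run, run)
    ratio = fragment_labels.count(sensitive) / max(len(fragment_labels), 1)
    if best_run >= n_consecutive or ratio >= min_ratio:
        return sensitive
    return normal
```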
7. The method according to any one of claims 1-5, wherein the method further comprises:
obtaining multiple pieces of sample audio information and the class identifiers of the multiple pieces of sample audio information;
performing high-pass filtering and feature extraction on the multiple pieces of sample audio information, to obtain multiple audio features corresponding to the multiple pieces of sample audio information;
performing model training according to the multiple audio features and the class identifier corresponding to each audio feature, to obtain the audio classification model.
8. The method according to claim 7, wherein the audio classification model comprises a first audio classification model and a second audio classification model, and the performing model training according to the multiple audio features and the class identifier corresponding to each audio feature to obtain the audio classification model comprises:
performing model training according to those of the multiple audio features whose class identifier is the first class identifier, to obtain the first audio classification model;
performing model training according to those of the multiple audio features whose class identifier is the second class identifier, to obtain the second audio classification model.
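The per-class training split of claims 7-8 can be sketched as below. The patent does not name a model family, so a diagonal Gaussian fitted to each class is used purely as a stand-in for the two class-specific audio classification models; prediction compares the two models' likelihoods.

```python
import numpy as np

class GaussianClassModel:
    """Diagonal Gaussian fitted to one class's feature vectors (a stand-in for
    the class-specific audio classification models of claim 8)."""
    def fit(self, feats):
        feats = np.asarray(feats, dtype=float)
        self.mean = feats.mean(axis=0)
        self.var = feats.var(axis=0) + 1e-6  # floor variance for stability
        return self

    def log_likelihood(self, x):
        x = np.asarray(x, dtype=float)
        return float(-0.5 * np.sum((x - self.mean) ** 2 / self.var
                                   + np.log(2 * np.pi * self.var)))

def train_two_models(features, labels, first="first", second="second"):
    """Claim 8's split: fit one model on first-class features, another on second-class."""
    m1 = GaussianClassModel().fit([f for f, l in zip(features, labels) if l == first])
    m2 = GaussianClassModel().fit([f for f, l in zip(features, labels) if l == second])
    return m1, m2

def predict(m1, m2, x, first="first", second="second"):
    """Assign the class whose model explains the feature vector better."""
    return first if m1.log_likelihood(x) >= m2.log_likelihood(x) else second
```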
9. An audio classification device, wherein the device comprises:
an obtaining module, configured to obtain at least one target audio fragment from target audio information;
an extraction module, configured to perform high-pass filtering and feature extraction on the at least one target audio fragment, to obtain at least one audio feature corresponding to the at least one target audio fragment;
a determining module, configured to determine the class identifier of the at least one target audio fragment based on an audio classification model and the at least one audio feature, and to determine the class identifier of the target audio information according to the class identifier of the at least one target audio fragment;
wherein the class identifier comprises a first identifier and a second identifier, the first identifier being used to indicate that the corresponding audio information is normal audio information, and the second identifier being used to indicate that the corresponding audio information is sensitive audio information.
10. The device according to claim 9, wherein the obtaining module comprises:
a first division unit, configured to divide the target audio information according to a first preset length, to obtain multiple audio fragments whose length is equal to the first preset length;
a fundamental frequency obtaining unit, configured to, for each audio fragment among the multiple audio fragments, obtain multiple fundamental frequencies in the audio fragment, and obtain the ratio occupied by fundamental frequencies greater than a first preset frequency among the multiple fundamental frequencies;
an obtaining unit, configured to obtain, from the multiple audio fragments, an audio fragment whose ratio is less than a first preset ratio, as a target audio fragment.
11. The device according to claim 9, wherein the obtaining module comprises:
a second division unit, configured to divide the target audio information according to a second preset length and a third preset length, to obtain multiple audio fragments whose length is equal to the second preset length, any two adjacent audio fragments among the multiple audio fragments sharing identical audio information of the third preset length, the third preset length being less than the second preset length;
the second division unit being further configured to, for each audio fragment among the multiple audio fragments, divide the audio fragment according to a fourth preset length, to obtain multiple audio sub-segments whose length is equal to the fourth preset length, and obtain a statistical value of the amplitude of each audio sub-segment, the fourth preset length being less than the second preset length;
an obtaining unit, configured to obtain, from the multiple audio fragments, an audio fragment in which any statistical value is greater than a preset value, as a target audio fragment.
12. The device according to claim 9, wherein the obtaining module comprises:
a third division unit, configured to divide the target audio information according to a second preset length and a third preset length, to obtain multiple audio fragments whose length is equal to the second preset length, any two adjacent audio fragments among the multiple audio fragments sharing identical audio information of the third preset length, the third preset length being less than the second preset length;
a fundamental frequency obtaining unit, configured to, for each audio fragment among the multiple audio fragments, obtain multiple fundamental frequencies in the audio fragment, and obtain the ratio occupied by fundamental frequencies greater than a first preset frequency among the multiple fundamental frequencies;
an obtaining unit, configured to obtain, from the multiple audio fragments, an audio fragment whose ratio is greater than a second preset ratio and less than a third preset ratio, as a target audio fragment.
13. The device according to claim 9, wherein the extraction module comprises:
a filtering unit, configured to perform high-pass filtering on the at least one target audio fragment, to obtain at least one high-pass-filtered audio fragment;
a division unit, configured to divide each high-pass-filtered audio fragment according to a fifth preset length, to obtain multiple audio sub-segments whose length is equal to the fifth preset length;
an extraction unit, configured to perform feature extraction on each audio sub-segment, to obtain an audio feature of each audio sub-segment.
14. The device according to any one of claims 9-13, wherein the determining module is configured to perform at least one of the following:
when, among the at least one target audio fragment, the class identifiers of a continuous first preset quantity of target audio fragments are the second identifier, determining that the class identifier of the target audio information is the second identifier;
when, among the at least one target audio fragment, the proportion occupied by target audio fragments whose class identifier is the second identifier reaches a fourth preset ratio, determining that the class identifier of the target audio information is the second identifier.
15. The device according to any one of claims 9-13, wherein the device further comprises a training module:
the obtaining module being further configured to obtain multiple pieces of sample audio information and the class identifiers of the multiple pieces of sample audio information;
the extraction module being further configured to perform high-pass filtering and feature extraction on the multiple pieces of sample audio information, to obtain multiple audio features corresponding to the multiple pieces of sample audio information;
the training module being configured to perform model training according to the multiple audio features and the class identifier corresponding to each audio feature, to obtain the audio classification model.
16. The device according to claim 15, wherein the audio classification model comprises a first audio classification model and a second audio classification model;
the training module being further configured to perform model training according to those of the multiple audio features whose class identifier is the first class identifier, to obtain the first audio classification model;
the training module being further configured to perform model training according to those of the multiple audio features whose class identifier is the second class identifier, to obtain the second audio classification model.
17. An audio classification device, wherein the device comprises a processor and a memory, the memory storing at least one instruction, the instruction being loaded and executed by the processor to implement the operations performed in the audio classification method according to any one of claims 1 to 8.
18. A computer-readable storage medium, wherein at least one instruction is stored in the computer-readable storage medium, the instruction being loaded and executed by a processor to implement the operations performed in the audio classification method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811632676.7A CN109671425B (en) | 2018-12-29 | 2018-12-29 | Audio classification method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109671425A true CN109671425A (en) | 2019-04-23 |
CN109671425B CN109671425B (en) | 2021-04-06 |
Family
ID=66146491
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811632676.7A Active CN109671425B (en) | 2018-12-29 | 2018-12-29 | Audio classification method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109671425B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110933235A (en) * | 2019-11-06 | 2020-03-27 | 杭州哲信信息技术有限公司 | Noise removing method in intelligent calling system based on machine learning |
CN112667844A (en) * | 2020-12-23 | 2021-04-16 | 腾讯音乐娱乐科技(深圳)有限公司 | Method, device, equipment and storage medium for retrieving audio |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102543079A (en) * | 2011-12-21 | 2012-07-04 | 南京大学 | Method and equipment for classifying audio signals in real time |
CN104347068A (en) * | 2013-08-08 | 2015-02-11 | 索尼公司 | Audio signal processing device, audio signal processing method and monitoring system |
CN104538041A (en) * | 2014-12-11 | 2015-04-22 | 深圳市智美达科技有限公司 | Method and system for detecting abnormal sounds |
CN105719642A (en) * | 2016-02-29 | 2016-06-29 | 黄博 | Continuous and long voice recognition method and system and hardware equipment |
CN107452401A (en) * | 2017-05-27 | 2017-12-08 | 北京字节跳动网络技术有限公司 | A kind of advertising pronunciation recognition methods and device |
CN108538311A (en) * | 2018-04-13 | 2018-09-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio frequency classification method, device and computer readable storage medium |
Non-Patent Citations (4)
Title |
---|
FENG RONG, ET AL.: "Audio classification method based on machine learning", <2016 INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION, BIG DATA & SMART CITY> * |
LIU JIQING, ET AL.: "Sports audio classification based on MFCC and GMM", <PROCEEDINGS OF IC-BNMT 2009> * |
JIANG CHAO, FENG HUAMIN, YANG XINGHUA: "Research on audio classification and speech recognition analysis in video", <2010 APCID> * |
JIA QIANG: "Master's thesis", Fudan University, 30 March 2016 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109740068B (en) | Media data recommendation method, device and storage medium | |
CN108829881B (en) | Video title generation method and device | |
CN110650379B (en) | Video abstract generation method and device, electronic equipment and storage medium | |
CN110277106B (en) | Audio quality determination method, device, equipment and storage medium | |
CN109379643A (en) | Image synthesizing method, device, terminal and storage medium | |
CN109640125B (en) | Video content processing method, device, server and storage medium | |
CN110083791A (en) | Target group detection method, device, computer equipment and storage medium | |
CN109784351B (en) | Behavior data classification method and device and classification model training method and device | |
CN110471858A (en) | Applied program testing method, device and storage medium | |
CN110222789A (en) | Image-recognizing method and storage medium | |
CN110572711A (en) | Video cover generation method and device, computer equipment and storage medium | |
CN109815150A (en) | Application testing method, device, electronic equipment and storage medium | |
CN108320756B (en) | Method and device for detecting whether audio is pure music audio | |
CN110956971A (en) | Audio processing method, device, terminal and storage medium | |
CN110163066A (en) | Multi-medium data recommended method, device and storage medium | |
CN112667844A (en) | Method, device, equipment and storage medium for retrieving audio | |
CN110853124B (en) | Method, device, electronic equipment and medium for generating GIF dynamic diagram | |
CN109671425A (en) | Audio frequency classification method, device and storage medium | |
CN114741559A (en) | Method, apparatus and storage medium for determining video cover | |
CN110675473A (en) | Method, device, electronic equipment and medium for generating GIF dynamic graph | |
CN108717849A (en) | The method, apparatus and storage medium of splicing multimedia data | |
CN113343709B (en) | Method for training intention recognition model, method, device and equipment for intention recognition | |
CN109036463A (en) | Obtain the method, apparatus and storage medium of the difficulty information of song | |
CN109117895A (en) | Data clustering method, device and storage medium | |
CN111599417B (en) | Training data acquisition method and device of solubility prediction model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||