CN109671425A - Audio frequency classification method, device and storage medium - Google Patents


Info

• Publication number: CN109671425A (application CN201811632676.7A; granted as CN109671425B)
• Authority: CN (China)
• Prior art keywords: audio, information, frequency, target, fragment
• Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
• Inventor: 劳振锋
• Original and current assignee: Guangzhou Kugou Computer Technology Co Ltd (the listed assignee may be inaccurate)
• Other languages: Chinese (zh)
• Events: application filed by Guangzhou Kugou Computer Technology Co Ltd; publication of CN109671425A; application granted; publication of CN109671425B


Classifications

    • G — Physics
    • G10 — Musical instruments; acoustics
    • G10L — Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
    • G10L15/02 — Feature extraction for speech recognition; selection of recognition unit
    • G10L15/063 — Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
    • G10L15/08 — Speech classification or search
    • G10L15/16 — Speech classification or search using artificial neural networks
    • G10L21/0208 — Noise filtering (speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L21/0216 — Noise filtering characterised by the method used for estimating noise
    • G10L25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00–G10L21/00, characterised by the type of extracted parameters
    • G10L25/24 — Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an audio classification method, device and storage medium, belonging to the field of Internet technology. The method includes: obtaining at least one target audio fragment from target audio information; performing high-pass filtering and feature extraction on the at least one target audio fragment to obtain at least one audio feature corresponding to the at least one target audio fragment; determining, based on an audio classification model and the at least one audio feature, the class label of the at least one target audio fragment, and determining the class label of the target audio information according to the class labels of the fragments. A first label indicates that the corresponding audio information is normal audio information, and a second label indicates that the corresponding audio information is sensitive audio information. Because high-pass filtering is performed before the class label of the target audio information is determined, the low-frequency noise of the target audio information can be filtered out, so low-frequency noise is not mistaken for sensitive audio information, which improves the accuracy of audio classification.

Description

Audio frequency classification method, device and storage medium
Technical field
The present invention relates to the field of Internet technology, and in particular to an audio classification method, device and storage medium.
Background art
With the rapid development of Internet technology, the scale of information on the Internet has grown steadily, allowing many kinds of sensitive information, such as objectionable video and sensitive audio, to spread widely. Such content harms people's mental health, pollutes the network environment, and easily causes network information security problems. How to identify sensitive information has therefore become an urgent problem.
The related art provides an audio classification method that classifies audio information, distinguishing normal audio information from sensitive audio information. First, multiple pieces of sensitive audio information are obtained, feature extraction is performed on each piece to obtain its audio features, and model training is performed on the resulting features to obtain a Gaussian mixture model. Then, target audio information to be identified is obtained, feature extraction is performed on it to obtain its target audio feature, and the Mahalanobis distance between the target audio feature and the Gaussian mixture model is computed and compared with a preset threshold. When the distance is greater than the threshold, the target audio information is determined to be normal audio information; when the distance is not greater than the threshold, it is determined to be sensitive audio information.
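The related-art pipeline can be sketched as follows. This is a minimal illustration, not the patent's implementation: a single diagonal-covariance Gaussian stands in for the Gaussian mixture model, and the feature vectors, mean, variance and threshold are all invented values.

```python
import math

def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance of x from a diagonal-covariance Gaussian."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, mean, var)))

def classify_by_distance(features, mean, var, threshold):
    """Label audio 'normal' when its features are far from the model of
    sensitive audio (distance > threshold), otherwise 'sensitive'."""
    d = mahalanobis_diag(features, mean, var)
    return "normal" if d > threshold else "sensitive"

# Toy "sensitive audio" model and feature vectors (hypothetical values):
mean, var = [0.0, 0.0], [1.0, 1.0]
print(classify_by_distance([5.0, 5.0], mean, var, 3.0))  # far -> normal
print(classify_by_distance([0.5, 0.5], mean, var, 3.0))  # close -> sensitive
```

The weakness of this distance test is exactly what the next paragraph describes: any input whose features happen to fall near the sensitive-audio model, such as low-frequency noise, is labeled sensitive.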
When the target audio information contains low-frequency noise, the features of that noise closely resemble those of sensitive audio information. As a result, when classification is performed with the Gaussian mixture model, low-frequency noise can be mistaken for sensitive audio information, causing classification errors and low accuracy.
Summary of the invention
The embodiments of the present invention provide an audio classification method, device and storage medium, which can solve the problems in the related art. The technical solution is as follows:
In a first aspect, an audio classification method is provided, the method comprising:
obtaining at least one target audio fragment from target audio information;
performing high-pass filtering and feature extraction on the at least one target audio fragment to obtain at least one audio feature corresponding to the at least one target audio fragment;
determining, based on an audio classification model and the at least one audio feature, the class label of the at least one target audio fragment, and determining the class label of the target audio information according to the class label of the at least one target audio fragment.
The class label includes a first label and a second label; the first label indicates that the corresponding audio information is normal audio information, and the second label indicates that the corresponding audio information is sensitive audio information.
Optionally, obtaining at least one target audio fragment from target audio information comprises:
dividing the target audio information according to a first preset length to obtain multiple audio fragments whose length equals the first preset length;
for each of the multiple audio fragments, obtaining multiple fundamental frequencies in the fragment, and computing the proportion of those fundamental frequencies that exceed a first preset frequency;
selecting, from the multiple audio fragments, the fragments whose proportion is less than a first preset ratio as target audio fragments.
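The fragment-screening rule above can be sketched in a few lines. The per-fragment fundamental-frequency tracks, the 300 Hz cutoff and the 0.3 ratio below are hypothetical; in practice the fundamental frequencies would come from a pitch estimator, which the patent does not specify.

```python
def high_f0_ratio(f0s, cutoff_hz):
    """Fraction of a fragment's fundamental-frequency estimates above cutoff_hz."""
    if not f0s:
        return 0.0
    return sum(1 for f in f0s if f > cutoff_hz) / len(f0s)

def select_targets(fragment_f0s, cutoff_hz, max_ratio):
    """Keep (by index) the fragments whose high-F0 ratio is below max_ratio."""
    return [i for i, f0s in enumerate(fragment_f0s)
            if high_f0_ratio(f0s, cutoff_hz) < max_ratio]

# Hypothetical per-fragment F0 tracks in Hz; fragment 0 has too much
# high-frequency pitch content and is screened out:
frags = [[120, 130, 500, 520], [100, 110, 105, 115]]
print(select_targets(frags, cutoff_hz=300, max_ratio=0.3))  # -> [1]
```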
Optionally, obtaining at least one target audio fragment from target audio information comprises:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio fragments whose length equals the second preset length, where any two adjacent fragments among the multiple audio fragments share audio information of the third preset length; the third preset length is less than the second preset length;
for each of the multiple audio fragments, dividing the fragment according to a fourth preset length to obtain multiple audio sub-fragments whose length equals the fourth preset length, and obtaining a statistic of the amplitude of each sub-fragment; the fourth preset length is less than the second preset length;
selecting, from the multiple audio fragments, the fragments in which any statistic exceeds a preset value as target audio fragments.
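A minimal sketch of the overlapping division and amplitude screening described above, assuming the mean absolute amplitude as the sub-fragment statistic (the patent does not name the statistic) and made-up lengths and threshold:

```python
def overlapping_fragments(samples, frag_len, overlap):
    """Fragments of frag_len samples where adjacent fragments share `overlap` samples."""
    hop = frag_len - overlap
    return [samples[i:i + frag_len]
            for i in range(0, len(samples) - frag_len + 1, hop)]

def max_subsegment_amplitude(fragment, sub_len):
    """Largest mean absolute amplitude over sub-fragments of sub_len samples."""
    stats = [sum(abs(s) for s in fragment[i:i + sub_len]) / sub_len
             for i in range(0, len(fragment) - sub_len + 1, sub_len)]
    return max(stats) if stats else 0.0

def select_by_amplitude(samples, frag_len, overlap, sub_len, threshold):
    """Keep (by index) the fragments where any sub-fragment statistic exceeds threshold."""
    frags = overlapping_fragments(samples, frag_len, overlap)
    return [i for i, f in enumerate(frags)
            if max_subsegment_amplitude(f, sub_len) > threshold]

# Silence followed by a loud burst; only the fragments touching the burst survive:
print(select_by_amplitude([0.0] * 8 + [0.9] * 8, 8, 4, 4, 0.5))  # -> [1, 2]
```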
Optionally, obtaining at least one target audio fragment from target audio information comprises:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio fragments whose length equals the second preset length, where any two adjacent fragments among the multiple audio fragments share audio information of the third preset length; the third preset length is less than the second preset length;
for each of the multiple audio fragments, obtaining multiple fundamental frequencies in the fragment, and computing the proportion of those fundamental frequencies that exceed a first preset frequency;
selecting, from the multiple audio fragments, the fragments whose proportion is greater than a second preset ratio and less than a third preset ratio as target audio fragments.
Optionally, performing high-pass filtering and feature extraction on the at least one target audio fragment to obtain at least one audio feature corresponding to the at least one target audio fragment comprises:
performing high-pass filtering on the at least one target audio fragment to obtain at least one filtered audio fragment;
dividing each filtered audio fragment according to a fifth preset length to obtain multiple audio sub-fragments whose length equals the fifth preset length;
performing feature extraction on each audio sub-fragment to obtain the audio features of each sub-fragment.
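The filtering-then-framing step above might look like the following sketch. The patent does not specify the filter design; a simple first-order high-pass filter with an illustrative coefficient of 0.95 is used here, and the sub-fragment length is arbitrary.

```python
def high_pass(samples, alpha=0.95):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).
    A constant (0 Hz) input decays toward zero, while fast changes pass through."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def frame(samples, frame_len):
    """Split a filtered fragment into sub-fragments of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

# A pure DC offset (extreme low-frequency content) is largely removed:
filtered = high_pass([1.0] * 100)
print(abs(filtered[-1]) < 1e-2)  # -> True
```

Feature extraction (not shown) would then run on each sub-fragment returned by `frame`.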
Optionally, determining the class label of the target audio information according to the class label of the at least one target audio fragment includes at least one of the following:
when, among the at least one target audio fragment, the class labels of a consecutive first preset number of target audio fragments are all the second label, determining that the class label of the target audio information is the second label;
when, among the at least one target audio fragment, the proportion of target audio fragments whose class label is the second label reaches a fourth preset ratio, determining that the class label of the target audio information is the second label.
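The two decision rules above can be sketched as follows, using 'S' for the second (sensitive) label and 'N' for the first (normal) label; the run-length and ratio thresholds are placeholder values.

```python
def longest_run(labels, value):
    """Length of the longest consecutive run of `value` in labels."""
    best = cur = 0
    for lab in labels:
        cur = cur + 1 if lab == value else 0
        best = max(best, cur)
    return best

def classify_audio(fragment_labels, run_threshold, ratio_threshold):
    """Mark the whole audio sensitive ('S') if either rule fires, else normal ('N')."""
    if not fragment_labels:
        return "N"
    if longest_run(fragment_labels, "S") >= run_threshold:
        return "S"
    if fragment_labels.count("S") / len(fragment_labels) >= ratio_threshold:
        return "S"
    return "N"

print(classify_audio(["N", "S", "S", "S", "N"], 3, 0.8))  # run of 3 -> 'S'
print(classify_audio(["N", "S", "N", "N", "N"], 3, 0.8))  # neither rule -> 'N'
```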
Optionally, the method further comprises:
obtaining multiple pieces of sample audio information and the class labels of the multiple pieces of sample audio information;
performing high-pass filtering and feature extraction on the multiple pieces of sample audio information to obtain multiple audio features corresponding to the multiple pieces of sample audio information;
performing model training according to the multiple audio features and the class label corresponding to each audio feature to obtain the audio classification model.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model, and performing model training according to the multiple audio features and the class label corresponding to each audio feature to obtain the audio classification model comprises:
performing model training on the audio features, among the multiple audio features, whose class label is the first label, to obtain the first audio classification model;
performing model training on the audio features, among the multiple audio features, whose class label is the second label, to obtain the second audio classification model.
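One way to realize the two per-class models is to fit one generative model per label and classify by likelihood. The sketch below uses a per-dimension Gaussian for each class purely for illustration; the patent does not name a model family here, and all feature values are invented.

```python
import math

def fit_gaussian(features):
    """Fit per-dimension mean and variance to a list of feature vectors."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in features) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(x, model):
    """Log density of x under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def classify_feature(x, normal_model, sensitive_model):
    """Pick whichever class model explains the feature vector better."""
    return ("normal" if log_likelihood(x, normal_model)
            >= log_likelihood(x, sensitive_model) else "sensitive")

normal = fit_gaussian([[0.1, 0.2], [0.0, 0.1], [0.2, 0.0]])
sensitive = fit_gaussian([[5.0, 5.1], [4.9, 5.0], [5.1, 4.8]])
print(classify_feature([0.1, 0.1], normal, sensitive))  # -> normal
print(classify_feature([5.0, 5.0], normal, sensitive))  # -> sensitive
```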
In a second aspect, an audio classification device is provided, the device comprising:
an obtaining module, configured to obtain at least one target audio fragment from target audio information;
an extraction module, configured to perform high-pass filtering and feature extraction on the at least one target audio fragment to obtain at least one audio feature corresponding to the at least one target audio fragment;
a determining module, configured to determine, based on an audio classification model and the at least one audio feature, the class label of the at least one target audio fragment, and to determine the class label of the target audio information according to the class label of the at least one target audio fragment.
The class label includes a first label and a second label; the first label indicates that the corresponding audio information is normal audio information, and the second label indicates that the corresponding audio information is sensitive audio information.
Optionally, the obtaining module comprises:
a first division unit, configured to divide the target audio information according to a first preset length to obtain multiple audio fragments whose length equals the first preset length;
a fundamental frequency obtaining unit, configured to, for each of the multiple audio fragments, obtain multiple fundamental frequencies in the fragment and compute the proportion of those fundamental frequencies that exceed a first preset frequency;
an obtaining unit, configured to select, from the multiple audio fragments, the fragments whose proportion is less than a first preset ratio as target audio fragments.
Optionally, the obtaining module comprises:
a second division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio fragments whose length equals the second preset length, where any two adjacent fragments among the multiple audio fragments share audio information of the third preset length, the third preset length being less than the second preset length;
the second division unit being further configured to, for each of the multiple audio fragments, divide the fragment according to a fourth preset length to obtain multiple audio sub-fragments whose length equals the fourth preset length, and obtain a statistic of the amplitude of each sub-fragment, the fourth preset length being less than the second preset length;
an obtaining unit, configured to select, from the multiple audio fragments, the fragments in which any statistic exceeds a preset value as target audio fragments.
Optionally, the obtaining module comprises:
a third division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio fragments whose length equals the second preset length, where any two adjacent fragments among the multiple audio fragments share audio information of the third preset length, the third preset length being less than the second preset length;
a fundamental frequency obtaining unit, configured to, for each of the multiple audio fragments, obtain multiple fundamental frequencies in the fragment and compute the proportion of those fundamental frequencies that exceed a first preset frequency;
an obtaining unit, configured to select, from the multiple audio fragments, the fragments whose proportion is greater than a second preset ratio and less than a third preset ratio as target audio fragments.
Optionally, the extraction module comprises:
a filtering unit, configured to perform high-pass filtering on the at least one target audio fragment to obtain at least one filtered audio fragment;
a division unit, configured to divide each filtered audio fragment according to a fifth preset length to obtain multiple audio sub-fragments whose length equals the fifth preset length;
an extraction unit, configured to perform feature extraction on each audio sub-fragment to obtain the audio features of each sub-fragment.
Optionally, the determining module is configured to perform at least one of the following:
when, among the at least one target audio fragment, the class labels of a consecutive first preset number of target audio fragments are all the second label, determining that the class label of the target audio information is the second label;
when, among the at least one target audio fragment, the proportion of target audio fragments whose class label is the second label reaches a fourth preset ratio, determining that the class label of the target audio information is the second label.
Optionally, the device further comprises:
the obtaining module, further configured to obtain multiple pieces of sample audio information and the class labels of the multiple pieces of sample audio information;
the extraction module, further configured to perform high-pass filtering and feature extraction on the multiple pieces of sample audio information to obtain multiple corresponding audio features;
a training module, configured to perform model training according to the multiple audio features and the class label corresponding to each audio feature to obtain the audio classification model.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model;
the training module is further configured to perform model training on the audio features, among the multiple audio features, whose class label is the first label, to obtain the first audio classification model;
the training module is further configured to perform model training on the audio features, among the multiple audio features, whose class label is the second label, to obtain the second audio classification model.
In a third aspect, an audio classification device is provided, the device comprising a processor and a memory in which at least one instruction is stored, the instruction being loaded and executed by the processor to implement the operations performed in the audio classification method of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, the instruction being loaded and executed by a processor to implement the operations performed in the audio classification method of the first aspect.
The technical solutions provided in the embodiments of the present invention have the following beneficial effects:
The method, device and storage medium provided in the embodiments of the present invention obtain at least one target audio fragment from target audio information and perform high-pass filtering and feature extraction on the at least one target audio fragment. The high-pass filtering filters out low-frequency noise, and the feature extraction yields at least one audio feature corresponding to the at least one target audio fragment. Based on an audio classification model and the at least one audio feature, the class label of the at least one target audio fragment is determined, and the class label of the target audio information is then determined from the class labels of the fragments, so that the target audio information can be identified as normal audio information or sensitive audio information. Because high-pass filtering is performed before the class label of the target audio information is determined, the low-frequency noise of the target audio information is filtered out, so low-frequency noise is not mistaken for sensitive audio information, which improves the accuracy of audio classification.
Moreover, when the at least one target audio fragment is obtained, the target audio information is divided into multiple audio fragments, and these fragments are screened against preset conditions: only the fragments that satisfy a preset condition are used as target audio fragments. In this way, interference from other audio fragments is reduced, misclassification is reduced, and the accuracy of audio classification is improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow chart of an audio classification method provided in an embodiment of the present invention;
Fig. 2 is a flow chart of an audio classification method provided in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an audio classification device provided in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a server provided in an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a terminal provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a flow chart of an audio classification method provided in an embodiment of the present invention. The method is executed by a classification device. Referring to Fig. 1, the method comprises:
101. Obtain at least one target audio fragment from target audio information.
102. Perform high-pass filtering and feature extraction on the at least one target audio fragment to obtain at least one audio feature corresponding to the at least one target audio fragment.
103. Based on an audio classification model and the at least one audio feature, determine the class label of the at least one target audio fragment, and determine the class label of the target audio information according to the class label of the at least one target audio fragment.
The class label includes a first label and a second label; the first label indicates that the corresponding audio information is normal audio information, and the second label indicates that the corresponding audio information is sensitive audio information.
In the method provided in this embodiment of the present invention, at least one target audio fragment is obtained from target audio information, and high-pass filtering and feature extraction are performed on the at least one target audio fragment. The high-pass filtering filters out low-frequency noise, and the feature extraction yields at least one corresponding audio feature. Based on an audio classification model and the at least one audio feature, the class label of the at least one target audio fragment is determined, and from it the class label of the target audio information, so that the target audio information can be identified as normal audio information or sensitive audio information. Because high-pass filtering is performed before the class label of the target audio information is determined, the low-frequency noise of the target audio information is filtered out, so low-frequency noise is not mistaken for sensitive audio information, which improves the accuracy of audio classification.
Optionally, obtaining at least one target audio fragment from target audio information comprises:
dividing the target audio information according to a first preset length to obtain multiple audio fragments whose length equals the first preset length;
for each of the multiple audio fragments, obtaining multiple fundamental frequencies in the fragment, and computing the proportion of those fundamental frequencies that exceed a first preset frequency;
selecting, from the multiple audio fragments, the fragments whose proportion is less than a first preset ratio as target audio fragments.
Optionally, obtaining at least one target audio fragment from target audio information comprises:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio fragments whose length equals the second preset length, where any two adjacent fragments among the multiple audio fragments share audio information of the third preset length; the third preset length is less than the second preset length;
for each of the multiple audio fragments, dividing the fragment according to a fourth preset length to obtain multiple audio sub-fragments whose length equals the fourth preset length, and obtaining a statistic of the amplitude of each sub-fragment; the fourth preset length is less than the second preset length;
selecting, from the multiple audio fragments, the fragments in which any statistic exceeds a preset value as target audio fragments.
Optionally, obtaining at least one target audio fragment from target audio information comprises:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio fragments whose length equals the second preset length, where any two adjacent fragments among the multiple audio fragments share audio information of the third preset length; the third preset length is less than the second preset length;
for each of the multiple audio fragments, obtaining multiple fundamental frequencies in the fragment, and computing the proportion of those fundamental frequencies that exceed a first preset frequency;
selecting, from the multiple audio fragments, the fragments whose proportion is greater than a second preset ratio and less than a third preset ratio as target audio fragments.
Optionally, performing high-pass filtering and feature extraction on the at least one target audio fragment to obtain at least one audio feature corresponding to the at least one target audio fragment comprises:
performing high-pass filtering on the at least one target audio fragment to obtain at least one filtered audio fragment;
dividing each filtered audio fragment according to a fifth preset length to obtain multiple audio sub-fragments whose length equals the fifth preset length;
performing feature extraction on each audio sub-fragment to obtain the audio features of each sub-fragment.
Optionally, determining the classification identifier of the target audio information according to the classification identifiers of the at least one target audio segment includes at least one of the following:
when, among the at least one target audio segment, the classification identifiers of a first preset quantity of consecutive target audio segments are all the second identifier, determining that the classification identifier of the target audio information is the second identifier; and
when, among the at least one target audio segment, the proportion of target audio segments whose classification identifier is the second identifier reaches a fourth preset proportion, determining that the classification identifier of the target audio information is the second identifier.
Optionally, the method further includes:
obtaining multiple sample audio information items and their classification identifiers;
performing high-pass filtering and feature extraction on the multiple sample audio information items to obtain multiple audio features corresponding to the multiple sample audio information items; and
performing model training according to the multiple audio features and the classification identifier corresponding to each audio feature, to obtain the audio classification model.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model, and performing model training according to the multiple audio features and the classification identifier corresponding to each audio feature to obtain the audio classification model includes:
performing model training according to those of the multiple audio features whose classification identifier is the first identifier, to obtain the first audio classification model; and
performing model training according to those of the multiple audio features whose classification identifier is the second identifier, to obtain the second audio classification model.
Any combination of the above optional solutions may form an alternative embodiment of the present invention, and details are not repeated here one by one.
Fig. 2 is a flowchart of an audio classification method provided by an embodiment of the present invention. The method is executed by a classification device, which may be a terminal such as a mobile phone, a computer, or a tablet computer, or may be a server. Referring to Fig. 2, the method includes the following steps:
201. Obtain an audio classification model.
In this embodiment of the present invention, any audio information can be classified based on the audio classification model to determine whether the audio information is normal audio information or sensitive audio information. The audio classification model is used to determine the classification identifier of audio information; the classification identifier includes a first identifier and a second identifier, where the first identifier indicates that the corresponding audio information is normal audio information, and the second identifier indicates that the corresponding audio information is sensitive audio information.
The first identifier and the second identifier are two different identifiers. For example, the first identifier is 1 and the second identifier is 0, or the first identifier is 0 and the second identifier is 1.
The audio classification model may be trained and stored by the classification device, or may be trained by another device, sent to the classification device, and stored by the classification device.
When training the audio classification model, multiple sample audio information items and their classification identifiers are obtained. For each sample audio information item, high-pass filtering is performed to filter out the low-frequency noise in it, and feature extraction is then performed on the filtered sample audio information to obtain the corresponding audio feature. In this way, multiple audio features corresponding to the multiple sample audio information items are obtained, with the classification identifiers of the sample audio information items serving as the classification identifiers of the corresponding audio features. Model training is then performed according to the multiple audio features and their classification identifiers, to obtain the audio classification model.
Each audio feature describes the corresponding sample audio information and may be, for example, mel-frequency cepstral coefficients, linear prediction cepstral coefficients, or any other feature capable of describing the audio. Correspondingly, feature extraction on the sample audio information may use a mel-frequency cepstral coefficient algorithm, a linear prediction cepstral coefficient algorithm, or another feature extraction algorithm.
Moreover, various training algorithms may be used to train the audio classification model, which may be a Gaussian mixture model, a neural network model, a decision tree model, or another model.
By high-pass filtering the sample audio information, the low-frequency noise in it can be filtered out, so that the extracted audio features describe the sample audio information more accurately; interference from low-frequency noise is avoided, improving the accuracy of the trained audio classification model.
Optionally, when training the audio classification model, an initial audio classification model may first be constructed, and a training data set and a test data set may be obtained, each containing multiple sample audio information items.
High-pass filtering is performed on each sample audio information item in the training data set, feature extraction is performed on the filtered items to obtain the corresponding audio features, and the audio features are used as the input of the audio classification model to train it, so that the model learns the differences between normal audio information and sensitive audio information and acquires the ability to distinguish them.
Afterwards, high-pass filtering and feature extraction are likewise performed on the sample audio information items in the test data set, the resulting audio features are input into the audio classification model, the classification identifier of each sample audio information item is determined based on the model, the determined classification identifiers are compared with the actual classification identifiers, and the audio classification model is updated according to the comparison results.
In subsequent processes, new sample audio information and its classification identifiers can also be obtained to continue training the audio classification model, thereby improving its accuracy.
Optionally, the audio classification model includes a first audio classification model and a second audio classification model. During model training, the multiple sample audio information items and their classification identifiers are obtained, high-pass filtering and feature extraction are performed on them to obtain the corresponding audio features, model training is performed according to the audio features whose classification identifier is the first identifier to obtain the first audio classification model, and model training is performed according to the audio features whose classification identifier is the second identifier to obtain the second audio classification model.
The first audio classification model learns the features of normal audio information and has the ability to identify normal audio information; based on it, the probability that any audio information belongs to normal audio information can be determined. The second audio classification model learns the features of sensitive audio information and has the ability to identify sensitive audio information; based on it, the probability that any audio information belongs to sensitive audio information can be determined. Subsequently, the target audio information can be classified based on the first audio classification model and the second audio classification model together.
By training on the two kinds of sample audio information separately to obtain the first audio classification model and the second audio classification model, specificity is improved, which in turn improves the accuracy of the audio classification.
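As a minimal sketch of the two-model training described above, one per-class model could be fitted to each set of features. The patent does not fix the model family; a diagonal Gaussian per class (a one-component Gaussian mixture) is assumed here, and the feature vectors are purely illustrative:

```python
import math

def fit_gaussian(features):
    """Fit a diagonal Gaussian (mean and variance per dimension) to one class."""
    n, dim = len(features), len(features[0])
    mean = [sum(f[d] for f in features) / n for d in range(dim)]
    var = [max(sum((f[d] - mean[d]) ** 2 for f in features) / n, 1e-6)
           for d in range(dim)]
    return mean, var

def log_likelihood(model, x):
    """Log-probability of feature vector x under a diagonal Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

# One model per class: "normal" (first identifier, here 1) and
# "sensitive" (second identifier, here 2).
normal_feats = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1]]
sensitive_feats = [[1.0, 1.1], [0.9, 1.3], [1.1, 0.9]]
model_normal = fit_gaussian(normal_feats)
model_sensitive = fit_gaussian(sensitive_feats)

# A new feature vector is labeled by whichever model scores it higher.
x = [0.95, 1.05]
label = 2 if log_likelihood(model_sensitive, x) > log_likelihood(model_normal, x) else 1
```

In practice a library mixture model (e.g. several Gaussian components per class) would replace `fit_gaussian`, but the two-model comparison structure stays the same.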
202. Obtain at least one target audio segment in the target audio information.
In this embodiment of the present invention, the target audio information is the audio information to be classified: it needs to be classified to determine whether it is normal audio information or sensitive audio information.
In terms of information form, the target audio information may be the audio information in a single audio file, the audio information extracted from a video file, or audio information in another form.
In terms of information source, the target audio information may be recorded by the classification device, downloaded from the Internet by the classification device, or sent to the classification device by another device. For example, while the classification device plays a live video, the audio information in the live video can be obtained as the target audio information.
In terms of information content, the target audio information may include singing audio information, chat audio information, sensitive audio information, noise audio information, and the like.
Optionally, the classification device may take the complete target audio information as the target audio segment to be classified.
Alternatively, a preset condition may be set, which specifies the condition that audio information which may be sensitive must meet: when an audio segment meets the preset condition, it may contain sensitive audio information; when it does not, it contains no sensitive audio information.
To improve accuracy, after obtaining the target audio information the classification device may not classify the entire target audio information directly, but instead divide it into at least one audio segment, judge whether each audio segment meets the preset condition, and determine the audio segments meeting the preset condition as target audio segments, so that subsequently only the target audio segments are classified and the other audio segments are not.
Optionally, step 202 includes at least one of the following steps 2021-2023:
2021. Obtain the audio segments in the target audio information other than singing audio segments, as target audio segments.
The target audio information is divided according to a first preset length to obtain multiple audio segments each equal in length to the first preset length. For each of the multiple audio segments, multiple fundamental frequencies in the audio segment are obtained, along with the proportion of those fundamental frequencies that are greater than a first preset frequency.
A fundamental frequency is the frequency of the elementary audio of the audio segment. When obtaining the proportion of fundamental frequencies greater than the first preset frequency, the number of fundamental frequencies greater than the first preset frequency is counted, and the ratio of that number to the total number of fundamental frequencies in the audio segment is calculated, yielding the proportion.
When the proportion of fundamental frequencies greater than the first preset frequency in an audio segment is not less than a first preset proportion, the audio segment is determined to be a singing segment and contains no sensitive audio information. When that proportion is less than the first preset proportion, the audio segment is determined not to be a singing segment and may contain sensitive audio information.
Therefore, from the multiple audio segments, the classification device obtains the audio segments in which the proportion of fundamental frequencies greater than the first preset frequency is less than the first preset proportion, as target audio segments.
The first preset length may be set to 10 seconds, 20 seconds, or another duration. The first preset frequency may be set to 150 Hz, 160 Hz, or another frequency. The first preset proportion may be set to 60%, 70%, or another percentage.
For example, suppose the first preset length is 20 seconds, the first preset proportion is 60%, and the first preset frequency is 150 Hz. The target audio information is divided into multiple 20-second audio segments, and 100 fundamental frequencies are extracted from each. If the number of fundamental frequencies greater than 150 Hz in an audio segment is 50, the proportion is 50%, which is less than the first preset proportion of 60%, so that audio segment is taken as a target audio segment.
By dividing the target audio information into multiple audio segments and judging, for each audio segment, whether it is a singing segment according to its fundamental frequencies, the singing segments can be excluded from the target audio information, reducing the computation and improving the accuracy of the subsequent classification of the target audio information.
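The fundamental-frequency screening of step 2021 can be sketched as follows. This is a minimal illustration in pure Python: the pitch lists are made up, and 150 Hz / 60% stand in for the first preset frequency and first preset proportion (the patent allows other values):

```python
def is_singing_segment(fundamental_freqs, preset_freq=150.0, preset_ratio=0.60):
    """True if the proportion of fundamentals above preset_freq is not less
    than preset_ratio, i.e. the segment is treated as a singing segment."""
    high = sum(1 for f in fundamental_freqs if f > preset_freq)
    return high / len(fundamental_freqs) >= preset_ratio

def select_non_singing(segments_freqs):
    """Keep the indices of segments that are NOT singing, as target segments."""
    return [i for i, freqs in enumerate(segments_freqs)
            if not is_singing_segment(freqs)]

# Segment 0: 70% of fundamentals above 150 Hz -> singing, excluded.
# Segment 1: 50% above 150 Hz -> not singing, kept as a target segment.
seg0 = [200.0] * 70 + [100.0] * 30
seg1 = [200.0] * 50 + [100.0] * 50
targets = select_non_singing([seg0, seg1])
```

Extracting the fundamental frequencies themselves would require a pitch tracker, which the patent leaves unspecified.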
2022. Obtain the audio segments in the target audio information other than mute audio segments, as target audio segments.
The target audio information is divided according to the second preset length and the third preset length to obtain multiple audio segments each equal in length to the second preset length, where any two adjacent audio segments share identical audio information of the third preset length. Each of the multiple audio segments is then divided according to the fourth preset length to obtain multiple audio sub-segments each equal in length to the fourth preset length, and a statistical value of the amplitude of each audio sub-segment is obtained.
The amplitude indicates the energy of the audio segment, and the statistical value of the amplitude may be the average of the absolute values of the amplitudes of the audio sub-segment, the average of their squares, or another statistical value.
When the statistical value of the amplitude of every audio sub-segment is less than a preset value, the audio segment is determined to be a silence segment and contains no sensitive audio information. When any statistical value is not less than the preset value, the audio segment is determined to be a non-silence segment and may contain sensitive audio information.
Therefore, from the multiple audio segments, the classification device obtains the audio segments in which any statistical value is greater than the preset value, as target audio segments.
The second preset length may be set to 1 second, 2 seconds, or another duration. The third preset length may be set to 0.4 seconds, 0.5 seconds, or another duration, and is less than the second preset length. The fourth preset length may be set to 0.1 seconds, 0.2 seconds, or another duration, is less than the second preset length, and is used to divide each audio segment of the second preset length into audio sub-segments. The preset value may be set to 0.2, 0.3, or another value.
For example, suppose the second preset length is 1 second, the third preset length is 0.5 seconds, the fourth preset length is 0.2 seconds, and the preset value is 0.3. Dividing according to the second and third preset lengths yields multiple audio segments covering 0-1 s, 0.5-1.5 s, 1-2 s, and so on, until the target audio information is fully divided. Each audio segment is then divided into multiple 0.2-second audio sub-segments, and the average amplitude of each audio sub-segment is calculated. If the average value of any audio sub-segment of an audio segment is 0.4, which is greater than the preset value 0.3, that audio segment is taken as a target audio segment.
By dividing the target audio information according to the second and third preset lengths, any two adjacent audio segments share identical audio information of the third preset length: each audio segment contains the audio information at the end of the previous one, which reduces discontinuities between audio segments and keeps the audio information complete.
Also, by dividing the target audio information into multiple audio segments and, for each audio segment, dividing it into multiple audio sub-segments and determining from the statistical values of the sub-segments whether it is a silence segment, the silence segments in the target audio information can be excluded, reducing the computation and improving the accuracy of the subsequent classification of the target audio information.
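The overlapping split and silence screening of step 2022 can be sketched as follows (pure Python over a plain list of samples; the 1 s / 0.5 s / 0.2 s lengths are expressed as sample counts at a toy rate of 10 samples per second, and the signal values are illustrative):

```python
def split_overlapping(samples, seg_len, hop_len):
    """Segments of seg_len samples; consecutive segments overlap by seg_len - hop_len."""
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, hop_len)]

def is_non_silence(segment, sub_len, preset_value):
    """True if any sub-segment's mean absolute amplitude exceeds preset_value."""
    subs = [segment[i:i + sub_len] for i in range(0, len(segment), sub_len)]
    return any(sum(abs(s) for s in sub) / len(sub) > preset_value for sub in subs)

# 2 s of "audio": the first second silent, the second second loud.
samples = [0.0] * 10 + [0.4] * 10
segments = split_overlapping(samples, seg_len=10, hop_len=5)  # 0-1 s, 0.5-1.5 s, 1-2 s
targets = [i for i, seg in enumerate(segments)
           if is_non_silence(seg, sub_len=2, preset_value=0.3)]
```

The fully silent first segment is dropped; the two segments containing the loud second half are kept as target segments.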
2023. Obtain the audio segments in the target audio information other than noise segments and singing segments, as target audio segments.
The target audio information is divided according to the second preset length and the third preset length to obtain multiple audio segments each equal in length to the second preset length, where any two adjacent audio segments share identical audio information of the third preset length. For each of the multiple audio segments, multiple fundamental frequencies in the audio segment are obtained, along with the proportion of those fundamental frequencies that are greater than the first preset frequency.
When the proportion of fundamental frequencies greater than the first preset frequency in an audio segment is not greater than the second preset proportion, the audio segment is determined to be a noise segment and contains no sensitive audio information. When that proportion is not less than the third preset proportion, the audio segment is determined to be a singing segment and contains no sensitive audio information. When that proportion is greater than the second preset proportion and less than the third preset proportion, the audio segment is determined to be neither a noise segment nor a singing segment, and may contain sensitive audio information.
Therefore, from the multiple audio segments, the classification device obtains the audio segments in which the proportion of fundamental frequencies greater than the first preset frequency is greater than the second preset proportion and less than the third preset proportion, as target audio segments.
For example, the second preset length is 1 second and the third preset length is 0.5 seconds. The second preset proportion may be set to 10%, 20%, or another percentage. The third preset proportion may be set to 60%, 70%, or another percentage, and may be the same as or different from the first preset proportion in step 2021.
By dividing the target audio information according to the second and third preset lengths, any two adjacent audio segments share identical audio information of the third preset length: each audio segment contains the audio information at the end of the previous one, which reduces discontinuities between audio segments and keeps the audio information complete.
By dividing the target audio information into multiple audio segments and judging, from the fundamental frequencies of each audio segment, whether it is a noise segment, a singing segment, or a segment other than these two, the noise segments and singing segments can be excluded from the target audio information, reducing the computation and improving the accuracy of the subsequent classification of the target audio information.
It should be noted that steps 2021-2023 can be combined with each other, so that the noise segments, silence segments, and singing segments in the target audio information are distinguished from the segments other than these three. The noise segments, silence segments, and singing segments are excluded, the target audio segments to be classified are determined, and it is then determined, according to the target audio segments, whether the target audio information is normal audio information or sensitive audio information.
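Step 2023's band test on the fundamental-frequency proportion can be sketched as follows (pure Python; 150 Hz, 10%, and 60% stand in for the first preset frequency and the second and third preset proportions, and the pitch lists are illustrative):

```python
def classify_by_pitch_ratio(fundamental_freqs, preset_freq=150.0,
                            low_ratio=0.10, high_ratio=0.60):
    """Label a segment: 'noise' if the proportion of fundamentals above
    preset_freq is <= low_ratio, 'singing' if it is >= high_ratio,
    otherwise 'target' (neither, so it may contain sensitive audio)."""
    ratio = sum(1 for f in fundamental_freqs if f > preset_freq) / len(fundamental_freqs)
    if ratio <= low_ratio:
        return "noise"
    if ratio >= high_ratio:
        return "singing"
    return "target"

labels = [
    classify_by_pitch_ratio([100.0] * 95 + [200.0] * 5),   # 5%  -> noise
    classify_by_pitch_ratio([100.0] * 30 + [200.0] * 70),  # 70% -> singing
    classify_by_pitch_ratio([100.0] * 60 + [200.0] * 40),  # 40% -> target
]
```

Only the segments labeled `"target"` would proceed to high-pass filtering and classification.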
203. Perform high-pass filtering on the at least one target audio segment to obtain at least one high-pass-filtered audio segment.
The classification device may set a preset cutoff frequency for high-pass filtering. When high-pass filtering audio information, the components whose frequency is lower than the preset cutoff frequency are filtered out, and the components whose frequency is not lower than the preset cutoff frequency are retained.
After the at least one target audio segment is determined through step 202, high-pass filtering is performed on the at least one target audio segment according to the preset cutoff frequency: the components below the preset cutoff frequency are filtered out and the remaining components are retained, thereby filtering out the low-frequency noise and obtaining at least one high-pass-filtered audio segment.
The preset cutoff frequency may be set to 100 Hz, 120 Hz, or another frequency, and may be set according to the maximum frequency that common low-frequency noise in daily life is likely to reach.
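A minimal high-pass filtering sketch, under stated assumptions: the patent does not specify the filter design, so a first-order RC-style high-pass filter is used here, and the sample rate and cutoff are illustrative. A production system would more likely use a library filter (e.g. a higher-order Butterworth design):

```python
import math

def high_pass(samples, sample_rate, cutoff_hz):
    """First-order high-pass filter: attenuates components below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

# A constant (0 Hz) offset is pure low-frequency content: the filter output
# should decay toward 0 while passing higher-frequency variation.
filtered = high_pass([1.0] * 200, sample_rate=8000, cutoff_hz=100)
```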
204. Perform feature extraction on the at least one high-pass-filtered target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment.
Each audio feature describes the target audio segment and may be mel-frequency cepstral coefficients, linear prediction cepstral coefficients, or any other feature capable of describing the target audio segment. Correspondingly, feature extraction on a target audio segment may use a mel-frequency cepstral coefficient algorithm, a linear prediction cepstral coefficient algorithm, or another feature extraction algorithm.
For example, when extracting with the mel-frequency cepstral coefficient algorithm, the dimension may be set to 40, with dimensions 1-13 used as the feature of the target audio segment.
Optionally, each high-pass-filtered audio segment is divided according to the fifth preset length to obtain multiple audio sub-segments each equal in length to the fifth preset length, and feature extraction is performed on each audio sub-segment to obtain its audio feature. The audio feature of each audio sub-segment may serve as an audio feature of the target audio segment, or the audio features of the multiple audio sub-segments may be combined into one audio feature of the target audio segment.
For example, the fifth preset length may be 20 milliseconds, 25 milliseconds, or another duration.
By dividing each audio segment according to the fifth preset length into multiple audio sub-segments, the audio segment can be divided more finely, so that more accurate features are extracted and the accuracy can be improved.
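The framing into fifth-preset-length sub-segments can be sketched as follows (pure Python; 20 ms frames at an illustrative 1000 samples per second, with the feature extractor itself, e.g. MFCC, left abstract):

```python
def frame_segment(samples, sample_rate, frame_ms):
    """Split a filtered segment into fixed-length frames for feature
    extraction; a trailing remainder shorter than one frame is dropped."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

# 0.1 s at 1000 samples/s -> five 20 ms frames of 20 samples each;
# each frame would then be passed to the chosen feature extractor.
frames = frame_segment([0.0] * 100, sample_rate=1000, frame_ms=20)
```

Speech front ends typically also overlap frames (e.g. a 10 ms hop); the patent only specifies equal-length division, so no overlap is assumed here.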
205. Determine the classification identifier of the at least one target audio segment based on the audio classification model and the at least one audio feature, and determine the classification identifier of the target audio information according to the classification identifiers of the at least one target audio segment.
For the audio feature of each target audio segment, the audio feature is processed based on the audio classification model to obtain a classification identifier, which is the classification identifier of that target audio segment. In this way, the classification identifiers of the at least one target audio segment are obtained, determining whether each target audio segment contains sensitive audio information.
Optionally, when the audio classification model includes the first audio classification model and the second audio classification model, the audio feature of a target audio segment is input into both models: the first audio classification model outputs a first probability indicating that the target audio segment belongs to normal audio information, and the second audio classification model outputs a second probability indicating that the target audio segment belongs to sensitive audio information.
When the first probability is greater than the second probability, the target audio segment is determined to belong to normal audio information, and its classification identifier is the first identifier. When the first probability is less than the second probability, the target audio segment is determined to belong to sensitive audio information, and its classification identifier is the second identifier. When the two probabilities are equal, the classification identifier of the target audio segment may be determined as either the first identifier or the second identifier, or the target audio segment may be classified again.
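The probability comparison above can be sketched as follows, taking the first identifier as 1 and the second as 0 as in the earlier example (the tie-breaking choice toward the first identifier is one of the options the text allows):

```python
def segment_identifier(p_normal, p_sensitive, first_id=1, second_id=0):
    """Compare the two model outputs for one segment; a tie falls back to
    the first identifier here, though the text also permits re-classifying."""
    return first_id if p_normal >= p_sensitive else second_id

# One decision per target audio segment.
ids = [segment_identifier(0.8, 0.2), segment_identifier(0.3, 0.7)]
```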
Afterwards, the classification identifier of the target audio information is determined according to the classification identifiers of the at least one target audio segment. Optionally, this process may include at least one of the following steps 2051-2052:
2051. When, among the at least one target audio segment, the classification identifiers of a first preset quantity of consecutive target audio segments are all the second identifier, determine that the classification identifier of the target audio information is the second identifier.
When there are multiple target audio segments, the multiple target audio segments are traversed in order, and the number of consecutive target audio segments whose classification identifier is the second identifier is counted: when the classification identifier of the traversed target audio segment is the second identifier, the count is incremented by 1; when it is the first identifier, the count is reset to zero.
When the count reaches the first preset quantity, the classification identifier of the target audio information is determined to be the second identifier; that is, the target audio information is determined to be sensitive audio information.
The first preset quantity may be set to 3, 4, or another number. The order of the multiple target audio segments may be their chronological order, from earliest to latest, within the target audio information.
2052. When, among the at least one target audio segment, the proportion of target audio segments whose classification identifier is the second identifier reaches a fourth preset proportion, determine that the classification identifier of the target audio information is the second identifier.
When there are multiple target audio segments, the number of target audio segments whose classification identifier is the second identifier is counted; when the proportion of such segments reaches the fourth preset proportion, the classification identifier of the target audio information is determined to be the second identifier; that is, the target audio information is determined to be sensitive audio information.
The fourth preset proportion may be set to 70%, 75%, or another percentage.
It should be noted that steps 2051 and 2052 can be combined: only when, among the at least one target audio segment, the classification identifiers of a first preset quantity of consecutive target audio segments are all the second identifier and the proportion of target audio segments whose classification identifier is the second identifier reaches the fourth preset proportion, is the classification identifier of the target audio information determined to be the second identifier. Requiring both conditions to be met when classifying the target audio information improves the accuracy of the classification.
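The two decision rules and their combination can be sketched as follows, with 0 again standing in for the second identifier, 3 for the first preset quantity, and 70% for the fourth preset proportion:

```python
def has_consecutive_run(ids, second_id, preset_qty):
    """Rule 2051: a run of preset_qty consecutive second-identifier segments,
    with the count reset whenever a first-identifier segment is traversed."""
    run = 0
    for i in ids:
        run = run + 1 if i == second_id else 0
        if run >= preset_qty:
            return True
    return False

def reaches_proportion(ids, second_id, preset_ratio):
    """Rule 2052: second-identifier segments make up at least preset_ratio."""
    return ids.count(second_id) / len(ids) >= preset_ratio

# Per-segment identifiers in chronological order; combined rule: both must hold.
ids = [0, 0, 0, 1, 0]
sensitive = (has_consecutive_run(ids, second_id=0, preset_qty=3)
             and reaches_proportion(ids, second_id=0, preset_ratio=0.7))
```

Here the three leading second-identifier segments satisfy rule 2051 and the 80% overall proportion satisfies rule 2052, so the target audio information would be marked sensitive.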
The embodiment of the present invention can be applied in scenarios such as network live streaming, voice interaction, and video playback. For example, in a live-streaming scenario, audio information is extracted from the live video and classified; when the audio information is determined to be sensitive audio information, the live video is determined to be an objectionable video and the live stream is closed. In a voice interaction scenario, the audio information in the voice is extracted and classified; when it is determined to be sensitive audio information, the voice is deleted. In a video playback scenario, audio information is extracted from the video and classified; when it is determined to be sensitive audio information, the video is determined to be an objectionable video and is closed.
In the method provided by the embodiments of the present invention, at least one target audio segment in target audio information is obtained, and high-pass filtering and feature extraction are performed on the at least one target audio segment. The high-pass filtering filters out low-frequency noise, and the feature extraction yields at least one audio feature corresponding to the at least one target audio segment. Based on an audio classification model and the at least one audio feature, the class identifier of the at least one target audio segment is determined, and the class identifier of the target audio information is determined from the class identifiers of the at least one target audio segment, so that the target audio information can be classified as normal audio information or sensitive audio information. Because high-pass filtering is performed before the class identifier of the target audio information is determined, the low-frequency noise of the target audio information is filtered out, so low-frequency noise is not misclassified as sensitive audio information, which improves the accuracy of audio classification.
Moreover, when the at least one target audio segment is obtained from the target audio information, the target audio information is divided to obtain multiple audio segments, and the multiple audio segments are screened against a preset condition, with the audio segments that satisfy the preset condition taken as target audio segments. In this way, interference from the other audio segments is reduced, misclassification is less likely, and the accuracy of audio classification is improved.
Fig. 3 is a schematic structural diagram of an audio classification apparatus provided by an embodiment of the present invention. Referring to Fig. 3, the apparatus includes an obtaining module 301, an extraction module 302, and a determining module 303.
The obtaining module 301 is configured to obtain at least one target audio segment in target audio information.
The extraction module 302 is configured to perform high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment.
The determining module 303 is configured to determine the class identifier of the at least one target audio segment based on an audio classification model and the at least one audio feature, and to determine the class identifier of the target audio information according to the class identifier of the at least one target audio segment.
The class identifier includes a first identifier and a second identifier; the first identifier indicates that the corresponding audio information is normal audio information, and the second identifier indicates that the corresponding audio information is sensitive audio information.
Optionally, the obtaining module 301 includes:
a first division unit, configured to divide the target audio information according to a first preset length to obtain multiple audio segments whose length equals the first preset length;
a fundamental-frequency obtaining unit, configured to, for each audio segment in the multiple audio segments, obtain multiple fundamental frequencies in the audio segment and obtain the proportion of fundamental frequencies, among the multiple fundamental frequencies, that are greater than a first preset frequency;
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments whose proportion is less than a first preset ratio, as target audio segments.
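Purely as an illustration, the screening performed by the fundamental-frequency obtaining unit and the obtaining unit might look like the sketch below; the 500 Hz cutoff and 0.5 ratio are hypothetical stand-ins for the first preset frequency and first preset ratio, which the embodiment leaves open.

```python
def select_low_f0_segments(segments_f0, preset_freq=500.0, preset_ratio=0.5):
    """Return the indices of segments to keep as target audio segments.

    segments_f0: one list of frame-level fundamental frequencies (Hz) per
    audio segment. A segment is kept when the share of its fundamental
    frequencies above preset_freq stays below preset_ratio (assumed values
    for the first preset frequency / first preset ratio).
    """
    selected = []
    for idx, f0s in enumerate(segments_f0):
        high_share = sum(1 for f in f0s if f > preset_freq) / len(f0s)
        if high_share < preset_ratio:
            selected.append(idx)
    return selected

# A speech-like segment (low f0) is kept; a screech-like one is dropped.
print(select_low_f0_segments([[120, 180, 200], [800, 900, 650]]))  # [0]
```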
Optionally, the obtaining module 301 includes:
a second division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments in the multiple audio segments share audio information of the third preset length, and the third preset length is less than the second preset length;
the second division unit is further configured to, for each audio segment in the multiple audio segments, divide the audio segment according to a fourth preset length to obtain multiple audio sub-segments whose length equals the fourth preset length, and obtain the statistical value of the amplitude of each audio sub-segment, where the fourth preset length is less than the second preset length;
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments in which any statistical value is greater than a preset value, as target audio segments.
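A minimal sketch of this second selection scheme, under stated assumptions: adjacent segments overlap by the third preset length, each segment is cut into sub-segments of the fourth preset length, and a segment is kept when any sub-segment's amplitude statistic exceeds a preset value. The embodiment does not fix which statistic is used; mean absolute amplitude below is an assumption.

```python
def overlapping_segments(samples, seg_len, overlap):
    """Split samples into segments of seg_len samples, where any two
    adjacent segments share `overlap` samples (overlap < seg_len)."""
    hop = seg_len - overlap
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, hop)]

def keep_segment(segment, sub_len, preset_value):
    """True when any sub-segment's mean absolute amplitude (the assumed
    'statistical value') exceeds preset_value."""
    subs = [segment[i:i + sub_len]
            for i in range(0, len(segment) - sub_len + 1, sub_len)]
    return any(sum(abs(x) for x in sub) / sub_len > preset_value
               for sub in subs)

quiet = [0.01] * 8
loud = [0.01] * 4 + [0.9] * 4
print(keep_segment(quiet, 4, 0.2), keep_segment(loud, 4, 0.2))  # False True
```

The overlap means a loud burst that straddles a segment boundary still falls entirely inside at least one segment, so it cannot be missed by the screening.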
Optionally, the obtaining module 301 includes:
a third division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, where any two adjacent audio segments in the multiple audio segments share audio information of the third preset length, and the third preset length is less than the second preset length;
a fundamental-frequency obtaining unit, configured to, for each audio segment in the multiple audio segments, obtain multiple fundamental frequencies in the audio segment and obtain the proportion of fundamental frequencies, among the multiple fundamental frequencies, that are greater than a first preset frequency;
an obtaining unit, configured to obtain, from the multiple audio segments, the audio segments whose proportion is greater than a second preset ratio and less than a third preset ratio, as target audio segments.
Optionally, the extraction module 302 includes:
a filtering unit, configured to perform high-pass filtering on the at least one target audio segment to obtain at least one high-pass-filtered audio segment;
a division unit, configured to divide each high-pass-filtered audio segment according to a fifth preset length to obtain multiple audio sub-segments whose length equals the fifth preset length;
an extraction unit, configured to perform feature extraction on each audio sub-segment to obtain the audio feature of each audio sub-segment.
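For illustration, the filtering and division steps of the extraction module might be sketched as follows; a first-order RC high-pass at an assumed 100 Hz cutoff stands in for whatever filter the embodiment actually uses, and the feature extractor itself is left abstract.

```python
import math

def highpass(samples, sample_rate, cutoff_hz=100.0):
    """First-order RC high-pass filter; attenuates low-frequency noise
    before feature extraction (cutoff frequency is an assumed value)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

def split_fixed(samples, sub_len):
    """Divide a filtered segment into sub-segments of sub_len samples
    (the fifth preset length); a trailing remainder is discarded."""
    return [samples[i:i + sub_len]
            for i in range(0, len(samples) - sub_len + 1, sub_len)]

# A constant (0 Hz) input decays toward zero after the high-pass filter,
# which is exactly the low-frequency rejection the method relies on.
filtered = highpass([1.0] * 400, sample_rate=16000)
print(abs(filtered[-1]) < 0.01)  # True
```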
Optionally, the determining module 303 is configured to perform at least one of the following:
when, among the at least one target audio segment, the class identifiers of a first preset quantity of consecutive target audio segments are the second identifier, determining that the class identifier of the target audio information is the second identifier;
when, among the at least one target audio segment, the proportion of target audio segments whose class identifier is the second identifier reaches a fourth preset ratio, determining that the class identifier of the target audio information is the second identifier.
Optionally, the apparatus further includes a training module;
the obtaining module 301 is further configured to obtain multiple pieces of sample audio information and the class identifiers of the multiple pieces of sample audio information;
the extraction module 302 is further configured to perform high-pass filtering and feature extraction on the multiple pieces of sample audio information to obtain multiple audio features corresponding to the multiple pieces of sample audio information;
the training module is configured to perform model training according to the multiple audio features and the class identifier corresponding to each audio feature, to obtain the audio classification model.
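The embodiment does not fix a model family for this training step. Purely as an illustration of the train-then-classify flow over labeled audio features, a nearest-centroid classifier (a hypothetical stand-in, not the patent's model) might look like:

```python
def train(features, labels):
    """Fit one mean-feature centroid per class identifier; a minimal
    stand-in for the unspecified audio classification model."""
    model = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        dim = len(rows[0])
        model[label] = [sum(r[i] for r in rows) / len(rows)
                        for i in range(dim)]
    return model

def predict(model, feature):
    """Return the class identifier whose centroid is nearest."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

model = train([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]],
              ["normal", "normal", "sensitive", "sensitive"])
print(predict(model, [0.2, 0.2]))  # normal
```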
Optionally, the audio classification model includes a first audio classification model and a second audio classification model;
the training module is further configured to perform model training according to the audio features, among the multiple audio features, whose class identifier is the first class identifier, to obtain the first audio classification model;
the training module is further configured to perform model training according to the audio features, among the multiple audio features, whose class identifier is the second class identifier, to obtain the second audio classification model.
In the apparatus provided by the embodiments of the present invention, at least one target audio segment in target audio information is obtained, and high-pass filtering and feature extraction are performed on the at least one target audio segment. The high-pass filtering filters out low-frequency noise, and the feature extraction yields at least one audio feature corresponding to the at least one target audio segment. Based on an audio classification model and the at least one audio feature, the class identifier of the at least one target audio segment is determined, and the class identifier of the target audio information is determined from the class identifiers of the at least one target audio segment, so that the target audio information can be classified as normal audio information or sensitive audio information. Because high-pass filtering is performed before the class identifier of the target audio information is determined, the low-frequency noise of the target audio information is filtered out, so low-frequency noise is not misclassified as sensitive audio information, which improves the accuracy of audio classification.
Moreover, when the at least one target audio segment is obtained from the target audio information, the target audio information is divided to obtain multiple audio segments, and the multiple audio segments are screened against a preset condition, with the audio segments that satisfy the preset condition taken as target audio segments. In this way, interference from the other audio segments is reduced, misclassification is less likely, and the accuracy of audio classification is improved.
Any combination of all of the foregoing optional solutions may form an optional embodiment of the present invention, and details are not repeated here one by one.
It should be noted that when the audio classification apparatus provided in the above embodiments classifies audio information, the division into the above functional modules is only used as an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the audio classification apparatus provided in the above embodiments belongs to the same concept as the audio classification method embodiments; for its specific implementation process, refer to the method embodiments, and details are not repeated here.
Fig. 4 is a schematic structural diagram of a server provided by an embodiment of the present invention. The server 400 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPUs) 401 and one or more memories 402, where the memory 402 stores at least one instruction that is loaded and executed by the processor 401 to implement the methods provided by each of the above method embodiments. Certainly, the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described here.
The server 400 may be configured to perform the steps performed by the classification apparatus in the above audio classification method.
Fig. 5 is a schematic structural diagram of a terminal provided by an embodiment of the present invention. The terminal 500 may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop, a desktop computer, a head-mounted device, or any other intelligent terminal. The terminal 500 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 500 includes a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 501 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor. The main processor is a processor for processing data in the awake state, also referred to as a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 502 may include one or more computer-readable storage media, which may be non-transitory. The memory 502 may also include high-speed random access memory and non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 502 is configured to store at least one instruction, which is executed by the processor 501 to implement the audio classification method provided by the method embodiments of this application.
In some embodiments, the terminal 500 optionally further includes a peripheral device interface 503 and at least one peripheral device. The processor 501, the memory 502, and the peripheral device interface 503 may be connected by a bus or a signal line. Each peripheral device may be connected to the peripheral device interface 503 by a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio-frequency circuit 504, a touch display screen 505, a camera 506, an audio circuit 507, a positioning component 508, and a power supply 509.
The peripheral device interface 503 may be configured to connect at least one I/O (Input/Output)-related peripheral device to the processor 501 and the memory 502. In some embodiments, the processor 501, the memory 502, and the peripheral device interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral device interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio-frequency circuit 504 is configured to receive and transmit RF (Radio Frequency) signals, also referred to as electromagnetic signals. The radio-frequency circuit 504 communicates with communication networks and other communication devices through electromagnetic signals. The radio-frequency circuit 504 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio-frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio-frequency circuit 504 may communicate with other terminals through at least one wireless communication protocol, including but not limited to metropolitan area networks, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio-frequency circuit 504 may also include a circuit related to NFC (Near Field Communication), which is not limited in this application.
The display screen 505 is configured to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, it is also capable of acquiring touch signals on or above its surface. The touch signal may be input to the processor 501 as a control signal for processing. In this case, the display screen 505 may also be configured to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, arranged on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, each arranged on a different surface of the terminal 500 or in a folded design; in still other embodiments, the display screen 505 may be a flexible display screen arranged on a curved surface or folded plane of the terminal 500. The display screen 505 may even be arranged in a non-rectangular irregular shape, that is, a shaped screen. The display screen 505 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 506 is configured to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal, and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera may be fused to realize a background-blur function, the main camera and the wide-angle camera may be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions, or other fused shooting functions may be realized. In some embodiments, the camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and may be used for light compensation under different color temperatures.
The audio circuit 507 may include a microphone and a speaker. The microphone is configured to capture sound waves from the user and the environment, convert the sound waves into electrical signals, and input them to the processor 501 for processing, or input them to the radio-frequency circuit 504 to realize voice communication. For the purposes of stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the terminal 500. The microphone may also be an array microphone or an omnidirectional microphone. The speaker is configured to convert electrical signals from the processor 501 or the radio-frequency circuit 504 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 507 may also include a headphone jack.
The positioning component 508 is configured to determine the current geographic location of the terminal 500 to implement navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 509 is configured to supply power to the components in the terminal 500. The power supply 509 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the terminal 500 further includes one or more sensors 510, including but not limited to an acceleration sensor 511, a gyroscope sensor 512, a pressure sensor 513, a fingerprint sensor 514, an optical sensor 515, and a proximity sensor 516.
The acceleration sensor 511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 501 may, according to the gravitational acceleration signal captured by the acceleration sensor 511, control the touch display screen 505 to display the user interface in landscape view or portrait view. The acceleration sensor 511 may also be used to capture motion data for games or for the user.
The gyroscope sensor 512 can detect the body orientation and rotation angle of the terminal 500, and may cooperate with the acceleration sensor 511 to capture the user's 3D actions on the terminal 500. According to the data captured by the gyroscope sensor 512, the processor 501 may implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 513 may be arranged on the side frame of the terminal 500 and/or the lower layer of the touch display screen 505. When the pressure sensor 513 is arranged on the side frame of the terminal 500, it can detect the user's grip signal on the terminal 500, and the processor 501 performs left/right-hand recognition or shortcut operations according to the grip signal captured by the pressure sensor 513. When the pressure sensor 513 is arranged on the lower layer of the touch display screen 505, the processor 501 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 505. The operable controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The fingerprint sensor 514 is configured to capture the user's fingerprint; the processor 501 identifies the user's identity according to the fingerprint captured by the fingerprint sensor 514, or the fingerprint sensor 514 identifies the user's identity according to the captured fingerprint. When the user's identity is identified as a trusted identity, the processor 501 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 514 may be arranged on the front, back, or side of the terminal 500. When a physical button or a manufacturer's logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or the manufacturer's logo.
The optical sensor 515 is configured to capture the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the touch display screen 505 according to the ambient light intensity captured by the optical sensor 515: when the ambient light intensity is high, the display brightness of the touch display screen 505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 505 is decreased. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 according to the ambient light intensity captured by the optical sensor 515.
The proximity sensor 516, also referred to as a distance sensor, is generally arranged on the front panel of the terminal 500 and is configured to capture the distance between the user and the front of the terminal 500. In one embodiment, when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually decreases, the processor 501 controls the touch display screen 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance between the user and the front of the terminal 500 gradually increases, the processor 501 controls the touch display screen 505 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in Fig. 5 does not constitute a limitation on the terminal 500, and the terminal may include more or fewer components than shown, combine certain components, or use a different component arrangement.
An embodiment of the present invention also provides an audio classification device, which includes a processor and a memory. The memory stores at least one instruction, which is loaded and executed by the processor to implement the operations performed in the audio classification method of the above embodiments.
An embodiment of the present invention also provides a computer-readable storage medium. The computer-readable storage medium stores at least one instruction, which is loaded and executed by a processor to implement the operations performed in the audio classification method of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (18)

1. An audio classification method, characterized in that the method includes:
obtaining at least one target audio segment in target audio information;
performing high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment;
determining, based on an audio classification model and the at least one audio feature, the class identifier of the at least one target audio segment, and determining the class identifier of the target audio information according to the class identifier of the at least one target audio segment;
wherein the class identifier includes a first identifier and a second identifier, the first identifier being used to indicate that the corresponding audio information is normal audio information, and the second identifier being used to indicate that the corresponding audio information is sensitive audio information.
2. The method according to claim 1, characterized in that the obtaining at least one target audio segment in target audio information includes:
dividing the target audio information according to a first preset length to obtain multiple audio segments whose length equals the first preset length;
for each audio segment in the multiple audio segments, obtaining multiple fundamental frequencies in the audio segment, and obtaining the proportion of fundamental frequencies, among the multiple fundamental frequencies, that are greater than a first preset frequency;
obtaining, from the multiple audio segments, the audio segments whose proportion is less than a first preset ratio, as target audio segments.
3. The method according to claim 1, characterized in that the obtaining at least one target audio segment in target audio information includes:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, wherein any two adjacent audio segments in the multiple audio segments share audio information of the third preset length, and the third preset length is less than the second preset length;
for each audio segment in the multiple audio segments, dividing the audio segment according to a fourth preset length to obtain multiple audio sub-segments whose length equals the fourth preset length, and obtaining the statistical value of the amplitude of each audio sub-segment, wherein the fourth preset length is less than the second preset length;
obtaining, from the multiple audio segments, the audio segments in which any statistical value is greater than a preset value, as target audio segments.
4. The method according to claim 1, characterized in that the obtaining at least one target audio segment in target audio information includes:
dividing the target audio information according to a second preset length and a third preset length to obtain multiple audio segments whose length equals the second preset length, wherein any two adjacent audio segments in the multiple audio segments share audio information of the third preset length, and the third preset length is less than the second preset length;
for each audio segment in the multiple audio segments, obtaining multiple fundamental frequencies in the audio segment, and obtaining the proportion of fundamental frequencies, among the multiple fundamental frequencies, that are greater than a first preset frequency;
obtaining, from the multiple audio segments, the audio segments whose proportion is greater than a second preset ratio and less than a third preset ratio, as target audio segments.
5. The method according to claim 1, characterized in that the performing high-pass filtering and feature extraction on the at least one target audio segment to obtain at least one audio feature corresponding to the at least one target audio segment includes:
performing high-pass filtering on the at least one target audio segment to obtain at least one high-pass-filtered audio segment;
dividing each high-pass-filtered audio segment according to a fifth preset length to obtain multiple audio sub-segments whose length equals the fifth preset length;
performing feature extraction on each audio sub-segment to obtain the audio feature of each audio sub-segment.
6. The method according to any one of claims 1 to 5, characterized in that the determining the class identifier of the target audio information according to the class identifier of the at least one target audio segment includes at least one of the following:
when, among the at least one target audio segment, the class identifiers of a first preset quantity of consecutive target audio segments are the second identifier, determining that the class identifier of the target audio information is the second identifier;
when, among the at least one target audio segment, the proportion of target audio segments whose class identifier is the second identifier reaches a fourth preset ratio, determining that the class identifier of the target audio information is the second identifier.
7. The method according to any one of claims 1 to 5, characterized in that the method further includes:
obtaining multiple pieces of sample audio information and the class identifiers of the multiple pieces of sample audio information;
performing high-pass filtering and feature extraction on the multiple pieces of sample audio information to obtain multiple audio features corresponding to the multiple pieces of sample audio information;
performing model training according to the multiple audio features and the class identifier corresponding to each audio feature to obtain the audio classification model.
8. The method according to claim 7, wherein the audio classification model comprises a first audio classification model and a second audio classification model, and performing model training according to the multiple audio features and the category identifier corresponding to each audio feature to obtain the audio classification model comprises:
performing model training according to the audio features, among the multiple audio features, whose category identifier is the first category identifier, to obtain the first audio classification model;
performing model training according to the audio features, among the multiple audio features, whose category identifier is the second category identifier, to obtain the second audio classification model.
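Claim 8 trains one model per class, which suggests a generative per-class formulation: fit a model to each class's features, then compare likelihoods at classification time. The claim does not fix the model family; the sketch below uses a single diagonal Gaussian per class as the simplest stand-in (a GMM would be the more common choice), and every name is hypothetical.

```python
import numpy as np

class GaussianClassModel:
    # One diagonal Gaussian per class stands in for the unspecified
    # "audio classification model".
    def fit(self, features):
        x = np.asarray(features, dtype=float)
        self.mean = x.mean(axis=0)
        self.var = x.var(axis=0) + 1e-6  # floor variance for stability
        return self

    def log_likelihood(self, feature):
        d = np.asarray(feature, dtype=float) - self.mean
        return float(-0.5 * np.sum(d * d / self.var + np.log(2 * np.pi * self.var)))

def train_two_models(features, labels):
    """Train the first model on first-identifier features and the second
    model on second-identifier features, as in claim 8."""
    first = GaussianClassModel().fit([f for f, l in zip(features, labels) if l == "first"])
    second = GaussianClassModel().fit([f for f, l in zip(features, labels) if l == "second"])
    return first, second

feats = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = ["first", "first", "second", "second"]
m1, m2 = train_two_models(feats, labels)
# A feature near the "sensitive" cluster scores higher under the second model
print(m2.log_likelihood((0.85, 0.85)) > m1.log_likelihood((0.85, 0.85)))  # True
```

At inference, a fragment's feature is assigned the identifier of whichever model gives it the higher likelihood.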
9. An audio classification device, wherein the device comprises:
an obtaining module, configured to obtain at least one target audio fragment in target audio information;
an extraction module, configured to perform high-pass filtering and feature extraction on the at least one target audio fragment to obtain at least one audio feature corresponding to the at least one target audio fragment;
a determining module, configured to determine a category identifier of the at least one target audio fragment based on an audio classification model and the at least one audio feature, and to determine a category identifier of the target audio information according to the category identifier of the at least one target audio fragment;
wherein the category identifier comprises a first identifier and a second identifier, the first identifier indicating that the corresponding audio information is normal audio information, and the second identifier indicating that the corresponding audio information is sensitive audio information.
10. The device according to claim 9, wherein the obtaining module comprises:
a first division unit, configured to divide the target audio information according to a first preset length to obtain multiple audio fragments whose length equals the first preset length;
a fundamental frequency obtaining unit, configured to, for each audio fragment of the multiple audio fragments, obtain multiple fundamental frequencies in the audio fragment and obtain the proportion of the fundamental frequencies, among the multiple fundamental frequencies, that are greater than a first preset frequency;
an obtaining unit, configured to obtain, from the multiple audio fragments, the audio fragments whose proportion is less than a first preset ratio as target audio fragments.
11. The device according to claim 9, wherein the obtaining module comprises:
a second division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio fragments whose length equals the second preset length, wherein any two adjacent audio fragments of the multiple audio fragments include identical audio information of the third preset length, and the third preset length is less than the second preset length;
the second division unit being further configured to, for each audio fragment of the multiple audio fragments, divide the audio fragment according to a fourth preset length to obtain multiple audio sub-fragments whose length equals the fourth preset length, and obtain a statistical value of the amplitude of each audio sub-fragment, wherein the fourth preset length is less than the second preset length;
an obtaining unit, configured to obtain, from the multiple audio fragments, the audio fragments in which any statistical value is greater than a preset value as target audio fragments.
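Claim 11's obtaining module combines an overlapping split (adjacent fragments share a stretch of identical audio) with an amplitude-based filter. The claim leaves the "statistical value of the amplitude" unspecified; the sketch below assumes mean absolute amplitude, and all names are hypothetical.

```python
import numpy as np

def overlapping_fragments(samples, second_preset_length, third_preset_length):
    """Split into fixed-length fragments where any two adjacent fragments
    share third_preset_length samples of identical audio."""
    step = second_preset_length - third_preset_length
    return [samples[i:i + second_preset_length]
            for i in range(0, len(samples) - second_preset_length + 1, step)]

def select_by_amplitude(fragments, fourth_preset_length, preset_value):
    """Keep a fragment if ANY of its sub-fragments has a mean absolute
    amplitude above the preset value (i.e. the fragment is not silent)."""
    targets = []
    for idx, frag in enumerate(fragments):
        n = len(frag) // fourth_preset_length
        stats = [np.mean(np.abs(frag[i * fourth_preset_length:(i + 1) * fourth_preset_length]))
                 for i in range(n)]
        if any(s > preset_value for s in stats):
            targets.append(idx)
    return targets

# 400 silent samples followed by 400 loud samples
x = np.concatenate([np.zeros(400), 0.5 * np.ones(400)])
frags = overlapping_fragments(x, 200, 100)        # 7 fragments, 50% overlap
print(select_by_amplitude(frags, 50, 0.1))         # indices overlapping the loud half
```

The overlap guarantees that a short loud burst straddling a fragment boundary still falls entirely inside at least one fragment, which is why the split uses two lengths instead of one.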
12. The device according to claim 9, wherein the obtaining module comprises:
a third division unit, configured to divide the target audio information according to a second preset length and a third preset length to obtain multiple audio fragments whose length equals the second preset length, wherein any two adjacent audio fragments of the multiple audio fragments include identical audio information of the third preset length, and the third preset length is less than the second preset length;
a fundamental frequency obtaining unit, configured to, for each audio fragment of the multiple audio fragments, obtain multiple fundamental frequencies in the audio fragment and obtain the proportion of the fundamental frequencies, among the multiple fundamental frequencies, that are greater than a first preset frequency;
an obtaining unit, configured to obtain, from the multiple audio fragments, the audio fragments whose proportion is greater than a second preset ratio and less than a third preset ratio as target audio fragments.
13. The device according to claim 9, wherein the extraction module comprises:
a filtering unit, configured to perform high-pass filtering on the at least one target audio fragment to obtain at least one high-pass-filtered audio fragment;
a division unit, configured to divide each high-pass-filtered audio fragment according to a fifth preset length to obtain multiple audio sub-fragments whose length equals the fifth preset length;
an extraction unit, configured to perform feature extraction on each audio sub-fragment to obtain an audio feature of each audio sub-fragment.
14. The device according to any one of claims 9-13, wherein the determining module is configured to perform at least one of the following:
when, in the at least one target audio fragment, the category identifiers of a continuous first preset number of target audio fragments are the second identifier, determining that the category identifier of the target audio information is the second identifier;
when, in the at least one target audio fragment, the proportion of the target audio fragments whose category identifier is the second identifier reaches a fourth preset ratio, determining that the category identifier of the target audio information is the second identifier.
15. The device according to any one of claims 9-13, wherein the device further comprises:
the obtaining module being further configured to obtain multiple pieces of sample audio information and category identifiers of the multiple pieces of sample audio information;
the extraction module being further configured to perform high-pass filtering and feature extraction on the multiple pieces of sample audio information to obtain multiple audio features corresponding to the multiple pieces of sample audio information;
a training module, configured to perform model training according to the multiple audio features and the category identifier corresponding to each audio feature to obtain the audio classification model.
16. The device according to claim 15, wherein the audio classification model comprises a first audio classification model and a second audio classification model;
the training module being further configured to perform model training according to the audio features, among the multiple audio features, whose category identifier is the first category identifier, to obtain the first audio classification model;
the training module being further configured to perform model training according to the audio features, among the multiple audio features, whose category identifier is the second category identifier, to obtain the second audio classification model.
17. An audio classification device, wherein the device comprises a processor and a memory, the memory storing at least one instruction, and the instruction being loaded and executed by the processor to implement the operations performed in the audio classification method according to any one of claims 1 to 8.
18. A computer-readable storage medium, wherein at least one instruction is stored in the computer-readable storage medium, and the instruction is loaded and executed by a processor to implement the operations performed in the audio classification method according to any one of claims 1 to 8.
CN201811632676.7A 2018-12-29 2018-12-29 Audio classification method, device and storage medium Active CN109671425B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811632676.7A CN109671425B (en) 2018-12-29 2018-12-29 Audio classification method, device and storage medium


Publications (2)

Publication Number Publication Date
CN109671425A true CN109671425A (en) 2019-04-23
CN109671425B CN109671425B (en) 2021-04-06

Family

ID=66146491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811632676.7A Active CN109671425B (en) 2018-12-29 2018-12-29 Audio classification method, device and storage medium

Country Status (1)

Country Link
CN (1) CN109671425B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110933235A (en) * 2019-11-06 2020-03-27 杭州哲信信息技术有限公司 Noise removing method in intelligent calling system based on machine learning
CN112667844A (en) * 2020-12-23 2021-04-16 腾讯音乐娱乐科技(深圳)有限公司 Method, device, equipment and storage medium for retrieving audio

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102543079A (en) * 2011-12-21 2012-07-04 南京大学 Method and equipment for classifying audio signals in real time
CN104347068A (en) * 2013-08-08 2015-02-11 索尼公司 Audio signal processing device, audio signal processing method and monitoring system
CN104538041A (en) * 2014-12-11 2015-04-22 深圳市智美达科技有限公司 Method and system for detecting abnormal sounds
CN105719642A (en) * 2016-02-29 2016-06-29 黄博 Continuous and long voice recognition method and system and hardware equipment
CN107452401A (en) * 2017-05-27 2017-12-08 北京字节跳动网络技术有限公司 A kind of advertising pronunciation recognition methods and device
CN108538311A (en) * 2018-04-13 2018-09-14 腾讯音乐娱乐科技(深圳)有限公司 Audio frequency classification method, device and computer readable storage medium


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Feng Rong et al.: "Audio classification method based on machine learning", 2016 International Conference on Intelligent Transportation, Big Data & Smart City *
Liu Jiqing et al.: "Sports audio classification based on MFCC and GMM", Proceedings of IC-BNMT 2009 *
Jiang Chao, Feng Huamin, Yang Xinghua: "Research on audio classification and speech recognition analysis in video", 2010 APCID *
Jia Qiang: Master's thesis, 30 March 2016, Fudan University *


Also Published As

Publication number Publication date
CN109671425B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN109740068B (en) Media data recommendation method, device and storage medium
CN108829881B (en) Video title generation method and device
CN110650379B (en) Video abstract generation method and device, electronic equipment and storage medium
CN110277106B (en) Audio quality determination method, device, equipment and storage medium
CN109379643A (en) Image synthesizing method, device, terminal and storage medium
CN109640125B (en) Video content processing method, device, server and storage medium
CN110083791A (en) Target group detection method, device, computer equipment and storage medium
CN109784351B (en) Behavior data classification method and device and classification model training method and device
CN110471858A (en) Applied program testing method, device and storage medium
CN110222789A (en) Image-recognizing method and storage medium
CN110572711A (en) Video cover generation method and device, computer equipment and storage medium
CN109815150A (en) Application testing method, device, electronic equipment and storage medium
CN108320756B (en) Method and device for detecting whether audio is pure music audio
CN110956971A (en) Audio processing method, device, terminal and storage medium
CN110163066A (en) Multi-medium data recommended method, device and storage medium
CN112667844A (en) Method, device, equipment and storage medium for retrieving audio
CN110853124B (en) Method, device, electronic equipment and medium for generating GIF dynamic diagram
CN109671425A (en) Audio frequency classification method, device and storage medium
CN114741559A (en) Method, apparatus and storage medium for determining video cover
CN110675473A (en) Method, device, electronic equipment and medium for generating GIF dynamic graph
CN108717849A (en) The method, apparatus and storage medium of splicing multimedia data
CN113343709B (en) Method for training intention recognition model, method, device and equipment for intention recognition
CN109036463A (en) Obtain the method, apparatus and storage medium of the difficulty information of song
CN109117895A (en) Data clustering method, device and storage medium
CN111599417B (en) Training data acquisition method and device of solubility prediction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant