CN109087634A - A sound quality setting method based on audio classification
- Publication number
- CN109087634A
- Application number
- CN201811278861.0A
- Authority
- CN
- China
- Prior art keywords
- audio
- frame
- voice data
- mel
- classification
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Abstract
The invention discloses a sound quality setting method based on audio classification. The method first extracts features from a segment of audio data and generates a feature image, then classifies the feature image with a convolutional neural network. Finally, according to the audio class, it adjusts the corresponding Dolby Audio and equalizer settings. By automatically recognizing different audio scenes and applying matching settings, the method makes an Android smart TV more intelligent and improves the user experience.
Description
Technical field
The invention belongs to the field of speech technology, and in particular relates to a sound quality setting method based on audio classification.
Background
With the vigorous development of artificial intelligence across industries, the technology has entered many aspects of daily life, and the television industry is no exception. Artificial intelligence makes a TV smarter, so that it can better meet user needs and improve the user experience.
Multimedia data such as video and audio are important information media on a television, and audio information plays a particularly important role. How to process, structurally analyze, and use audio information is an important topic in information processing, and audio classification is one of its key technologies. Audio from different scenes has its own characteristics. News audio, for example, is characterized by cadenced intonation and a fairly steady speaking rate; music audio is characterized by the presence of both high and low frequencies and a certain temporal regularity. Different sound modes can therefore be set on the TV to suit different audio scenes.
At present, the artificial intelligence in most products runs on cloud servers, because devices running the Android system are limited by their hardware: they cannot run large-scale computations and cannot occupy too many resources, such as CPU time.
Summary of the invention
The purpose of the present invention is to provide a sound quality setting method based on audio classification, in which the audio scene classification technique is designed, optimized, and implemented to run on an ARM board.
The above purpose is achieved by the following technical solution:
A sound quality setting method based on audio classification, involving an audio feature extraction module, an audio classification module, and an audio setting module, and comprising the following steps:
S1, audio feature extraction;
S11, pre-emphasis: pass 9 s of audio data through a high-pass filter to boost the high-frequency part of the signal and flatten its spectrum;
S12, framing: at a sample rate of 22.05 kHz, every 882 sampling points form one frame (22,050 × 0.04 = 882), so each frame lasts 40 ms and the 9 s of audio data is divided into 225 frames;
S13, windowing: multiply each frame by a Hamming window to improve continuity at the left and right ends of the frame;
S14, fast Fourier transform: apply a fast Fourier transform to each windowed frame to obtain its spectrum, then take the squared magnitude of the spectrum to obtain the power spectrum of the signal;
S15, Mel filtering: pass the power spectrum through a Mel filter bank to convert the linear frequency spectrum into a Mel spectrum that reflects the characteristics of human hearing, keeping only the first 224 features of each frame;
S16, logarithm: take the logarithm of the Mel spectrum to obtain a spectrogram of size 225*224, where the horizontal axis is the frame index and the vertical axis is the Mel feature. In the actual computation one frame is discarded, so a 224*224 spectrogram is used for classification. At this point the spectrogram values do not lie within the image value range of 0 to 255, so the present invention maps them into that range with the following linear mapping:
F(x) = 1.5 × (10x + 80) (Formula 1)
After applying Formula 1, the values of the Mel spectrogram fall approximately within the image value range of 0 to 255;
S2, audio classification;
S21, the audio classification module classifies the audio data with a deep-learning convolutional neural network (CNN), namely the MobileNet classification network;
S3, sound quality setting;
S31, for music-class audio data, use the Dolby Audio audio optimizer function to attenuate the low-frequency part and boost the frequency band corresponding to the human voice, and use the Dolby Audio clarity function to strengthen the vocal part;
S32, for news-class audio data, use the Dolby Audio intelligent EQ function to outline the general sound style curve, and adjust the sound in combination with functions such as Dolby Audio super bass and surround sound;
S33, for audio data of other classes, apply the default standard-mode parameters.
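Steps S11 to S13 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the pre-emphasis coefficient 0.97 is a common choice that the source does not specify, the frames are assumed not to overlap (9 s split into 225 frames of 40 ms leaves no room for overlap), and the frame length of 882 samples is derived from 22,050 Hz × 40 ms. All function names are hypothetical.

```python
import math

SAMPLE_RATE = 22050   # Hz, from step S12
FRAME_LEN = 882       # samples per 40 ms frame (22050 * 0.04)
PRE_EMPHASIS = 0.97   # assumed coefficient; the source gives no value


def pre_emphasize(signal, coeff=PRE_EMPHASIS):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]


def split_frames(signal, frame_len=FRAME_LEN):
    """Cut the signal into consecutive, non-overlapping frames.

    Any samples left over after the last full frame are dropped.
    """
    count = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(count)]


def hamming(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def windowed_frames(signal):
    """Run S11 (pre-emphasis), S12 (framing), and S13 (windowing) in order."""
    window = hamming(FRAME_LEN)
    emphasized = pre_emphasize(signal)
    return [[s * w for s, w in zip(frame, window)]
            for frame in split_frames(emphasized)]
```

For a 9 s input of 198,450 samples, `windowed_frames` returns 225 frames of 882 samples each, matching the frame count in step S12.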
In conclusion the invention has the following advantages:
(1) by carrying out identification and corresponding setting to different audio scenes automatically, make Android intelligent television more intelligence
Can, the usage experience of user is promoted, Android intelligent television bring enjoyment is experienced.
Brief description of the drawings
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed for the description are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the flow chart of the embodiment of the present invention.
Specific embodiment
In the following detailed description, many specific details are set forth to provide a thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention can be practiced without some of these details. The description of the embodiment below is intended only to provide a better understanding of the invention by showing an example of it.
The technical solution of the embodiment of the invention is described below with reference to the drawing.
Embodiment:
As shown in Fig. 1, a sound quality setting method based on audio classification involves an audio feature extraction module, an audio classification module, and an audio setting module, and comprises the following steps:
S1, audio feature extraction;
S11, pre-emphasis: pass 9 s of audio data through a high-pass filter to boost the high-frequency part of the signal and flatten its spectrum;
S12, framing: at a sample rate of 22.05 kHz, every 882 sampling points form one frame (22,050 × 0.04 = 882), so each frame lasts 40 ms and the 9 s of audio data is divided into 225 frames;
S13, windowing: multiply each frame by a Hamming window to improve continuity at the left and right ends of the frame;
S14, fast Fourier transform: apply a fast Fourier transform to each windowed frame to obtain its spectrum, then take the squared magnitude of the spectrum to obtain the power spectrum of the signal;
S15, Mel filtering: pass the power spectrum through a Mel filter bank to convert the linear frequency spectrum into a Mel spectrum that reflects the characteristics of human hearing, keeping only the first 224 features of each frame;
S16, logarithm: take the logarithm of the Mel spectrum to obtain a spectrogram of size 225*224, where the horizontal axis is the frame index and the vertical axis is the Mel feature. In the actual computation one frame is discarded, so a 224*224 spectrogram is used for classification. At this point the spectrogram values do not lie within the image value range of 0 to 255, so the present invention maps them into that range with the following linear mapping:
F(x) = 1.5 × (10x + 80) (Formula 1)
After applying Formula 1, the values of the Mel spectrogram fall approximately within the image value range of 0 to 255;
S2, audio classification;
S21, the audio classification module classifies the audio data with a deep-learning convolutional neural network (CNN), namely the MobileNet classification network;
S3, sound quality setting;
S31, for music-class audio data, use the Dolby Audio audio optimizer function to attenuate the low-frequency part and boost the frequency band corresponding to the human voice, and use the Dolby Audio clarity function to strengthen the vocal part;
S32, for news-class audio data, use the Dolby Audio intelligent EQ function to outline the general sound style curve, and adjust the sound in combination with functions such as Dolby Audio super bass and surround sound;
S33, for audio data of other classes, apply the default standard-mode parameters.
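The value mapping in step S16 can be illustrated with a short Python sketch. It applies Formula 1 to a log-Mel value and then clips the result to the 8-bit range, since the source says the mapping only approximately lands in 0 to 255. Two points are assumptions rather than statements from the source: the logarithm is taken to be base 10 (so that 10x reads as decibels), and a small floor is applied before the logarithm to avoid log of zero. The function names are hypothetical.

```python
import math


def to_pixel(log_mel_value):
    """Apply Formula 1, F(x) = 1.5 * (10x + 80), then clip to 0..255."""
    mapped = 1.5 * (10.0 * log_mel_value + 80.0)
    return int(round(min(255.0, max(0.0, mapped))))


def spectrogram_to_image(mel_power, floor=1e-10):
    """Take log10 of each Mel power value (assumed base) and map it to a pixel.

    `floor` is an assumed lower bound that keeps log10 defined for zero power.
    """
    return [[to_pixel(math.log10(max(value, floor))) for value in row]
            for row in mel_power]
```

Under these assumptions a log-Mel value of -8 maps to pixel 0 and a value of 9 maps to pixel 255, so the formula covers roughly 17 decades of Mel power before clipping takes effect.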
The present invention designs, optimizes, and implements an audio scene classification technique that runs on an ARM board. By automatically recognizing different audio scenes and applying the corresponding settings, it makes an Android smart TV more intelligent and improves the user experience.
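The automatic step from a recognized audio class to a sound setting (S31 to S33) can be pictured as a simple lookup with a default. The sketch below is illustrative only: the class labels "music" and "news", the `SoundPreset` type, and `choose_preset` are hypothetical names, and the sketch does not call any real Dolby Audio API; it only records which adjustments the steps describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoundPreset:
    """Hypothetical container describing the adjustments for one audio class."""
    name: str
    adjustments: tuple


# Adjustments paraphrased from steps S31 and S32; labels are assumed strings.
PRESETS = {
    "music": SoundPreset("music", (
        "audio optimizer: attenuate low frequencies, boost the vocal band",
        "clarity function: strengthen the vocal part",
    )),
    "news": SoundPreset("news", (
        "intelligent EQ: outline the general sound style curve",
        "super bass and surround sound: adjust the sound",
    )),
}

# Step S33: every other class falls back to the standard mode.
DEFAULT_PRESET = SoundPreset("standard", ("default standard-mode parameters",))


def choose_preset(label):
    """Return the preset for a classifier label, or the standard preset."""
    return PRESETS.get(label, DEFAULT_PRESET)
```

Keeping the mapping in a table rather than in branching code means a new audio class only needs a new entry, and any unrecognized label falls through to the standard mode as step S33 requires.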
The above embodiment merely illustrates the technical solution of the invention and does not limit its scope of protection. Obviously, the described embodiment is only one of the embodiments of the invention, not all of them. All other embodiments obtained from it by those of ordinary skill in the art without creative effort fall within the scope of protection of the invention.
Although the invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art can still, where no conflict arises and without creative effort, combine, add, delete, or otherwise adjust the features of the embodiments of the invention as circumstances require, thereby obtaining other technical solutions that differ in form but do not depart in substance from the concept of the invention. Such technical solutions likewise fall within the scope of protection of the invention.
Claims (1)
1. A sound quality setting method based on audio classification, characterized in that it involves an audio feature extraction module, an audio classification module, and an audio setting module, and comprises the following steps:
S1, audio feature extraction;
S11, pre-emphasis: pass 9 s of audio data through a high-pass filter to boost the high-frequency part of the signal and flatten its spectrum;
S12, framing: at a sample rate of 22.05 kHz, every 882 sampling points form one frame, so each frame lasts 40 ms and the 9 s of audio data is divided into 225 frames;
S13, windowing: multiply each frame by a Hamming window to improve continuity at the left and right ends of the frame;
S14, fast Fourier transform: apply a fast Fourier transform to each windowed frame to obtain its spectrum, then take the squared magnitude of the spectrum to obtain the power spectrum of the signal;
S15, Mel filtering: pass the power spectrum through a Mel filter bank to convert the linear frequency spectrum into a Mel spectrum that reflects the characteristics of human hearing, keeping only the first 224 features of each frame;
S16, logarithm: take the logarithm of the Mel spectrum to obtain a spectrogram of size 225*224, where the horizontal axis is the frame index and the vertical axis is the Mel feature. In the actual computation one frame is discarded, so a 224*224 spectrogram is used for classification. At this point the spectrogram values do not lie within the image value range of 0 to 255, so the present invention maps them into that range with the following linear mapping:
F(x) = 1.5 × (10x + 80) (Formula 1)
After applying Formula 1, the values of the Mel spectrogram fall approximately within the image value range of 0 to 255;
S2, audio classification;
S21, the audio classification module classifies the audio data with a deep-learning convolutional neural network (CNN), namely the MobileNet classification network;
S3, sound quality setting;
S31, for music-class audio data, use the Dolby Audio audio optimizer function to attenuate the low-frequency part and boost the frequency band corresponding to the human voice, and use the Dolby Audio clarity function to strengthen the vocal part;
S32, for news-class audio data, use the Dolby Audio intelligent EQ function to outline the general sound style curve, and adjust the sound in combination with functions such as Dolby Audio super bass and surround sound;
S33, for audio data of other classes, apply the default standard-mode parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811278861.0A CN109087634A (en) | 2018-10-30 | 2018-10-30 | A sound quality setting method based on audio classification |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811278861.0A CN109087634A (en) | 2018-10-30 | 2018-10-30 | A sound quality setting method based on audio classification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109087634A true CN109087634A (en) | 2018-12-25 |
Family
ID=64844448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811278861.0A Pending CN109087634A (en) | 2018-10-30 | 2018-10-30 | A sound quality setting method based on audio classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109087634A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105405448A (en) * | 2014-09-16 | 2016-03-16 | 科大讯飞股份有限公司 | Sound effect processing method and apparatus |
CN104819846A (en) * | 2015-04-10 | 2015-08-05 | 北京航空航天大学 | Rolling bearing sound signal fault diagnosis method based on short-time Fourier transform and sparse laminated automatic encoder |
CN104811864A (en) * | 2015-04-20 | 2015-07-29 | 深圳市冠旭电子有限公司 | Method and system for self-adaptive adjustment of audio effect |
CN105895110A (en) * | 2016-06-30 | 2016-08-24 | 北京奇艺世纪科技有限公司 | Method and device for classifying audio files |
CN106775562A (en) * | 2016-12-09 | 2017-05-31 | 奇酷互联网络科技(深圳)有限公司 | The method and device of audio frequency parameter treatment |
CN106600559A (en) * | 2016-12-21 | 2017-04-26 | 东方网力科技股份有限公司 | Fuzzy kernel obtaining and image de-blurring method and apparatus |
CN107393554A (en) * | 2017-06-20 | 2017-11-24 | 武汉大学 | In a kind of sound scene classification merge class between standard deviation feature extracting method |
CN107910018A (en) * | 2017-10-30 | 2018-04-13 | 广州视源电子科技股份有限公司 | Sound effect treatment method and system, computer-readable storage medium and equipment |
CN108281152A (en) * | 2018-01-18 | 2018-07-13 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio-frequency processing method, device and storage medium |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111583890A (en) * | 2019-02-15 | 2020-08-25 | 阿里巴巴集团控股有限公司 | Audio classification method and device |
CN111028852A (en) * | 2019-11-06 | 2020-04-17 | 杭州哲信信息技术有限公司 | Noise removing method in intelligent calling system based on CNN |
WO2021137551A1 (en) * | 2019-12-31 | 2021-07-08 | Samsung Electronics Co., Ltd. | Equalizer for equalization of music signals and methods for the same |
US11515853B2 (en) | 2019-12-31 | 2022-11-29 | Samsung Electronics Co., Ltd. | Equalizer for equalization of music signals and methods for the same |
CN111274989A (en) * | 2020-02-11 | 2020-06-12 | 中国科学院上海微系统与信息技术研究所 | Deep learning-based field vehicle identification method |
CN113257276A (en) * | 2021-05-07 | 2021-08-13 | 普联国际有限公司 | Audio scene detection method, device, equipment and storage medium |
CN113257276B (en) * | 2021-05-07 | 2024-03-29 | 普联国际有限公司 | Audio scene detection method, device, equipment and storage medium |
CN113314148A (en) * | 2021-07-29 | 2021-08-27 | 中国科学院自动化研究所 | Light-weight neural network generated voice identification method and system based on original waveform |
WO2023078093A1 (en) * | 2021-11-03 | 2023-05-11 | 华为技术有限公司 | Audio playback method and system, and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181225 |