CN109087634A - A sound quality setting method based on audio classification - Google Patents


Info

Publication number
CN109087634A
Authority
CN
China
Prior art keywords
audio
frame
voice data
mel
classification
Prior art date
Legal status
Pending
Application number
CN201811278861.0A
Other languages
Chinese (zh)
Inventor
高岚
Current Assignee
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date
2018-10-30
Filing date
2018-10-30
Publication date
2018-12-25
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN201811278861.0A
Publication of CN109087634A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Abstract

The invention discloses a sound quality setting method based on audio classification. Features are first extracted from a segment of audio data to generate a feature image, which is then classified by a convolutional neural network. Finally, according to the class of the audio, the Dolby Audio and equalizer settings are adjusted accordingly. By automatically recognizing different audio scenes and applying the corresponding settings, the method makes Android smart TVs more intelligent, improves the user experience, and lets users enjoy what Android smart TVs offer.

Description

A sound quality setting method based on audio classification
Technical field
The invention belongs to the field of speech technology, and specifically relates to a sound quality setting method based on audio classification.
Background art
With the vigorous development of artificial intelligence technology, AI has entered every industry and every aspect of human life, and the television industry is no exception. Using AI technology to make televisions intelligent can better meet user needs and improve the user experience.
Multimedia data such as video and audio are important information media in a television set, and audio information occupies a particularly important position among them. How to process, structurally analyze, and use audio information is an important topic in information processing, and audio classification is one of its key technologies. Audio from different scenes has its own characteristics. News audio, for example, has rising and falling intonation and a fairly steady speaking rate; music covers both high and low frequencies and has a certain rhythmic regularity. By setting a different audio mode on the TV for each audio scene, the TV can adapt better to each scene.
At present, the AI features of most products run on Internet cloud servers, because the hardware of devices running the Android system is limited: it cannot run large-scale computation and cannot occupy too many resources, such as CPU time.
Summary of the invention
The purpose of the present invention is to provide a sound quality setting method based on audio classification, with the advantage that the audio scene classification technique is designed, optimized, and implemented to run on an Arm board.
The above purpose of the invention is achieved by the following technical solution:
A sound quality setting method based on audio classification, comprising an audio feature extraction module, an audio classification module, and an audio setting module, and further comprising the following steps:
S1, audio feature extraction;
S11, pre-emphasis: pass the 9 s of audio data through a high-pass filter to boost its high-frequency part and flatten the spectrum of the signal;
S12, framing: with a sample rate of 22.05 kHz, each frame is set to 882 sampling points, i.e. one frame lasts 40 ms, so the 9 s of audio data is divided into 225 frames;
S13, windowing: multiply each frame by a Hamming window to improve continuity at the left and right ends of the frame;
S14, fast Fourier transform: apply a fast Fourier transform (FFT) to each windowed frame to obtain its spectrum, then take the squared magnitude of the spectrum to obtain the power spectrum of the signal;
S15, Mel filtering: pass the power spectrum of the signal through a Mel filter bank, converting the linear frequency spectrum into a Mel spectrum that reflects the characteristics of human hearing, and keep only the first 224 features of each frame;
S16, logarithm: take the logarithm of the Mel spectrum to obtain a spectrogram of size 225 × 224, with frames on the abscissa and Mel features on the ordinate. In the actual computation one frame of data can be discarded, so that classification uses a 224 × 224 spectrogram. At this point, however, the spectrogram values do not lie within the image value range 0 to 255. To map them into that range, the present invention applies the following linear mapping:
F(x) = 1.5 × (10x + 80)    (Formula 1)
With Formula 1, the values of the Mel spectrogram are mapped approximately into the image value range 0 to 255;
S2, audio classification;
S21, the audio classification module classifies the audio data with MobileNet, a deep-learning convolutional neural network (CNN) for classification;
S3, sound quality setting;
S31, for audio data of the music class: use the audio optimizer function of Dolby Audio to attenuate the low-frequency part and boost the frequency band corresponding to the human voice, and use the Dolby Audio clarity function to enhance the vocal part;
S32, for audio data of the news class: use the Dolby Audio Intelligent EQ function to outline the overall timbre curve, and adjust the sound in combination with Dolby Audio functions such as super bass and surround sound;
S33, for audio data of other classes: use the default standard-mode parameters.
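Steps S11 to S13 fix enough numbers to check the framing arithmetic: 22 050 samples/s × 0.040 s gives 882 samples per frame, and 9 s of audio is 198 450 samples, which is exactly 225 frames of 882 samples. The following is a minimal pure-Python sketch of those three steps, not the patented implementation. Assumptions not stated in the source: pre-emphasis uses the common first-order form y[n] = x[n] - αx[n-1] with α = 0.97, and frames do not overlap (which is what 225 frames × 40 ms = 9 s implies). The function names are hypothetical.

```python
import math

SAMPLE_RATE = 22050      # Hz (S12)
FRAME_LEN = 882          # samples per frame: 22050 Hz x 40 ms (S12)
CLIP_SECONDS = 9         # clip length (S11)


def pre_emphasis(x, alpha=0.97):
    """S11: first-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a common choice and an assumption here; the patent gives no value.
    """
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_signal(x, frame_len=FRAME_LEN):
    """S12: split x into consecutive non-overlapping frames, dropping any incomplete tail."""
    n_frames = len(x) // frame_len
    return [x[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def hamming(n):
    """Hamming window w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1)), k = 0..n-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def window_frames(frames):
    """S13: multiply every frame, sample by sample, by the same Hamming window."""
    w = hamming(len(frames[0]))
    return [[s * wk for s, wk in zip(frame, w)] for frame in frames]


if __name__ == "__main__":
    # A 9 s, 440 Hz test tone stands in for real TV audio.
    clip = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
            for n in range(SAMPLE_RATE * CLIP_SECONDS)]
    frames = window_frames(frame_signal(pre_emphasis(clip)))
    print(len(frames), len(frames[0]))  # 225 882
```

The window tapers each frame toward 0.08 at both ends, which is what smooths the discontinuity at frame edges mentioned in S13.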
In summary, the invention has the following advantages:
(1) By automatically recognizing different audio scenes and applying the corresponding settings, it makes Android smart TVs more intelligent, improves the user experience, and lets users enjoy what Android smart TVs offer.
Brief description of the drawings
To explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the flow chart of the embodiment of the present invention.
Detailed description of the embodiments
In the following detailed description, many specific details are set forth to provide a thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention can be practiced without some of these details. The following description of the embodiments is provided only to give a better understanding of the invention by showing examples of it.
The technical solutions of the embodiments of the invention are described below with reference to the drawings.
Embodiment:
As shown in Fig. 1, a sound quality setting method based on audio classification comprises an audio feature extraction module, an audio classification module, and an audio setting module, and further comprises the following steps:
S1, audio feature extraction;
S11, pre-emphasis: pass the 9 s of audio data through a high-pass filter to boost its high-frequency part and flatten the spectrum of the signal;
S12, framing: with a sample rate of 22.05 kHz, each frame is set to 882 sampling points, i.e. one frame lasts 40 ms, so the 9 s of audio data is divided into 225 frames;
S13, windowing: multiply each frame by a Hamming window to improve continuity at the left and right ends of the frame;
S14, fast Fourier transform: apply a fast Fourier transform (FFT) to each windowed frame to obtain its spectrum, then take the squared magnitude of the spectrum to obtain the power spectrum of the signal;
S15, Mel filtering: pass the power spectrum of the signal through a Mel filter bank, converting the linear frequency spectrum into a Mel spectrum that reflects the characteristics of human hearing, and keep only the first 224 features of each frame;
S16, logarithm: take the logarithm of the Mel spectrum to obtain a spectrogram of size 225 × 224, with frames on the abscissa and Mel features on the ordinate. In the actual computation one frame of data can be discarded, so that classification uses a 224 × 224 spectrogram. At this point, however, the spectrogram values do not lie within the image value range 0 to 255. To map them into that range, the present invention applies the following linear mapping:
F(x) = 1.5 × (10x + 80)    (Formula 1)
With Formula 1, the values of the Mel spectrogram are mapped approximately into the image value range 0 to 255;
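Steps S14 to S16 turn each windowed frame into one row of the spectrogram image. The sketch below strings them together in plain Python under several assumptions the patent leaves open: the Mel filters are triangular and evenly spaced on the HTK-style mel scale m = 2595 log10(1 + f/700); the logarithm in S16 is base 10; the discarded frame is the last one; and the output of Formula 1 is clipped to [0, 255]. The number of Mel filters computed before keeping the first 224 is not stated either, so it is a parameter here. All function names are hypothetical, and a direct DFT stands in for the FFT only to keep the example dependency-free.

```python
import math


def power_spectrum(frame):
    """S14: |X[k]|^2 for k = 0..N//2.

    A direct DFT gives the same values as an FFT, only in O(N^2) time.
    """
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """S15: triangular filters evenly spaced on the mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_mels + 2 edge points on the mel axis, converted to fractional FFT-bin positions.
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) * n_fft / sample_rate
             for i in range(n_mels + 2)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))    # rising edge
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))    # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def to_pixel(x):
    """Formula 1, F(x) = 1.5 * (10x + 80), clipped to [0, 255] (the clipping is an assumption)."""
    return min(255.0, max(0.0, 1.5 * (10.0 * x + 80.0)))


def spectrogram_image(frames, sample_rate, n_mels, n_keep, n_frames_out):
    """S14-S16: windowed frames -> power spectrum -> Mel energies -> log10 -> Formula 1.

    Returns one row per kept frame and n_keep Mel features per row.
    """
    bank = mel_filterbank(n_mels, len(frames[0]), sample_rate)
    image = []
    for frame in frames[:n_frames_out]:      # e.g. 225 frames -> keep the first 224
        power = power_spectrum(frame)
        mel = [sum(w * p for w, p in zip(row, power)) for row in bank][:n_keep]
        # Floor at 1e-10 so empty filters or silent frames do not hit log10(0).
        image.append([to_pixel(math.log10(max(e, 1e-10))) for e in mel])
    return image
```

With the 225 windowed frames from S13, a call such as `spectrogram_image(frames, 22050, 256, 224, 224)` would yield a 224 × 224 image (256 filters is an arbitrary illustrative choice). In practice the quadratic-time `power_spectrum` would be replaced by an FFT routine such as `numpy.fft.rfft`.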
S2, audio classification;
S21, the audio classification module classifies the audio data with MobileNet, a deep-learning convolutional neural network (CNN) for classification;
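S21 names MobileNet without saying why it fits the Arm-board constraint raised in the background section. One likely reason: MobileNet is built from depthwise separable convolutions, which factor a standard k × k convolution into a per-channel k × k depthwise convolution followed by a 1 × 1 pointwise convolution, and its baseline input resolution of 224 × 224 matches the spectrogram size produced in S16. The short sketch below, with hypothetical function names and bias terms ignored, compares the weight counts of the two layer types for an illustrative 3 × 3, 256-channel layer.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k convolution (one filter per input channel) plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


if __name__ == "__main__":
    k, c = 3, 256
    std = standard_conv_params(k, c, c)
    sep = separable_conv_params(k, c, c)
    print(std, sep, round(sep / std, 3))  # 589824 67840 0.115
```

The ratio equals 1/c_out + 1/k², and the multiply-adds per output position shrink by the same factor, which is why this kind of network is a common choice for on-device inference.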
S3, sound quality setting;
S31, for audio data of the music class: use the audio optimizer function of Dolby Audio to attenuate the low-frequency part and boost the frequency band corresponding to the human voice, and use the Dolby Audio clarity function to enhance the vocal part;
S32, for audio data of the news class: use the Dolby Audio Intelligent EQ function to outline the overall timbre curve, and adjust the sound in combination with Dolby Audio functions such as super bass and surround sound;
S33, for audio data of other classes: use the default standard-mode parameters.
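The decision logic of S31 to S33 reduces to a lookup from the predicted class to a group of sound settings. A minimal sketch, assuming the classifier outputs one score per class in the hypothetical order music, news, other. The setting names only paraphrase S31 and S32 and are not Dolby Audio API identifiers; any class other than music or news falls back to the standard mode, as in S33.

```python
CLASS_LABELS = ("music", "news", "other")   # hypothetical classifier output order

# Hypothetical setting names paraphrasing S31-S32; not Dolby Audio API identifiers.
PRESETS = {
    "music": {
        "audio_optimizer_attenuate_low_frequencies": True,
        "boost_voice_band": True,
        "clarity_enhance_vocals": True,
    },
    "news": {
        "intelligent_eq_timbre_curve": True,
        "super_bass": True,
        "surround": True,
    },
}
STANDARD_PRESET = {"mode": "standard"}      # S33: default standard-mode parameters


def preset_for_scores(scores):
    """Pick the highest-scoring class; return (label, settings), falling back to standard mode."""
    best = max(range(len(scores)), key=scores.__getitem__)
    label = CLASS_LABELS[best]
    return label, PRESETS.get(label, STANDARD_PRESET)


if __name__ == "__main__":
    print(preset_for_scores([0.1, 0.7, 0.2]))
```

Keeping the presets in a table rather than in branching code means a new audio scene only needs a new label and a new entry.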
The present invention designs, optimizes, and implements an audio scene classification technique that runs on an Arm board. By automatically recognizing different audio scenes and applying the corresponding settings, it makes Android smart TVs more intelligent, improves the user experience, and lets users enjoy what Android smart TVs offer.
The above embodiments merely illustrate the technical solutions of the invention and do not limit its scope of protection. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
Although the invention has been described in detail with reference to the above embodiments, a person of ordinary skill in the art may still, where no conflict arises and without creative effort, combine, add, delete, or otherwise adjust the features of the various embodiments as circumstances require, to obtain other technical solutions that differ but do not depart in essence from the concept of the invention; such technical solutions likewise fall within the intended scope of protection of the invention.

Claims (1)

1. A sound quality setting method based on audio classification, characterized by comprising an audio feature extraction module, an audio classification module, and an audio setting module, and further comprising the following steps:
S1, audio feature extraction;
S11, pre-emphasis: pass the 9 s of audio data through a high-pass filter to boost its high-frequency part and flatten the spectrum of the signal;
S12, framing: with a sample rate of 22.05 kHz, each frame is set to 882 sampling points, i.e. one frame lasts 40 ms, so the 9 s of audio data is divided into 225 frames;
S13, windowing: multiply each frame by a Hamming window to improve continuity at the left and right ends of the frame;
S14, fast Fourier transform: apply a fast Fourier transform (FFT) to each windowed frame to obtain its spectrum, then take the squared magnitude of the spectrum to obtain the power spectrum of the signal;
S15, Mel filtering: pass the power spectrum of the signal through a Mel filter bank, converting the linear frequency spectrum into a Mel spectrum that reflects the characteristics of human hearing, and keep only the first 224 features of each frame;
S16, logarithm: take the logarithm of the Mel spectrum to obtain a spectrogram of size 225 × 224, with frames on the abscissa and Mel features on the ordinate. In the actual computation one frame of data can be discarded, so that classification uses a 224 × 224 spectrogram. At this point, however, the spectrogram values do not lie within the image value range 0 to 255. To map them into that range, the present invention applies the following linear mapping:
F(x) = 1.5 × (10x + 80)    (Formula 1)
With Formula 1, the values of the Mel spectrogram are mapped approximately into the image value range 0 to 255;
S2, audio classification;
S21, the audio classification module classifies the audio data with MobileNet, a deep-learning convolutional neural network (CNN) for classification;
S3, sound quality setting;
S31, for audio data of the music class: use the audio optimizer function of Dolby Audio to attenuate the low-frequency part and boost the frequency band corresponding to the human voice, and use the Dolby Audio clarity function to enhance the vocal part;
S32, for audio data of the news class: use the Dolby Audio Intelligent EQ function to outline the overall timbre curve, and adjust the sound in combination with Dolby Audio functions such as super bass and surround sound;
S33, for audio data of other classes: use the default standard-mode parameters.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811278861.0A CN109087634A (en) 2018-10-30 2018-10-30 A sound quality setting method based on audio classification

Publications (1)

Publication Number Publication Date
CN109087634A 2018-12-25

Family

ID=64844448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811278861.0A Pending CN109087634A (en) 2018-10-30 2018-10-30 A sound quality setting method based on audio classification

Country Status (1)

Country Link
CN (1) CN109087634A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104811864A (en) * 2015-04-20 2015-07-29 深圳市冠旭电子有限公司 Method and system for self-adaptive adjustment of audio effect
CN104819846A (en) * 2015-04-10 2015-08-05 北京航空航天大学 Rolling bearing sound signal fault diagnosis method based on short-time Fourier transform and sparse laminated automatic encoder
CN105405448A (en) * 2014-09-16 2016-03-16 科大讯飞股份有限公司 Sound effect processing method and apparatus
CN105895110A (en) * 2016-06-30 2016-08-24 北京奇艺世纪科技有限公司 Method and device for classifying audio files
CN106600559A (en) * 2016-12-21 2017-04-26 东方网力科技股份有限公司 Fuzzy kernel obtaining and image de-blurring method and apparatus
CN106775562A (en) * 2016-12-09 2017-05-31 奇酷互联网络科技(深圳)有限公司 The method and device of audio frequency parameter treatment
CN107393554A (en) * 2017-06-20 2017-11-24 武汉大学 In a kind of sound scene classification merge class between standard deviation feature extracting method
CN107910018A (en) * 2017-10-30 2018-04-13 广州视源电子科技股份有限公司 Sound effect treatment method and system, computer-readable storage medium and equipment
CN108281152A (en) * 2018-01-18 2018-07-13 腾讯音乐娱乐科技(深圳)有限公司 Audio-frequency processing method, device and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583890A (en) * 2019-02-15 2020-08-25 阿里巴巴集团控股有限公司 Audio classification method and device
CN111028852A (en) * 2019-11-06 2020-04-17 杭州哲信信息技术有限公司 Noise removing method in intelligent calling system based on CNN
WO2021137551A1 (en) * 2019-12-31 2021-07-08 Samsung Electronics Co., Ltd. Equalizer for equalization of music signals and methods for the same
US11515853B2 (en) 2019-12-31 2022-11-29 Samsung Electronics Co., Ltd. Equalizer for equalization of music signals and methods for the same
CN111274989A (en) * 2020-02-11 2020-06-12 中国科学院上海微系统与信息技术研究所 Deep learning-based field vehicle identification method
CN113257276A (en) * 2021-05-07 2021-08-13 普联国际有限公司 Audio scene detection method, device, equipment and storage medium
CN113257276B (en) * 2021-05-07 2024-03-29 普联国际有限公司 Audio scene detection method, device, equipment and storage medium
CN113314148A (en) * 2021-07-29 2021-08-27 中国科学院自动化研究所 Light-weight neural network generated voice identification method and system based on original waveform
WO2023078093A1 (en) * 2021-11-03 2023-05-11 华为技术有限公司 Audio playback method and system, and electronic device

Similar Documents

Publication Publication Date Title
CN109087634A (en) A sound quality setting method based on audio classification
CN105845127B (en) Audio recognition method and its system
CN107293286B (en) Voice sample collection method based on network dubbing game
CN103456312B (en) A kind of single-channel voice blind separating method based on Computational auditory scene analysis
CN104735528A (en) Sound effect matching method and device
CN104900238B (en) A kind of audio real-time comparison method based on perception filtering
CN101366078A (en) Neural network classifier for separating audio sources from a monophonic audio signal
CN103700370A (en) Broadcast television voice recognition method and system
JPS6011899A (en) Method and apparatus for imitating audio response information
JP6335301B2 (en) Method and apparatus for encoding stereo phase parameters
CN108962229A (en) A kind of target speaker's voice extraction method based on single channel, unsupervised formula
CN110782915A (en) Waveform music component separation method based on deep learning
CN107690034A (en) Intelligent scene mode switching system and method based on environmental background sound
CA3136870A1 (en) Method and apparatus for determining a deep filter
CN111142066A (en) Direction-of-arrival estimation method, server, and computer-readable storage medium
CN115081473A (en) Multi-feature fusion brake noise classification and identification method
US10991375B2 (en) Systems and methods for processing an audio signal for replay on an audio device
Chu et al. A noise-robust FFT-based auditory spectrum with application in audio classification
CN105227763A (en) A kind of instrumental audio real time method for segmenting realized on Intelligent mobile equipment
CN104900239A (en) Audio real-time comparison method based on Walsh-Hadamard transform
CN109841223B (en) Audio signal processing method, intelligent terminal and storage medium
WO2021184732A1 (en) Audio packet loss repairing method, device and system based on neural network
Meng et al. An empirical envelope estimation algorithm
Baghel et al. Shouted/normal speech classification using speech-specific features
US20130322645A1 (en) Data recognition and separation engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181225
