CN109087634A

CN109087634A - A sound quality setting method based on audio classification

Info

Publication number: CN109087634A
Application number: CN201811278861.0A
Authority: CN
Inventors: 高岚
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2018-12-25

Abstract

The invention discloses a sound quality setting method based on audio classification. The method first extracts features from a segment of audio data and generates a feature image, then classifies the feature image with a convolutional neural network. Finally, the Dolby Audio and equalizer settings are adjusted according to the resulting audio class. By automatically recognizing different audio scenes and applying the corresponding settings, the method makes an Android smart TV more intelligent and improves the user experience, letting users enjoy what an Android smart TV has to offer.

Description

A sound quality setting method based on audio classification

Technical field

The invention belongs to the field of speech technology, and in particular relates to a sound quality setting method based on audio classification.

Background art

With the rapid development of artificial intelligence across all industries, the technology has entered every aspect of daily life, and the television industry is no exception. Artificial intelligence makes a TV smart, so that it can better meet user needs and improve the user experience.

Multimedia data such as video and audio are important information media in a television set, and audio information plays a particularly important role. How to process, structure, analyze, and use audio information is an important topic in the field of information processing, and audio classification is one of its key technologies. Audio from different scenes has its own characteristics. News audio, for example, is characterized by cadenced intonation and a fairly steady speaking rate; music audio is characterized by the presence of both high and low frequencies and a certain temporal regularity. Different sound modes can therefore be set on the TV to suit different audio scenes.

At present, the artificial intelligence in most products runs on cloud servers on the internet, because the hardware of the devices that carry the Android system is limited: they cannot run large-scale computations and cannot take up too many resources, such as CPU time.

Summary of the invention

The purpose of the present invention is to provide a sound quality setting method based on audio classification, which has the advantage that the audio scene classification technique is designed, optimized, and implemented to run on an Arm board.

The above purpose of the invention is achieved by the following technical scheme:

A sound quality setting method based on audio classification, involving an audio feature extraction module, an audio classification module, and an audio setting module, and comprising the following steps:

S1, audio feature extraction;

S11, pre-emphasis: 9 s of audio data is passed through a high-pass filter, which boosts the high-frequency part of the audio data and flattens the spectrum of the signal;

S12, framing: the sampling rate is 22.05 kHz and 822 sampling points are set as one frame, i.e. the duration of one frame is 40 ms, so the 9 s of audio data is divided into 225 frames;

S13, windowing: each frame is multiplied by a Hamming window to increase the continuity at the left and right ends of the frame;

S14, fast Fourier transform: a fast Fourier transform (FFT) is applied to each windowed frame to obtain the spectrum of each frame, and the squared magnitude of the spectrum is then taken to obtain the power spectrum of the audio signal;

S15, Mel filtering: the power spectrum of the signal is passed through a Mel filter bank, which converts the linear natural spectrum into a Mel spectrum that reflects the characteristics of human hearing; only the first 224 features of each frame are kept;

S16, taking the logarithm: taking the logarithm of the Mel spectrum yields a spectrogram of size 225*224, where the horizontal axis is the frame index and the vertical axis is the Mel feature. In the actual computation one frame is discarded, i.e. a spectrogram of size 224*224 is used for classification. At this point, however, the values of the spectrogram do not lie within the image value range 0 to 255. To map the values of the spectrogram into the image value range 0 to 255, the invention applies the following linear mapping:

f(x) = 1.5 × (10x + 80)    (Formula 1)

By the calculating of formula 1, the value of Mel spectrogram is may map to substantially in the value range of image 0~255；

S2, audio classification;

S21, the audio classification module classifies the audio data using MobileNet, a deep-learning convolutional neural network (CNN) classification network;
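How the 224*224 spectrogram might be handed to the classification network can be sketched as follows. The sketch makes several assumptions that are not in the text: the single-channel image is copied into three channels, pixel values are scaled to the range 0 to 1, the class order in CLASS_NAMES is arbitrary, and `model` stands for any callable that returns one score per class (a trained MobileNet in practice, a dummy function here).

```python
import numpy as np

# Classes named in S31 to S33; the order is an assumption.
CLASS_NAMES = ("music", "news", "other")


def to_network_input(spectrogram):
    """Turn a 224x224 image with values 0..255 into a 1x224x224x3 float batch."""
    img = np.asarray(spectrogram, dtype=np.float32) / 255.0
    if img.shape != (224, 224):
        raise ValueError("expected a 224x224 spectrogram")
    # Copy the single channel three times (assumed input layout).
    rgb = np.repeat(img[:, :, np.newaxis], 3, axis=2)
    return rgb[np.newaxis, ...]


def classify(spectrogram, model):
    """Run `model` on the prepared batch and return the best-scoring class name."""
    scores = np.asarray(model(to_network_input(spectrogram))).reshape(-1)
    return CLASS_NAMES[int(np.argmax(scores))]


if __name__ == "__main__":
    # Dummy stand-in for a trained network: always favours the second class.
    dummy_model = lambda batch: np.array([[0.1, 0.7, 0.2]])
    print(classify(np.zeros((224, 224)), dummy_model))
```

Passing the network in as a callable keeps the sketch independent of any particular deep-learning framework.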

S3, sound quality setting;

S31, for audio data of the music class, the audio optimizer function of Dolby Audio attenuates the low-frequency part and boosts the frequency band corresponding to the human voice, and the vocal clarity function of Dolby Audio is used to strengthen the vocal part;

S32, for audio data of the news class, the intelligent EQ function of Dolby Audio outlines the overall sound style curve, and functions such as Dolby Audio super bass and surround sound are used together to adjust the sound;

S33, for audio data of other classes, the standard mode parameters are used by default.
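The selection logic of S31 to S33 amounts to a lookup with a default, which can be sketched as below. The preset keys and values are hypothetical labels chosen for illustration; they are not Dolby Audio API names, and the text does not specify concrete parameter values.

```python
# Hypothetical preset table: keys and values are illustrative labels only.
PRESETS = {
    # S31: attenuate low frequencies, boost the vocal band, strengthen vocals.
    "music": {"audio_optimizer": "attenuate_low_boost_vocal", "vocal_clarity": True},
    # S32: intelligent EQ style curve combined with super bass and surround sound.
    "news": {"intelligent_eq": "style_curve", "super_bass": True, "surround": True},
}

# S33: every other class uses the standard mode parameters.
STANDARD = {"mode": "standard"}


def settings_for(audio_class):
    """Return the preset for a class, falling back to standard mode."""
    return PRESETS.get(audio_class, STANDARD)


if __name__ == "__main__":
    for name in ("music", "news", "sports"):
        print(name, settings_for(name))
```

Any class label that the table does not list, such as "sports" in the usage lines, receives the standard mode parameters.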

In conclusion the invention has the following advantages:

(1) by carrying out identification and corresponding setting to different audio scenes automatically, make Android intelligent television more intelligence Can, the usage experience of user is promoted, Android intelligent television bring enjoyment is experienced.

Brief description of the drawings

To explain the embodiments of the invention or the technical schemes of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is the flow chart of the embodiment of the present invention.

Detailed description of the embodiments

Many specific details are set out in the following detailed description to provide a thorough understanding of the invention. It will be apparent to those skilled in the art, however, that the invention can be practiced without some of these details. The description of the embodiment below is intended only to provide a better understanding of the invention by showing an example of it.

The technical scheme of the embodiment of the invention is described below with reference to the drawing.

Embodiment:

As shown in Fig. 1, a sound quality setting method based on audio classification involves an audio feature extraction module, an audio classification module, and an audio setting module, and comprises the following steps:

S1, audio feature extraction;

f(x) = 1.5 × (10x + 80)    (Formula 1)
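The effect of Formula 1 can be checked numerically. The sketch below assumes that x is the base-10 logarithm of a Mel power value, floors the power at 1e-8 so that the logarithm is defined, and clamps the result to the range 0 to 255. None of these choices is stated in the text, and the function name `to_pixel` is hypothetical.

```python
import math

POWER_FLOOR = 1e-8  # assumed lower bound on the Mel power, so log10 is defined


def to_pixel(mel_power):
    """Map one Mel power value to an image value using Formula 1."""
    x = math.log10(max(mel_power, POWER_FLOOR))  # assumption: base-10 logarithm
    value = 1.5 * (10.0 * x + 80.0)              # Formula 1
    return min(255.0, max(0.0, value))           # assumption: clamp to 0..255


if __name__ == "__main__":
    for power in (1e-8, 1.0, 1e9):
        print(power, to_pixel(power))
```

Under these assumptions, logarithm values from -8 to 9 span the whole image range: x = -8 gives 1.5 × (-80 + 80) = 0, x = 0 gives 1.5 × 80 = 120, and x = 9 gives 1.5 × 170 = 255.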

S2, audio classification;

S3, sound quality setting;

S33, for audio data of other classes, the standard mode parameters are used by default.

The invention designs, optimizes, and implements an audio scene classification technique that runs on an Arm board. By automatically recognizing different audio scenes and applying the corresponding settings, it makes an Android smart TV more intelligent and improves the user experience, letting users enjoy what an Android smart TV has to offer.

The above embodiment merely illustrates the technical scheme of the invention and does not limit its scope of protection. The described embodiment is obviously only one of the embodiments of the invention, not all of them. All other embodiments obtained from it by those of ordinary skill in the art without creative effort fall within the scope of protection of the invention.

Although the invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art can still, where no conflict arises and without creative effort, combine, add, remove, or otherwise adjust the features of the embodiments of the invention as circumstances require, and thereby obtain other technical schemes that differ from the embodiment but do not depart in essence from the concept of the invention. These technical schemes likewise fall within the scope of protection of the invention.

Claims

1. A sound quality setting method based on audio classification, characterized in that it involves an audio feature extraction module, an audio classification module, and an audio setting module, and comprises the following steps:

S1, audio feature extraction;

S11, pre-emphasis: 9 s of audio data is passed through a high-pass filter, which boosts the high-frequency part of the audio data and flattens the spectrum of the signal;

S12, framing: the sampling rate is 22.05 kHz and 822 sampling points are set as one frame, i.e. the duration of one frame is 40 ms, so the 9 s of audio data is divided into 225 frames;

S14, fast Fourier transform: a fast Fourier transform (FFT) is applied to each windowed frame to obtain the spectrum of each frame, and the squared magnitude of the spectrum is then taken to obtain the power spectrum of the audio signal;

S15, Mel filtering: the power spectrum of the signal is passed through a Mel filter bank, which converts the linear natural spectrum into a Mel spectrum that reflects the characteristics of human hearing; only the first 224 features of each frame are kept;

S16, taking the logarithm: taking the logarithm of the Mel spectrum yields a spectrogram of size 225*224, where the horizontal axis is the frame index and the vertical axis is the Mel feature. In the actual computation one frame is discarded, i.e. a spectrogram of size 224*224 is used for classification. At this point, however, the values of the spectrogram do not lie within the image value range 0 to 255. To map the values of the spectrogram into the image value range 0 to 255, the invention applies the following linear mapping:

f(x) = 1.5 × (10x + 80)    (Formula 1)

S2, audio classification;

S21, the audio classification module classifies the audio data using MobileNet, a deep-learning convolutional neural network (CNN) classification network;

S3, sound quality setting;

S31, for audio data of the music class, the audio optimizer function of Dolby Audio attenuates the low-frequency part and boosts the frequency band corresponding to the human voice, and the vocal clarity function of Dolby Audio is used to strengthen the vocal part;

S32, for audio data of the news class, the intelligent EQ function of Dolby Audio outlines the overall sound style curve, and functions such as Dolby Audio super bass and surround sound are used together to adjust the sound;

S33, for audio data of other classes, the standard mode parameters are used by default.