CN106792253A - Sound effect processing method and system - Google Patents

Sound effect processing method and system Download PDF

Info

Publication number
CN106792253A
CN106792253A (application CN201611092855.7A)
Authority
CN
China
Prior art keywords
sound
signal
feature
reference voice
time range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611092855.7A
Other languages
Chinese (zh)
Other versions
CN106792253B (en)
Inventor
陈蕴洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd filed Critical Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN201611092855.7A priority Critical patent/CN106792253B/en
Publication of CN106792253A publication Critical patent/CN106792253A/en
Application granted granted Critical
Publication of CN106792253B publication Critical patent/CN106792253B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • H04N21/4852End-user interface for client configuration for modifying audio parameters, e.g. switching between mono and stereo
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to a sound effect processing method and system. The method comprises the following steps: collecting an environmental sound signal within a preset time range; performing feature extraction on the environmental sound signal to obtain an environmental sound feature; selecting, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature; looking up the preset sound effect mode corresponding to the selected reference sound feature to obtain a matched sound effect mode; and performing sound effect processing on the sound signal to be played according to the matched sound effect mode. In this way, the most suitable sound effect mode can be selected automatically according to the environmental sound feature of the environmental sound signal, giving a good sound effect processing result; at the same time, no user operation is needed, which makes the device more convenient to use.

Description

Sound effect processing method and system
Technical field
The present invention relates to the field of signal processing, and more particularly to a sound effect processing method and system.
Background art
When an audio/video display device plays audio, video or a program, different sound effect modes can be set as required. Taking a television set as an example, the traditional way of changing the sound effect mode is for the user to make a selection with the remote control in a menu interface on the display; the processing system of the television set then processes the sound signal according to the sound effect mode selected by the user and outputs it.
However, there is usually noise in the external environment where the television set is located, and the user does not know which sound effect mode is most suitable for the current environment. As a result, the sound effect processing result of the television set easily fails to match the current environment, and the processing effect is poor.
Summary of the invention
Accordingly, in view of the above problem, it is necessary to provide a sound effect processing method and system with a good processing effect.
A sound effect processing method comprises:
collecting an environmental sound signal within a preset time range;
performing feature extraction on the environmental sound signal to obtain an environmental sound feature;
selecting, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature;
looking up the preset sound effect mode corresponding to the selected reference sound feature to obtain a matched sound effect mode; and
performing sound effect processing on a sound signal to be played according to the matched sound effect mode.
A sound effect processing system comprises:
an environmental sound signal acquisition module, configured to collect an environmental sound signal within a preset time range;
an environmental sound feature acquisition module, configured to perform feature extraction on the environmental sound signal to obtain an environmental sound feature;
a reference sound feature selection module, configured to select, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature;
a matched sound effect mode lookup module, configured to look up the preset sound effect mode corresponding to the selected reference sound feature to obtain a matched sound effect mode; and
a sound effect processing module, configured to perform sound effect processing on a sound signal to be played according to the matched sound effect mode.
With the above sound effect processing method and system, an environmental sound signal within a preset time range is collected, and feature extraction is performed on it to obtain an environmental sound feature; the reference sound feature with the greatest similarity to the environmental sound feature is then selected from a plurality of preset reference sound features, the preset sound effect mode corresponding to the selected reference sound feature is looked up to obtain a matched sound effect mode, and sound effect processing is performed on the sound signal to be played according to the matched sound effect mode. In this way, the most suitable sound effect mode is selected automatically according to the environmental sound feature of the environmental sound signal, and the sound effect processing result is good; at the same time, no user operation is needed, which makes the device more convenient to use.
Brief description of the drawings
Fig. 1 is a flowchart of the sound effect processing method in an embodiment;
Fig. 2 is a detailed flowchart, in an embodiment, of performing feature extraction on an environmental sound signal to obtain an environmental sound feature;
Fig. 3 is a detailed flowchart, in an embodiment, of selecting, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature;
Fig. 4 is a block diagram of the sound effect processing system in an embodiment.
Detailed description of the embodiments
Referring to Fig. 1, the sound effect processing method in an embodiment comprises the following steps.
S110: collecting an environmental sound signal within a preset time range.
The preset time range refers to a time range whose start or duration is set in advance. The sound signal may specifically be obtained by collecting the sound of the surrounding environment through a microphone.
In one embodiment, the preset time range is a time range that takes the moment at which a play instruction is received as the start time and a preset value as the duration. The play instruction refers to an instruction for starting audio/video playback or television program playback, for example the instruction that wakes up the processing system when the television set is powered on. The preset value may be set according to actual needs. In this embodiment, the preset value is 5 seconds; when the moment at which the play instruction is received corresponds to the moment the television set is powered on, the preset time range is the first 5 seconds after power-on.
Generally, after a play instruction is received, the processing system only needs a short response time before it starts playing the audio/video. By collecting the sound signal within the time range that takes the moment at which the play instruction is received as the start time and the preset value as the duration, the environmental sound signal obtained is the sound signal before audio/video playback starts, which avoids the influence of the actually played sound on the collection of the environmental sound signal and improves the accuracy of sound signal collection. It can be understood that, in other embodiments, the preset time range may also be another time range, for example a time range that takes the current moment as the start time and the preset value as the duration, where the current moment can be set in real time, so that the environmental sound signal is collected while the audio/video is being played.
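As an illustration only, the following sketch captures a few seconds of ambient audio right after a play instruction is received and before playback starts. It relies on the third-party sounddevice package, a 16 kHz sample rate and a 5-second window; the library choice and all names are assumptions, not part of the patent.

```python
import numpy as np
import sounddevice as sd  # assumed third-party capture library

SAMPLE_RATE = 16_000      # Hz, assumed
PRESET_DURATION_S = 5     # the "preset value" of 5 seconds from the embodiment

def collect_environmental_sound(duration_s: float = PRESET_DURATION_S,
                                sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Record ambient sound for the preset time range and return it as mono samples."""
    recording = sd.rec(int(duration_s * sample_rate),
                       samplerate=sample_rate, channels=1, dtype="float32")
    sd.wait()  # block until the preset time range has elapsed
    return recording[:, 0]
```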
S130: performing feature extraction on the environmental sound signal to obtain an environmental sound feature.
The environmental sound feature obtained by performing feature extraction on the environmental sound signal may be a numerical value or an image.
In one embodiment, the environmental sound feature is a numerical value. Referring to Fig. 2, step S130 includes steps S131 to S134.
S131: converting the environmental sound signal into a digital signal.
The collected environmental sound is an analog signal, and it can be converted into a digital signal by analog-to-digital conversion.
S132: performing spectrum analysis on the digital signal to obtain frequency information including a plurality of frequency points.
The spectrum analysis of the digital signal may be performed using a Fourier transform to obtain the frequency points contained in the digital signal.
S133: calculating, according to the frequency information, the average value of the frequency points in each preset frequency band respectively, as the characteristic value of each preset frequency band.
There are a plurality of preset frequency bands, which can be set according to the audio frequency range audible to the human ear. Frequency points in the frequency information that do not belong to any preset frequency band are discarded, the remaining frequency points are classified by value, and the average value of the frequency points belonging to one preset frequency band is calculated to obtain the characteristic value of that band. Specifically, if the frequency points within a preset frequency band form a continuous range, the average value can be obtained by summing over the band and dividing by the spectrum length; if the frequency points within a preset frequency band are discrete values, the average value can be obtained by directly summing the frequency points and dividing by the number of frequency points. For example, if 30 Hz, 50 Hz, 100 Hz and 110 Hz in the frequency information belong to the same preset frequency band, the average of 30, 50, 100 and 110 is taken as the characteristic value of that preset frequency band.
In one embodiment, the preset frequency bands include 20 Hz-200 Hz, 200 Hz-700 Hz, 700 Hz-2000 Hz, 2000 Hz-7000 Hz and 7000 Hz-15000 Hz. In this way, the audio frequency range audible to the human ear under normal circumstances is divided, feature extraction is carried out in a targeted manner, and data processing efficiency is improved. It can be understood that, in other embodiments, the preset frequency bands may also be set to other values.
S134: calculating respectively the product of the characteristic value of each preset frequency band and the corresponding preset coefficient, and calculating the sum of the products to obtain the environmental sound feature.
Each preset frequency band corresponds in advance to a preset coefficient, and through step S133 each preset frequency band corresponds to a characteristic value. By multiplying the characteristic value of each preset frequency band by the preset coefficient corresponding to that band, a product is obtained for each band, and the sum of the products gives the environmental sound feature. In this embodiment, the preset frequency bands 20 Hz-200 Hz, 200 Hz-700 Hz, 700 Hz-2000 Hz, 2000 Hz-7000 Hz and 7000 Hz-15000 Hz correspond to preset coefficients of -100, -10, 0, 10 and 100 respectively.
By performing frequency analysis on the environmental sound signal and using the value calculated from the resulting frequency information as the environmental sound feature, the environment is represented in quantified form, which facilitates data analysis and processing. It can be understood that, in other embodiments, the sound feature may also be extracted by other methods, for example by performing spectrum analysis on the digital signal obtained after analog-to-digital conversion of the environmental sound signal and using the resulting spectrogram directly as the environmental sound feature.
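A minimal sketch of steps S132-S134 for an already-digitized mono signal: it takes an FFT, treats the bins whose magnitude exceeds a threshold as the "frequency points" of the signal, averages the frequencies of those points within each of the five preset bands, and combines the band characteristic values with the preset coefficients -100, -10, 0, 10 and 100. The magnitude threshold and all function names are assumptions for illustration.

```python
import numpy as np

# Preset frequency bands (Hz) and their preset coefficients, from the embodiment.
PRESET_BANDS = [(20, 200), (200, 700), (700, 2000), (2000, 7000), (7000, 15000)]
PRESET_COEFFS = [-100, -10, 0, 10, 100]

def extract_environmental_sound_feature(signal: np.ndarray,
                                        sample_rate: int,
                                        magnitude_threshold: float = 0.1) -> float:
    """Steps S132-S134: spectrum analysis, per-band averages, weighted sum."""
    # S132: Fourier transform of the digital signal.
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # Bins whose normalized magnitude exceeds the (assumed) threshold are taken
    # as the frequency points contained in the signal.
    significant = freqs[spectrum >= magnitude_threshold * spectrum.max()]

    feature = 0.0
    for (low, high), coeff in zip(PRESET_BANDS, PRESET_COEFFS):
        # S133: characteristic value = average of the frequency points in the band.
        in_band = significant[(significant >= low) & (significant < high)]
        characteristic = float(in_band.mean()) if in_band.size else 0.0
        # S134: multiply by the band's preset coefficient and accumulate.
        feature += coeff * characteristic
    return feature
```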
S150: selecting, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature.
A reference sound feature is a sound feature set in advance for comparison. If the environmental sound feature is a numerical value, the reference sound features are likewise numerical values, and the similarity between a reference sound feature and the environmental sound feature is obtained by calculating the difference between the two: the smaller the difference, the greater the similarity. If the environmental sound feature is an image, the reference sound features are likewise images, and the similarity is obtained by image comparison: the smaller the difference between the images, the greater the similarity.
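For the numerical case, choosing the closest reference feature reduces to a minimum-absolute-difference search. A sketch follows; the dictionary layout, scene names and example values are illustrative assumptions rather than figures from the patent.

```python
def select_reference_feature(environment_feature: float,
                             reference_features: dict[str, float]) -> str:
    """S150: pick the reference sound feature with the smallest difference from,
    i.e. the greatest similarity to, the environmental sound feature."""
    return min(reference_features,
               key=lambda scene: abs(reference_features[scene] - environment_feature))

# Placeholder reference values keyed by model scene.
references = {"daytime living room": 5200.0, "night living room": 800.0,
              "hotel": 1500.0, "shopping mall": 9400.0, "dining room": 7000.0}
matched_scene = select_reference_feature(4300.0, references)
```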
The reference sound features can be obtained by collection and analysis in advance. In one embodiment, referring to Fig. 3, steps S101 to S103 are further included before step S110.
S101: collecting the sound signal within a preset duration in each of a plurality of model scenes respectively, to obtain a plurality of reference sound signals.
A model scene refers to a real-life scene. The number of model scenes is equal to the number of reference sound features, and the sound signal of a model scene may specifically be collected through a microphone. In this embodiment, the model scenes include five scenes: a daytime living room, a night living room, a hotel, a shopping mall and a dining room, corresponding respectively to noisy, quiet, tranquil, mixed and spacious situations. It can be understood that, in other embodiments, the model scenes may also be other scenes.
The preset duration is longer than the duration corresponding to the preset time range. By collecting the sound signal of a model scene over a long set period, the sound signal obtained can accurately represent the ambient sound conditions of the model scene, so that the resulting reference sound signal is more accurate. In this embodiment, the preset duration is 10000 seconds.
S102: dividing each reference sound signal into a plurality of signal segments according to the ratio between the preset duration and the duration corresponding to the preset time range.
The ratio between the preset duration and the duration corresponding to the preset time range may be an integer value greater than 1, or a non-integer value greater than 1. A reference sound signal is the sound signal collected continuously at each moment within the preset duration. Dividing each reference sound signal into a plurality of signal segments may specifically be done by cutting the reference sound signal in chronological order at intervals equal to the duration corresponding to the preset time range, with the final piece shorter than one interval taken as one signal segment. In this way, the reference sound signal corresponding to each model scene can be divided into a plurality of signal segments.
S103: extracting the sound feature of each signal segment, and obtaining the sound feature of the corresponding reference sound signal from the sound features of the signal segments as the reference sound feature.
The specific method of extracting the sound feature of each signal segment is the same as the specific method of extracting the environmental sound feature and is not repeated here. If the extracted sound features are numerical values, obtaining the sound feature of the corresponding reference sound signal from the sound features of the signal segments may specifically be taking the average of the numerical values of the signal segments as the reference sound feature; if the extracted sound features are images, it may specifically be performing image processing and analysis on the images of the signal segments to obtain an image representing the whole reference sound signal as the reference sound feature.
By collecting the reference sound signals of a plurality of model scenes in advance in the manner of steps S101 to S103 and performing feature extraction to obtain the reference sound feature of each model scene, the reference sound features obtained are highly representative and accurate.
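A sketch of steps S102-S103 for the numerical case: a long model-scene recording is cut into segments whose length equals the preset time range, each segment is reduced to a feature with the extraction function sketched above, and the segment features are averaged into the scene's reference feature. The function name and the reuse of extract_environmental_sound_feature are illustrative assumptions.

```python
def build_reference_feature(scene_signal: np.ndarray,
                            sample_rate: int,
                            preset_time_range_s: float = PRESET_DURATION_S) -> float:
    """S102-S103: split a long model-scene recording into segments the length of
    the preset time range and average the per-segment sound features."""
    segment_len = int(preset_time_range_s * sample_rate)
    segments = [scene_signal[start:start + segment_len]
                for start in range(0, len(scene_signal), segment_len)]
    # The trailing piece shorter than one interval is kept as its own segment.
    features = [extract_environmental_sound_feature(seg, sample_rate)
                for seg in segments if len(seg) > 0]
    return float(np.mean(features))
```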
S170: looking up the preset sound effect mode corresponding to the selected reference sound feature to obtain a matched sound effect mode.
Each reference sound feature corresponds in advance to a preset sound effect mode; specifically, each reference sound feature may be stored in advance in correspondence with a preset sound effect mode, so that the corresponding preset sound effect mode can be found from the reference sound feature.
In this embodiment, the preset sound effect modes include five modes: news, night, cinema, standard and music, corresponding respectively to the reference sound features of the five model scenes daytime living room, night living room, hotel, shopping mall and dining room. Three standard techniques are typically used when setting the sound effect modes: total-sonic, total volume and total surround. total-sonic has two states, on/off; total volume has three states, normal/night/off; and total surround has two states, on/off. The states of the standard techniques corresponding to each preset sound effect mode are shown in Table 1 below.
Table 1
Sound effect mode   total-sonic   total volume   total surround
News                on            normal         off
Night               on            night          off
Cinema              on            off            on
Standard            off           off            off
Music               on            off            off
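Table 1 and the scene-to-mode pairing of the embodiment map naturally onto lookup structures; the sketch below is one possible data layout, with key names chosen for illustration.

```python
# Mode -> states of the three standard techniques, as in Table 1.
SOUND_EFFECT_MODES = {
    "news":     {"total_sonic": "on",  "total_volume": "normal", "total_surround": "off"},
    "night":    {"total_sonic": "on",  "total_volume": "night",  "total_surround": "off"},
    "cinema":   {"total_sonic": "on",  "total_volume": "off",    "total_surround": "on"},
    "standard": {"total_sonic": "off", "total_volume": "off",    "total_surround": "off"},
    "music":    {"total_sonic": "on",  "total_volume": "off",    "total_surround": "off"},
}

# S170: each model scene's reference feature corresponds to one preset mode.
SCENE_TO_MODE = {
    "daytime living room": "news",
    "night living room": "night",
    "hotel": "cinema",
    "shopping mall": "standard",
    "dining room": "music",
}
```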
S190: performing sound effect processing on the sound signal to be played according to the matched sound effect mode.
After the matched sound effect mode is obtained, sound effect processing is carried out according to it; specifically, the standard techniques corresponding to the matched sound effect mode are applied automatically to the sound signal to be played, so that the output sound signal is adapted to the current environment. For example, if the matched sound effect mode is music, the states of the three standard techniques total-sonic, total volume and total surround are set to on, off and off respectively. The sound signal to be played may be the television sound signal of a television set.
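Putting the earlier sketches together, a hypothetical end-to-end pass could look like the following; apply_sound_effect is a placeholder standing in for whatever device-specific routine actually switches the three standard techniques.

```python
def apply_sound_effect(settings: dict[str, str]) -> None:
    # Placeholder: a real device would push these states to its audio pipeline.
    print(f"total-sonic={settings['total_sonic']}, "
          f"total volume={settings['total_volume']}, "
          f"total surround={settings['total_surround']}")

signal = collect_environmental_sound()                               # S110
feature = extract_environmental_sound_feature(signal, SAMPLE_RATE)   # S130
scene = select_reference_feature(feature, references)                # S150
mode = SCENE_TO_MODE[scene]                                          # S170
apply_sound_effect(SOUND_EFFECT_MODES[mode])                         # S190
```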
With the above sound effect processing method, an environmental sound signal within a preset time range is collected and feature extraction is performed on it to obtain an environmental sound feature; the reference sound feature with the greatest similarity to the environmental sound feature is then selected from a plurality of preset reference sound features, the preset sound effect mode corresponding to the selected reference sound feature is looked up to obtain a matched sound effect mode, and sound effect processing is performed on the sound signal to be played according to the matched sound effect mode. In this way, the most suitable sound effect mode is selected automatically according to the environmental sound feature of the environmental sound signal, and the sound effect processing result is good; at the same time, no user operation is needed, which makes the device more convenient to use.
The above sound effect processing method can be applied to the processing system of a television set, so that the television set automatically selects a matched sound effect mode for sound effect processing according to the environment. The above sound effect processing method can also be applied to other audio/video display devices, such as mobile phones and tablets, so that the audio/video display device automatically selects a matched sound effect mode for sound effect processing according to the environment when the player is opened.
Referring to Fig. 4, the sound effect processing system in an embodiment includes an environmental sound signal acquisition module 110, an environmental sound feature acquisition module 130, a reference sound feature selection module 150, a matched sound effect mode lookup module 170 and a sound effect processing module 190.
The environmental sound signal acquisition module 110 is configured to collect an environmental sound signal within a preset time range.
The preset time range refers to a time range whose start or duration is set in advance. The sound signal may specifically be obtained by collecting the sound of the surrounding environment through a microphone.
In one embodiment, the preset time range is a time range that takes the moment at which a play instruction is received as the start time and a preset value as the duration. The play instruction refers to an instruction for starting audio/video playback or television program playback, for example the instruction that wakes up the processing system when the television set is powered on. The preset value may be set according to actual needs. In this embodiment, the preset value is 5 seconds; when the moment at which the play instruction is received corresponds to the moment the television set is powered on, the preset time range is the first 5 seconds after power-on.
Generally, after a play instruction is received, the processing system only needs a short response time before it starts playing the audio/video. By collecting the sound signal within the time range that takes the moment at which the play instruction is received as the start time and the preset value as the duration, the environmental sound signal obtained is the sound signal before audio/video playback starts, which avoids the influence of the actually played sound on the collection of the environmental sound signal and improves the accuracy of sound signal collection.
The environmental sound feature acquisition module 130 is configured to perform feature extraction on the environmental sound signal to obtain an environmental sound feature.
The environmental sound feature obtained by performing feature extraction on the environmental sound signal may be a numerical value or an image.
In one embodiment, the environmental sound feature is a numerical value. The environmental sound feature acquisition module 130 includes an analog-to-digital conversion unit (not shown), a spectrum analysis unit (not shown), a characteristic value calculation unit (not shown) and an environmental sound feature calculation unit (not shown).
The analog-to-digital conversion unit is configured to convert the environmental sound signal into a digital signal.
The spectrum analysis unit is configured to perform spectrum analysis on the digital signal to obtain frequency information including a plurality of frequency points. The spectrum analysis of the digital signal may be performed using a Fourier transform to obtain the frequency points contained in the digital signal.
The characteristic value calculation unit is configured to calculate, according to the frequency information, the average value of the frequency points in each preset frequency band respectively, as the characteristic value of each preset frequency band.
If the frequency points within a preset frequency band form a continuous range, the average value can be obtained by summing over the band and dividing by the spectrum length; if the frequency points within a preset frequency band are discrete values, the average value can be obtained by directly summing the frequency points and dividing by the number of frequency points.
In one embodiment, the preset frequency bands include 20 Hz-200 Hz, 200 Hz-700 Hz, 700 Hz-2000 Hz, 2000 Hz-7000 Hz and 7000 Hz-15000 Hz. In this way, the audio frequency range audible to the human ear under normal circumstances is divided, feature extraction is carried out in a targeted manner, and data processing efficiency is improved.
The environmental sound feature calculation unit is configured to calculate respectively the product of the characteristic value of each preset frequency band and the corresponding preset coefficient, and to calculate the sum of the products to obtain the environmental sound feature. In this embodiment, the preset frequency bands 20 Hz-200 Hz, 200 Hz-700 Hz, 700 Hz-2000 Hz, 2000 Hz-7000 Hz and 7000 Hz-15000 Hz correspond to preset coefficients of -100, -10, 0, 10 and 100 respectively.
By using the analog-to-digital conversion unit, the spectrum analysis unit, the characteristic value calculation unit and the environmental sound feature calculation unit, frequency analysis is performed on the environmental sound signal, and the value calculated from the resulting frequency information is used as the environmental sound feature, representing the environment in quantified form and facilitating data analysis and processing.
The reference sound feature selection module 150 is configured to select, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature.
If the environmental sound feature is a numerical value, the reference sound features are likewise numerical values, and the similarity between a reference sound feature and the environmental sound feature is obtained by calculating the difference between the two: the smaller the difference, the greater the similarity. If the environmental sound feature is an image, the reference sound features are likewise images, and the similarity is obtained by image comparison: the smaller the difference between the images, the greater the similarity.
The reference sound features can be obtained by collection and analysis in advance. In one embodiment, the above sound effect processing system further includes a reference sound signal acquisition module (not shown), a reference sound signal segmentation module (not shown) and a reference sound feature acquisition module (not shown).
The reference sound signal acquisition module is configured to collect the sound signal within a preset duration in each of a plurality of model scenes respectively, to obtain a plurality of reference sound signals. The number of model scenes is equal to the number of reference sound features, and the preset duration is longer than the duration corresponding to the preset time range. In this embodiment, the model scenes include five scenes: a daytime living room, a night living room, a hotel, a shopping mall and a dining room, corresponding respectively to noisy, quiet, tranquil, mixed and spacious situations. It can be understood that, in other embodiments, the model scenes may also be other scenes. In this embodiment, the preset duration is 10000 seconds.
The reference sound signal segmentation module is configured to divide each reference sound signal into a plurality of signal segments according to the ratio between the preset duration and the duration corresponding to the preset time range.
A reference sound signal is the sound signal collected continuously at each moment within the preset duration. Dividing each reference sound signal into a plurality of signal segments may specifically be done by cutting the reference sound signal in chronological order at intervals equal to the duration corresponding to the preset time range, with the final piece shorter than one interval taken as one signal segment. In this way, the reference sound signal corresponding to each model scene can be divided into a plurality of signal segments.
The reference sound feature acquisition module is configured to extract the sound feature of each signal segment and to obtain the sound feature of the corresponding reference sound signal from the sound features of the signal segments as the reference sound feature.
By using the reference sound signal acquisition module, the reference sound signal segmentation module and the reference sound feature acquisition module, the reference sound signals of a plurality of model scenes are collected in advance and feature extraction is performed to obtain the reference sound feature of each model scene; the reference sound features obtained are highly representative and accurate.
The matched sound effect mode lookup module 170 is configured to look up the preset sound effect mode corresponding to the selected reference sound feature to obtain a matched sound effect mode.
In this embodiment, the preset sound effect modes include five modes: news, night, cinema, standard and music, corresponding respectively to the reference sound features of the five model scenes daytime living room, night living room, hotel, shopping mall and dining room. Three standard techniques are typically used when setting the sound effect modes: total-sonic, total volume and total surround. total-sonic has two states, on/off; total volume has three states, normal/night/off; and total surround has two states, on/off.
The sound effect processing module 190 is configured to perform sound effect processing on the sound signal to be played according to the matched sound effect mode.
After the matched sound effect mode is obtained, sound effect processing is carried out according to it; specifically, the standard techniques corresponding to the matched sound effect mode are applied automatically to the sound signal to be played, so that the output sound signal is adapted to the current environment. The sound signal to be played may be the television sound signal of a television set.
With the above sound effect processing system, the environmental sound signal acquisition module 110 collects the environmental sound signal within a preset time range, and the environmental sound feature acquisition module 130 performs feature extraction on the environmental sound signal to obtain an environmental sound feature; the reference sound feature selection module 150 then selects, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature, the matched sound effect mode lookup module 170 looks up the preset sound effect mode corresponding to the selected reference sound feature to obtain a matched sound effect mode, and the sound effect processing module 190 performs sound effect processing on the sound signal to be played according to the matched sound effect mode. In this way, the most suitable sound effect mode is selected automatically according to the environmental sound feature of the environmental sound signal, and the sound effect processing result is good; at the same time, no user operation is needed, which makes the device more convenient to use.
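As an illustration of how the five modules of Fig. 4 might be wired together in code, the following sketch composes the functions from the earlier examples; the class and method names are assumptions and not taken from the patent.

```python
class SoundEffectProcessingSystem:
    """Minimal sketch of the module structure of Fig. 4 (names are illustrative)."""

    def __init__(self, reference_features: dict[str, float],
                 scene_to_mode: dict[str, str], sample_rate: int = SAMPLE_RATE):
        self.reference_features = reference_features   # preset reference sound features
        self.scene_to_mode = scene_to_mode              # preset mode per model scene
        self.sample_rate = sample_rate

    def run(self) -> dict[str, str]:
        signal = collect_environmental_sound()                                   # module 110
        feature = extract_environmental_sound_feature(signal, self.sample_rate)  # module 130
        scene = select_reference_feature(feature, self.reference_features)       # module 150
        mode = self.scene_to_mode[scene]                                          # module 170
        settings = SOUND_EFFECT_MODES[mode]                                       # module 190
        apply_sound_effect(settings)
        return settings
```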
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as such combinations of technical features are not contradictory, they should be regarded as falling within the scope of this specification.
The above embodiments express only several implementations of the invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that, for a person of ordinary skill in the art, various modifications and improvements can be made without departing from the concept of the invention, and these all fall within the protection scope of the invention. Therefore, the protection scope of the present patent should be determined by the appended claims.

Claims (10)

1. A sound effect processing method, characterized by comprising:
collecting an environmental sound signal within a preset time range;
performing feature extraction on the environmental sound signal to obtain an environmental sound feature;
selecting, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature;
looking up the preset sound effect mode corresponding to the selected reference sound feature to obtain a matched sound effect mode; and
performing sound effect processing on a sound signal to be played according to the matched sound effect mode.
2. The sound effect processing method according to claim 1, characterized in that the preset time range is a time range that takes the moment at which a play instruction is received as the start time and a preset value as the duration.
3. The sound effect processing method according to claim 1, characterized in that performing feature extraction on the environmental sound signal to obtain an environmental sound feature comprises:
converting the environmental sound signal into a digital signal;
performing spectrum analysis on the digital signal to obtain frequency information including a plurality of frequency points;
calculating, according to the frequency information, the average value of the frequency points in each preset frequency band respectively, as the characteristic value of each preset frequency band; and
calculating respectively the product of the characteristic value of each preset frequency band and the corresponding preset coefficient, and calculating the sum of the products to obtain the environmental sound feature.
4. The sound effect processing method according to claim 3, characterized in that the preset frequency bands include 20 Hz-200 Hz, 200 Hz-700 Hz, 700 Hz-2000 Hz, 2000 Hz-7000 Hz and 7000 Hz-15000 Hz.
5. The sound effect processing method according to claim 1, characterized in that, before collecting the environmental sound signal within the preset time range, the method further comprises:
collecting the sound signal within a preset duration in each of a plurality of model scenes respectively to obtain a plurality of reference sound signals, the number of the model scenes being equal to the number of the reference sound features, and the preset duration being longer than the duration corresponding to the preset time range;
dividing each reference sound signal into a plurality of signal segments according to the ratio between the preset duration and the duration corresponding to the preset time range; and
extracting the sound feature of each signal segment, and obtaining the sound feature of the corresponding reference sound signal from the sound features of the signal segments as the reference sound feature.
6. A sound effect processing system, characterized by comprising:
an environmental sound signal acquisition module, configured to collect an environmental sound signal within a preset time range;
an environmental sound feature acquisition module, configured to perform feature extraction on the environmental sound signal to obtain an environmental sound feature;
a reference sound feature selection module, configured to select, from a plurality of preset reference sound features, the reference sound feature with the greatest similarity to the environmental sound feature;
a matched sound effect mode lookup module, configured to look up the preset sound effect mode corresponding to the selected reference sound feature to obtain a matched sound effect mode; and
a sound effect processing module, configured to perform sound effect processing on a sound signal to be played according to the matched sound effect mode.
7. The sound effect processing system according to claim 6, characterized in that the preset time range is a time range that takes the moment at which a play instruction is received as the start time and a preset value as the duration.
8. The sound effect processing system according to claim 6, characterized in that the environmental sound feature acquisition module comprises:
an analog-to-digital conversion unit, configured to convert the environmental sound signal into a digital signal;
a spectrum analysis unit, configured to perform spectrum analysis on the digital signal to obtain frequency information including a plurality of frequency points;
a characteristic value calculation unit, configured to calculate, according to the frequency information, the average value of the frequency points in each preset frequency band respectively, as the characteristic value of each preset frequency band; and
an environmental sound feature calculation unit, configured to calculate respectively the product of the characteristic value of each preset frequency band and the corresponding preset coefficient, and to calculate the sum of the products to obtain the environmental sound feature.
9. The sound effect processing system according to claim 8, characterized in that the preset frequency bands include 20 Hz-200 Hz, 200 Hz-700 Hz, 700 Hz-2000 Hz, 2000 Hz-7000 Hz and 7000 Hz-15000 Hz.
10. The sound effect processing system according to claim 6, characterized by further comprising:
a reference sound signal acquisition module, configured to collect the sound signal within a preset duration in each of a plurality of model scenes respectively to obtain a plurality of reference sound signals, the number of the model scenes being equal to the number of the reference sound features, and the preset duration being longer than the duration corresponding to the preset time range;
a reference sound signal segmentation module, configured to divide each reference sound signal into a plurality of signal segments according to the ratio between the preset duration and the duration corresponding to the preset time range; and
a reference sound feature acquisition module, configured to extract the sound feature of each signal segment and to obtain the sound feature of the corresponding reference sound signal from the sound features of the signal segments as the reference sound feature.
CN201611092855.7A 2016-11-30 2016-11-30 sound effect processing method and system Active CN106792253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611092855.7A CN106792253B (en) 2016-11-30 2016-11-30 sound effect processing method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611092855.7A CN106792253B (en) 2016-11-30 2016-11-30 sound effect processing method and system

Publications (2)

Publication Number Publication Date
CN106792253A true CN106792253A (en) 2017-05-31
CN106792253B CN106792253B (en) 2019-07-09

Family

ID=58915750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611092855.7A Active CN106792253B (en) 2016-11-30 2016-11-30 sound effect processing method and system

Country Status (1)

Country Link
CN (1) CN106792253B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107910018A (en) * 2017-10-30 2018-04-13 广州视源电子科技股份有限公司 Sound effect processing method and system, computer storage medium and equipment
CN109002275A (en) * 2018-07-03 2018-12-14 百度在线网络技术(北京)有限公司 AR background audio processing method, device, AR equipment and readable storage medium storing program for executing
CN109121068A (en) * 2018-07-04 2019-01-01 广州视源电子科技股份有限公司 Sound effect control method and device and electronic equipment
WO2021237650A1 (en) * 2020-05-29 2021-12-02 Nokia Technologies Oy Noise control
CN116453492A (en) * 2023-06-16 2023-07-18 成都小唱科技有限公司 Method and device for switching jukebox airport scenes, computer equipment and storage medium
CN117593949A (en) * 2024-01-19 2024-02-23 成都金都超星天文设备有限公司 Control method, equipment and medium for astronomical phenomena demonstration of astronomical phenomena operation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103051794A (en) * 2012-12-18 2013-04-17 广东欧珀移动通信有限公司 Method and device for dynamically setting sound effect of mobile terminal
CN103219011A (en) * 2012-01-18 2013-07-24 联想移动通信科技有限公司 Noise reduction method, noise reduction device and communication terminal
US20140086422A1 (en) * 2012-09-21 2014-03-27 Hon Hai Precision Industry Co., Ltd. Electronic device, system and method for managing sound effect
CN104811864A (en) * 2015-04-20 2015-07-29 深圳市冠旭电子有限公司 Method and system for self-adaptive adjustment of audio effect
WO2015142659A1 (en) * 2014-03-17 2015-09-24 Adaptive Sound Technologies, Inc. Systems and methods for automatic signal attenuation
CN104966522A (en) * 2015-06-30 2015-10-07 广州酷狗计算机科技有限公司 Sound effect regulation method, cloud server, stereo device and system
CN106060643A (en) * 2016-06-28 2016-10-26 乐视控股(北京)有限公司 Method and device for playing multimedia file and earphones

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103219011A (en) * 2012-01-18 2013-07-24 联想移动通信科技有限公司 Noise reduction method, noise reduction device and communication terminal
US20140086422A1 (en) * 2012-09-21 2014-03-27 Hon Hai Precision Industry Co., Ltd. Electronic device, system and method for managing sound effect
CN103051794A (en) * 2012-12-18 2013-04-17 广东欧珀移动通信有限公司 Method and device for dynamically setting sound effect of mobile terminal
WO2015142659A1 (en) * 2014-03-17 2015-09-24 Adaptive Sound Technologies, Inc. Systems and methods for automatic signal attenuation
CN104811864A (en) * 2015-04-20 2015-07-29 深圳市冠旭电子有限公司 Method and system for self-adaptive adjustment of audio effect
CN104966522A (en) * 2015-06-30 2015-10-07 广州酷狗计算机科技有限公司 Sound effect regulation method, cloud server, stereo device and system
CN106060643A (en) * 2016-06-28 2016-10-26 乐视控股(北京)有限公司 Method and device for playing multimedia file and earphones

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107910018A (en) * 2017-10-30 2018-04-13 广州视源电子科技股份有限公司 Sound effect processing method and system, computer storage medium and equipment
CN109002275A (en) * 2018-07-03 2018-12-14 百度在线网络技术(北京)有限公司 AR background audio processing method, device, AR equipment and readable storage medium storing program for executing
CN109002275B (en) * 2018-07-03 2021-12-07 百度在线网络技术(北京)有限公司 AR background audio processing method and device, AR equipment and readable storage medium
CN109121068A (en) * 2018-07-04 2019-01-01 广州视源电子科技股份有限公司 Sound effect control method and device and electronic equipment
WO2021237650A1 (en) * 2020-05-29 2021-12-02 Nokia Technologies Oy Noise control
CN116453492A (en) * 2023-06-16 2023-07-18 成都小唱科技有限公司 Method and device for switching jukebox airport scenes, computer equipment and storage medium
CN117593949A (en) * 2024-01-19 2024-02-23 成都金都超星天文设备有限公司 Control method, equipment and medium for astronomical phenomena demonstration of astronomical phenomena operation
CN117593949B (en) * 2024-01-19 2024-03-29 成都金都超星天文设备有限公司 Control method, equipment and medium for astronomical phenomena demonstration of astronomical phenomena operation

Also Published As

Publication number Publication date
CN106792253B (en) 2019-07-09

Similar Documents

Publication Publication Date Title
CN106792253A (en) Sound effect processing method and system
US11895557B2 (en) Systems and methods for target device prediction
CN107708100B (en) Advertisement playing method based on user position information
CN105160008B (en) Method and device for positioning recommended user
US20200195984A1 (en) Methods and apparatus for identifying media content using temporal signal characteristics
KR102148006B1 (en) Method and apparatus for providing special effects to video
JP5586436B2 (en) Lifestyle collection device, user interface device, and lifestyle collection method
CN103688531A (en) Control device, control method and program
CN1282470A (en) Voice recognition unit for audience measurement system
CN104795067A (en) Voice interaction method and device
TW200803491A (en) Television receiver apparatus and method for automatically performing an action based on viewership information
CN105074697A (en) Accumulation of real-time crowd sourced data for inferring metadata about entities
WO2019153972A1 (en) Information pushing method and related product
CN110047515A (en) A kind of audio identification methods, device, equipment and storage medium
CN104335591A (en) System for adaptive delivery of context-based media
CN103873919B (en) A kind of information processing method and electronic equipment
CN112312167B (en) Broadcast content monitoring method and device, storage medium and electronic equipment
CN109493883A (en) A kind of audio time-delay calculation method and apparatus of smart machine and its smart machine
CN104091596A (en) Music identifying method, system and device
CN108429955A (en) Release ambient sound enters the intelligent apparatus and method of earphone
CN105430442B (en) multimedia device audio language setting method and system
CN104900237B (en) A kind of methods, devices and systems for audio-frequency information progress noise reduction process
CN107396178A (en) A kind of method and apparatus for editing video
CN106997506A (en) The group technology and its system for the same space equipment marketed for striding equipment
CN113709291A (en) Audio processing method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant