CN104123950A - Sound recording method and device

Info

Publication number: CN104123950A
Application number: CN201410341500.1A
Authority: CN
Inventors: 孙丽
Original assignee: Shenzhen ZTE Mobile Telecom Co Ltd
Current assignee: Shenzhen ZTE Mobile Telecom Co Ltd
Priority date: 2014-07-17
Filing date: 2014-07-17
Publication date: 2014-10-29
Anticipated expiration: 2034-07-17
Also published as: CN104123950B

Abstract

The invention discloses a sound recording method and a sound recording device, and belongs to the technical field of audio processing. The sound recording method includes: detecting a sound source; separating the detected sound source; obtaining and storing sound data of each independent sound source which is separated out of the sound source; analyzing the stored sound data of each independent sound source, and adjusting sound effects of each independent sound source according to an analysis result; performing sound mixing processing on each independent sound source after being adjusted; storing sound signals obtained through the sound mixing processing in preset file format. By using the sound recording method and the sound recording device, the sound source can be separated into the independent sound sources, and therefore ideal sound mixing effects are achieved by adjusting the sound effects of all the independent sound sources, a voice cracking phenomenon in close range sound recording is avoided on the premise that long distance sound recording effects are achieved, and sound recording effects are effectively improved under various environments.

Description

A recording method and device

Technical field

The present invention relates to the technical field of audio processing, and in particular to a recording method and device.

Background art

In scenarios such as meetings or interviews, a microphone is needed for recording. During recording, people usually want the main speaker's speech to be captured clearly. Therefore, when the microphone is far from the speaker, the amplification factor of the microphone must be raised to ensure the recording quality. In most cases, however, the recording site cannot be kept completely quiet. If someone speaks close to the microphone while recording is in progress, voice cracking easily occurs in the recording, which degrades the recording quality and the user experience.

Summary of the invention

In view of this, the technical problem to be solved by the present invention is to provide a recording method and device, so as to overcome the defect of the prior art that near and far sound sources at the recording site cannot both be accommodated during recording, resulting in poor recording quality.

The technical solution adopted by the present invention to solve the above technical problem is as follows:

According to one aspect of the present invention, a recording method is provided, comprising the following steps: detecting a speech sound source; separating the detected speech sound source; obtaining and storing the speech data of each separated independent sound source; analyzing the stored speech data of each independent sound source, and adjusting the sound effect of each independent sound source according to the analysis result; mixing the adjusted independent sound sources; and storing the audio signal obtained by the mixing in a preset file format.

In the method described above, obtaining and storing the speech data of each separated independent sound source comprises: obtaining the audio signal of each independent sound source; performing sound source localization on each independent sound source to obtain its azimuth information; and storing the speech data of each independent sound source in a different storage location, wherein the speech data of each independent sound source includes the audio signal and the azimuth information.

In the method described above, analyzing the stored speech data of each independent sound source and adjusting the sound effect of each independent sound source according to the analysis result comprises: calculating the root-mean-square value, within a preset time window, of the audio signal of the speech data of each independent sound source; determining, according to the azimuth information of the speech data and/or the root-mean-square value, whether the independent sound source corresponding to the speech data is a near-field sound source; if it is a near-field sound source, determining whether the root-mean-square value exceeds a preset threshold and, when it does, applying sound-effect attenuation to the audio signal of the speech data; if it is not a near-field sound source, determining whether the root-mean-square value is below the preset threshold and, when it is, applying sound-effect enhancement to the audio signal of the speech data.

In the method described above, detecting a speech sound source comprises: detecting the speech sound source in a noisy environment using a voice activity detection algorithm based on spectral variance.

In the method described above, separating the detected speech sound source comprises: separating the speech sound source into a plurality of independent sound sources using a sound source analysis method based on independent component analysis.

According to another aspect of the present invention, a recording device is provided, the device comprising: a sound source detection unit, configured to detect a speech sound source; a sound source separation unit, configured to separate the speech sound source detected by the sound source detection unit; a data processing unit, configured to obtain the speech data of each independent sound source separated by the sound source separation unit; a storage unit, configured to store the speech data of each independent sound source obtained by the data processing unit; the data processing unit being further configured to analyze the speech data of each independent sound source stored in the storage unit; a sound effect adjustment unit, configured to adjust the sound effect of each independent sound source according to the analysis result of the data processing unit; and a mixing unit, configured to mix the independent sound sources adjusted by the sound effect adjustment unit; the storage unit being further configured to store the audio signal obtained by the mixing unit in a preset file format.

In the device described above, the data processing unit is further configured to obtain the audio signal of each independent sound source and to perform sound source localization on each independent sound source to obtain its azimuth information; the storage unit is further configured to store the speech data of each independent sound source in a different storage location, wherein the speech data of each independent sound source includes the audio signal and the azimuth information.

In the device described above, the data processing unit is further configured to calculate the root-mean-square value, within a preset time window, of the audio signal of the speech data of each independent sound source; to determine, according to the azimuth information of the speech data and/or the root-mean-square value, whether the independent sound source corresponding to the speech data is a near-field sound source; to determine, when the independent sound source is a near-field sound source, whether the root-mean-square value exceeds a preset threshold; and to determine, when the independent sound source is a far-field sound source, whether the root-mean-square value is below the preset threshold. The sound effect adjustment unit is further configured to apply sound-effect attenuation to the audio signal of the speech data when the data processing unit determines that the corresponding independent sound source is a near-field sound source and the root-mean-square value exceeds the preset threshold, and to apply sound-effect enhancement to the audio signal of the speech data when the data processing unit determines that the corresponding independent sound source is a far-field sound source and the root-mean-square value is below the preset threshold.

In the device described above, the sound source detection unit is further configured to detect the speech sound source in a noisy environment using a voice activity detection algorithm based on spectral variance.

In the device described above, the sound source separation unit is further configured to separate the speech sound source into a plurality of independent sound sources using a sound source analysis method based on independent component analysis.

The recording method and device of the present invention can detect a speech sound source in a noisy environment and effectively remove noise, separate the detected speech sound source, analyze the speech data of the separated independent sound sources, and adjust the sound effect of each independent sound source according to the analysis result, so that each independent sound source reaches the desired sound effect. Mixing the processed independent sound sources then yields the desired recording quality. Near-field and far-field sound sources can also be distinguished, so that near-field sources are attenuated and far-field sources enhanced according to actual conditions. Long-distance recording quality is thus ensured while voice cracking in close-range recording is avoided, effectively improving recording quality in various environments.

Brief description of the drawings

Fig. 1 is a schematic diagram of the modular structure of a recording device provided by an embodiment of the present invention;

Fig. 2 is a flowchart of a recording method provided by an embodiment of the present invention;

Fig. 3 is a flowchart of step S203 in Fig. 2;

Fig. 4 is a flowchart of step S204 in Fig. 2.

Detailed description of the embodiments

To make the technical problem to be solved, the technical solution, and the beneficial effects of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.

Referring to Fig. 1, the present invention provides a recording device comprising a sound source detection unit 110, a sound source separation unit 120, a data processing unit 130, a storage unit 140, a sound effect adjustment unit 150, and a mixing unit 160.

The sound source detection unit 110 is configured to detect a speech sound source.

Specifically, the sound source detection unit 110 may use, for example, a voice activity detection (VAD) algorithm based on spectral variance. This algorithm can detect a speech sound source in a noisy environment and extract it, thereby effectively removing the background noise and retaining a clear speech sound source.
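
The patent names spectral-variance VAD but does not spell out the algorithm. The following is a minimal sketch of one plausible reading, assuming a frame is flagged as speech-like when the variance of its magnitude spectrum exceeds a threshold. The function names, the naive DFT, and the `threshold` value are illustrative assumptions, not part of the patent.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (adequate for short illustrative frames)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_variance(frame):
    """Variance of the magnitude spectrum across frequency bins."""
    mags = magnitude_spectrum(frame)
    mean = sum(mags) / len(mags)
    return sum((m - mean) ** 2 for m in mags) / len(mags)


def is_speech_frame(frame, threshold):
    """Flag a frame as speech-like when its spectral variance exceeds the threshold.

    `threshold` is a hypothetical tuning parameter; the patent gives no value.
    """
    return spectral_variance(frame) > threshold


# Usage: a 64-sample tone frame (energy concentrated in one bin) versus silence.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
silence = [0.0] * 64
print(is_speech_frame(tone, threshold=1.0), is_speech_frame(silence, threshold=1.0))
# prints: True False
```

A practical detector would likely estimate the threshold adaptively from the noise floor and smooth the decision over several frames; those refinements are outside what the text describes.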

The sound source separation unit 120 is configured to separate the speech sound source detected by the sound source detection unit 110.

Specifically, a sound source analysis method based on independent component analysis, for example, may be used to separate the sound of each source from a plurality of sound sources; it exploits the fact that the source signals of different sound sources are mutually independent. In independent component analysis, a linear filter whose dimension equals the number of microphones is used according to the number of sound sources. When the number of sound sources is less than the number of microphones, the source signals can be fully recovered; when the number of sound sources exceeds the number of microphones, an L1-norm minimization method can be used.
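
The two cases above can be written down compactly. The notation below is assumed for illustration (M microphones, N sources, an instantaneous mixing matrix A); the patent itself gives no formulas, and real rooms produce convolutive rather than instantaneous mixtures, so this is a simplified model.

```latex
% Instantaneous mixing model with M microphones and N sources (assumed notation).
\mathbf{x}(t) = \mathbf{A}\,\mathbf{s}(t), \qquad
\mathbf{x}(t) \in \mathbb{R}^{M},\quad \mathbf{s}(t) \in \mathbb{R}^{N}

% Fewer sources than microphones: a linear unmixing filter W recovers the
% sources, up to permutation and scaling, by maximizing their independence.
\mathbf{y}(t) = \mathbf{W}\,\mathbf{x}(t) \approx \mathbf{s}(t)

% More sources than microphones: the system is underdetermined, so the
% sparsest consistent solution is chosen by L1-norm minimization.
\hat{\mathbf{s}}(t) = \arg\min_{\mathbf{s}} \lVert \mathbf{s} \rVert_{1}
\quad \text{subject to} \quad \mathbf{A}\,\mathbf{s} = \mathbf{x}(t)
```

The L1 formulation relies on speech being sparse, typically in a time-frequency representation, which is a common assumption in this setting rather than something the text states.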

The data processing unit 130 is configured to obtain the speech data of each independent sound source separated by the sound source separation unit 120.

Specifically, the data processing unit 130 first obtains the audio signal of each independent sound source and performs sound source localization on each independent sound source to obtain its azimuth information. The audio signal and the azimuth information of each independent sound source together form one independent set of speech data.
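
The text does not say how localization is performed. As one common possibility, the sketch below estimates an azimuth from the arrival-time difference between two microphones under a far-field plane-wave model. The function name, the two-microphone geometry, and the constant are assumptions for illustration only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def azimuth_from_delay(delay_s, mic_spacing_m):
    """Estimate azimuth in degrees from the inter-microphone arrival delay.

    Plane-wave, two-microphone model: sin(theta) = c * delay / spacing.
    0 degrees means the source is broadside to the microphone pair.
    """
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))


# Usage: a delay equal to half the maximum possible delay maps to about 30 degrees.
print(round(azimuth_from_delay(0.05 / SPEED_OF_SOUND_M_S, 0.1), 3))
```

The plane-wave approximation weakens for sources very close to the array, so a device following the patent would probably need a near-field model or more microphones; that choice is left open by the source.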

The storage unit 140 is configured to store the speech data of each independent sound source obtained by the data processing unit 130.

Specifically, the storage unit 140 stores the speech data of each independent sound source in a different storage location, wherein the speech data of each independent sound source includes the audio signal and the azimuth information.
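
A minimal sketch of this storage layout, assuming an in-memory store with one slot per source. `SourceRecord` and `SourceStore` are hypothetical names; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    """Speech data of one separated source: audio samples plus azimuth."""
    source_id: int
    azimuth_deg: float
    samples: list = field(default_factory=list)


class SourceStore:
    """Keeps each source's speech data in its own slot, keyed by source id."""

    def __init__(self):
        self._slots = {}

    def save(self, record):
        self._slots[record.source_id] = record

    def load(self, source_id):
        return self._slots[source_id]

    def all_records(self):
        return list(self._slots.values())


# Usage: two sources stored separately, each with its own signal and azimuth.
store = SourceStore()
store.save(SourceRecord(source_id=0, azimuth_deg=-30.0, samples=[0.1, 0.2]))
store.save(SourceRecord(source_id=1, azimuth_deg=45.0, samples=[0.0, 0.05]))
print(len(store.all_records()))  # prints: 2
```

Keeping the sources apart until mixing is what allows each one to be analyzed and adjusted on its own.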

The data processing unit 130 is further configured to analyze the speech data of each independent sound source stored in the storage unit 140.

Specifically, the data processing unit 130 analyzes the speech data of each independent sound source as follows. First, it calculates the root mean square (RMS) value of the audio signal of each set of speech data within a preset time window, and determines, according to the azimuth information of the speech data and/or the RMS value, whether the independent sound source corresponding to the speech data is a near-field sound source or a far-field sound source. If the corresponding independent sound source is determined to be a near-field sound source, the unit determines whether the RMS value exceeds a preset threshold; if it is determined to be a far-field sound source, the unit determines whether the RMS value is below the preset threshold.
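
The analysis described above can be sketched as follows. The near-field flag is assumed to be supplied by a separate azimuth/RMS-based decision, and `threshold` stands in for the preset threshold, whose value the patent leaves open. The function names and the string labels are illustrative.

```python
import math


def rms(samples):
    """Root mean square of the samples in one time window."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def analyze_source(samples, is_near_field, threshold):
    """Return 'attenuate', 'boost', or 'keep' for one source's window.

    Near-field sources above the threshold are marked for attenuation;
    far-field sources below it are marked for enhancement.
    """
    level = rms(samples)
    if is_near_field and level > threshold:
        return "attenuate"
    if not is_near_field and level < threshold:
        return "boost"
    return "keep"


# Usage: a loud near-field window and a quiet far-field window.
print(analyze_source([0.9, -0.9], is_near_field=True, threshold=0.5))   # attenuate
print(analyze_source([0.1, -0.1], is_near_field=False, threshold=0.5))  # boost
```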

It should be noted that the definitions of far-field and near-field sound sources depend on the relative positions of the recorded sound sources. For example, if there are two sound sources on site in different directions, one is necessarily nearer than the other. Suppose one sound source is 1 meter from the recording device and the other is 3 meters from it; the source at 1 meter can then be defined as the near-field sound source and the source at 3 meters as the far-field sound source. In practice, near-field and far-field sound sources can also be defined in combination with the RMS value of the audio signal within the preset time window.

The sound effect adjustment unit 150 is configured to adjust the sound effect of each independent sound source according to the analysis result of the data processing unit 130.

Specifically, if the data processing unit 130 determines that the independent sound source corresponding to the speech data is a near-field sound source and the RMS value exceeds the preset threshold, the sound effect adjustment unit 150 applies sound-effect attenuation to the audio signal of the speech data, preventing the sound of the near-field source from oversaturating and causing distortion. If the data processing unit 130 determines that the independent sound source corresponding to the speech data is a far-field sound source and the RMS value is below the preset threshold, the sound effect adjustment unit 150 applies sound-effect enhancement to the audio signal of the speech data.
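
The patent does not define the attenuation or enhancement itself. One simple interpretation, assumed here, is a scalar gain that moves a window's RMS toward a target level, with the output limited so it cannot saturate. `apply_gain`, `gain_toward_target`, and the target level are hypothetical.

```python
import math


def apply_gain(samples, gain):
    """Scale samples by `gain`, limiting the result to the [-1.0, 1.0] range."""
    return [max(-1.0, min(1.0, x * gain)) for x in samples]


def gain_toward_target(samples, target_rms):
    """Gain that would bring the window's RMS to `target_rms` (1.0 for silence)."""
    level = math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0
    return target_rms / level if level > 0.0 else 1.0


# Usage: attenuate a loud near-field window, enhance a quiet far-field window.
near = [0.8, -0.8, 0.8, -0.8]
far = [0.05, -0.05, 0.05, -0.05]
near_adjusted = apply_gain(near, gain_toward_target(near, target_rms=0.4))
far_adjusted = apply_gain(far, gain_toward_target(far, target_rms=0.4))
```

Hard limiting is the crudest safeguard against saturation; a smoother compressor curve would be a natural refinement, but nothing in the text points to a particular choice.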

The mixing unit 160 is configured to mix the independent sound sources adjusted by the sound effect adjustment unit 150.

The storage unit 140 is further configured to store the audio signal obtained by the mixing unit 160 in a preset file format.

Specifically, the storage unit 140 may send the mixed audio signal to an encoding module, which encodes it into a file format such as MP3 or WAV for storage.
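
A minimal sketch of mixing and storage, assuming the adjusted sources are equal-length lists of float samples in [-1.0, 1.0] and that the chosen format is 16-bit mono WAV, written with Python's standard `wave` module. The function names and the 16 kHz default are assumptions, and MP3 encoding would require an external encoder not shown here.

```python
import struct
import wave


def mix_sources(sources):
    """Sum equal-length sample lists and clamp the result to [-1.0, 1.0]."""
    mixed = [sum(column) for column in zip(*sources)]
    return [max(-1.0, min(1.0, x)) for x in mixed]


def save_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = b"".join(struct.pack("<h", int(x * 32767)) for x in samples)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
```

Calling `save_wav("meeting.wav", mix_sources([near_adjusted, far_adjusted]))` with two adjusted sources would produce a playable file; the file name and variable names in that call are placeholders.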

It should be noted that the recording device of this embodiment is not limited to recording in a conference mode; it is also applicable to modes such as ordinary video recording and interviews. The recording device may also be provided on a mobile terminal.

In the recording device of this embodiment, the sound source detection unit detects speech sound sources in a noisy environment and effectively removes noise; the sound source separation unit separates the detected speech sound source; the data processing unit analyzes the speech data of the separated independent sound sources; and the sound effect adjustment unit adjusts the sound effect of each independent sound source according to the analysis result. Near-field and far-field sound sources can be distinguished, so that near-field sources are attenuated and far-field sources enhanced according to actual conditions, allowing each independent sound source to reach the desired sound effect. Finally, the mixing unit mixes the processed independent sound sources. Long-distance recording quality is thus ensured while voice cracking in close-range recording is avoided, effectively improving recording quality in various environments.

On the basis of the above recording device embodiment, the present invention also provides a recording method. Referring to Fig. 2, the method flow comprises:

S201, detecting a speech sound source.

Specifically, this step may use, for example, a voice activity detection (VAD) algorithm based on spectral variance. This algorithm can detect a speech sound source in a noisy environment and extract it, thereby effectively removing the background noise and retaining a clear speech sound source.

S202, separating the detected speech sound source.

Specifically, this step may use, for example, a sound source analysis method based on independent component analysis to separate the sound of each source from a plurality of sound sources; it exploits the fact that the source signals of different sound sources are mutually independent. In independent component analysis, a linear filter whose dimension equals the number of microphones is used according to the number of sound sources. When the number of sound sources is less than the number of microphones, the source signals can be fully recovered; when the number of sound sources exceeds the number of microphones, an L1-norm minimization method can be used.

S203, obtaining and storing the speech data of each separated independent sound source.

Specifically, referring to Fig. 3, this step comprises:

S2031, obtaining the audio signal of each independent sound source;

S2032, performing sound source localization on each independent sound source to obtain the azimuth information of each independent sound source;

S2033, storing the speech data corresponding to each independent sound source in a different storage location, wherein the speech data of each independent sound source includes the audio signal and the azimuth information.

S204, analyzing the stored speech data of each independent sound source, and adjusting the sound effect of each independent sound source according to the analysis result.

Specifically, referring to Fig. 4, this step comprises:

S2041, calculating the root-mean-square value, within a preset time window, of the audio signal of the speech data of each independent sound source;

S2042, determining, according to the azimuth information of the speech data and/or the root-mean-square value, whether the independent sound source corresponding to the speech data is a near-field sound source; if so, performing step S2043; otherwise, performing step S2045;

S2043, determining whether the root-mean-square value exceeds a preset threshold, and when it exceeds the preset threshold, performing step S2044;

S2044, applying sound-effect attenuation to the audio signal of the speech data;

S2045, determining whether the root-mean-square value is below the preset threshold, and when it is below the preset threshold, performing step S2046;

S2046, applying sound-effect enhancement to the audio signal of the speech data.
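
Steps S2041 to S2046 can be summarized as a formula. The symbols are assumed for illustration: x_i[n] is the n-th sample of source i in a window of N samples, T is the preset threshold, and g_i is the gain applied to source i. The patent constrains only the direction of the adjustment, not its magnitude, and the "otherwise" branch reflects that no action is described for the remaining cases.

```latex
% S2041: RMS of source i over a window of N samples (assumed notation)
r_i = \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x_i[n]^2}

% S2042 to S2046: direction of the gain g_i applied to source i
g_i
\begin{cases}
< 1 & \text{source } i \text{ is near-field and } r_i > T \quad \text{(S2043, S2044: attenuate)} \\
> 1 & \text{source } i \text{ is not near-field and } r_i < T \quad \text{(S2045, S2046: enhance)} \\
= 1 & \text{otherwise (signal left unchanged)}
\end{cases}
```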

S205, mixing the adjusted independent sound sources;

S206, storing the audio signal obtained by the mixing in a preset file format.

Specifically, in this step the mixed audio signal may be sent to an encoding module, which encodes it into a file format such as MP3 or WAV for storage.
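
The overall flow S201 to S206 can be sketched as a simple pipeline. Every callable passed in is a hypothetical stand-in for the corresponding step; the sketch only fixes the order in which the steps run and how data passes between them.

```python
def record(frames, detect, separate, collect, adjust, mix, store):
    """Run steps S201-S206 in order; each callable is a hypothetical stand-in."""
    speech = detect(frames)                  # S201: detect speech sound sources
    sources = separate(speech)               # S202: split into independent sources
    records = collect(sources)               # S203: obtain and store per-source data
    adjusted = [adjust(r) for r in records]  # S204: analyze and adjust each source
    mixed = mix(adjusted)                    # S205: mix the adjusted sources
    return store(mixed)                      # S206: save in the preset file format


# Usage with trivial placeholder steps, only to show the data flow.
result = record(
    frames=[0.2, 0.4],
    detect=lambda f: f,
    separate=lambda s: [[x] for x in s],
    collect=lambda srcs: srcs,
    adjust=lambda r: [x * 0.5 for x in r],
    mix=lambda adj: [sum(a[0] for a in adj)],
    store=lambda m: m,
)
```

Passing the steps in as callables keeps the ordering separate from any particular detection, separation, or encoding technique, which the text leaves open ("for example").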

The recording method of this embodiment can detect a speech sound source in a noisy environment and effectively remove noise, separate the detected speech sound source, analyze the speech data of the separated independent sound sources, and adjust the sound effect of each independent sound source according to the analysis result, so that each independent sound source reaches the desired sound effect. Mixing the processed independent sound sources then yields the desired recording quality. Near-field and far-field sound sources can also be distinguished, so that near-field sources are attenuated and far-field sources enhanced according to actual conditions. Long-distance recording quality is thus ensured while voice cracking in close-range recording is avoided, effectively improving recording quality in various environments.

The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and do not thereby limit the scope of rights of the present invention. Any modification, equivalent replacement, or improvement made by those skilled in the art without departing from the scope and spirit of the present invention shall fall within the scope of rights of the present invention.

Claims

1. A recording method, characterized by comprising the following steps:

detecting a speech sound source;

separating the detected speech sound source;

obtaining and storing the speech data of each separated independent sound source;

analyzing the stored speech data of each independent sound source, and adjusting the sound effect of each independent sound source according to the analysis result;

mixing the adjusted independent sound sources;

storing the audio signal obtained by the mixing in a preset file format.

2. The method according to claim 1, characterized in that obtaining and storing the speech data of each separated independent sound source comprises:

obtaining the audio signal of each independent sound source;

performing sound source localization on each independent sound source to obtain the azimuth information of each independent sound source;

storing the speech data of each independent sound source in a different storage location, wherein the speech data of each independent sound source includes the audio signal and the azimuth information.

3. The method according to claim 2, characterized in that analyzing the stored speech data of each independent sound source and adjusting the sound effect of each independent sound source according to the analysis result comprises:

calculating the root-mean-square value, within a preset time window, of the audio signal of the speech data of each independent sound source;

determining, according to the azimuth information of the speech data and/or the root-mean-square value, whether the independent sound source corresponding to the speech data is a near-field sound source;

if it is a near-field sound source, determining whether the root-mean-square value exceeds a preset threshold, and when it exceeds the preset threshold, applying sound-effect attenuation to the audio signal of the speech data;

if it is not a near-field sound source, determining whether the root-mean-square value is below the preset threshold, and when it is below the preset threshold, applying sound-effect enhancement to the audio signal of the speech data.

4. The method according to any one of claims 1 to 3, characterized in that detecting a speech sound source comprises:

detecting the speech sound source in a noisy environment using a voice activity detection algorithm based on spectral variance.

5. The method according to claim 4, characterized in that separating the detected speech sound source comprises:

separating the speech sound source into a plurality of independent sound sources using a sound source analysis method based on independent component analysis.

6. A recording device, characterized in that the device comprises:

a sound source detection unit, configured to detect a speech sound source;

a sound source separation unit, configured to separate the speech sound source detected by the sound source detection unit;

a data processing unit, configured to obtain the speech data of each independent sound source separated by the sound source separation unit;

a storage unit, configured to store the speech data of each independent sound source obtained by the data processing unit;

the data processing unit being further configured to analyze the speech data of each independent sound source stored in the storage unit;

a sound effect adjustment unit, configured to adjust the sound effect of each independent sound source according to the analysis result of the data processing unit;

a mixing unit, configured to mix the independent sound sources adjusted by the sound effect adjustment unit;

the storage unit being further configured to store the audio signal obtained by the mixing unit in a preset file format.

7. The device according to claim 6, characterized in that

the data processing unit is further configured to obtain the audio signal of each independent sound source and to perform sound source localization on each independent sound source to obtain the azimuth information of each independent sound source;

the storage unit is further configured to store the speech data of each independent sound source in a different storage location, wherein the speech data of each independent sound source includes the audio signal and the azimuth information.

8. The device according to claim 7, characterized in that

the data processing unit is further configured to calculate the root-mean-square value, within a preset time window, of the audio signal of the speech data of each independent sound source; to determine, according to the azimuth information of the speech data and/or the root-mean-square value, whether the independent sound source corresponding to the speech data is a near-field sound source; to determine, when the independent sound source is a near-field sound source, whether the root-mean-square value exceeds a preset threshold; and to determine, when the independent sound source is a far-field sound source, whether the root-mean-square value is below the preset threshold;

the sound effect adjustment unit is further configured to apply sound-effect attenuation to the audio signal of the speech data when the data processing unit determines that the independent sound source corresponding to the speech data is a near-field sound source and the root-mean-square value exceeds the preset threshold, and to apply sound-effect enhancement to the audio signal of the speech data when the data processing unit determines that the independent sound source corresponding to the speech data is a far-field sound source and the root-mean-square value is below the preset threshold.

9. The device according to any one of claims 6 to 8, characterized in that the sound source detection unit is further configured to detect the speech sound source in a noisy environment using a voice activity detection algorithm based on spectral variance.

10. The device according to claim 9, characterized in that the sound source separation unit is further configured to separate the speech sound source into a plurality of independent sound sources using a sound source analysis method based on independent component analysis.