CN109841227B - Background noise removing method based on learning compensation

Info

Publication number
CN109841227B
Authority
CN
China
Prior art keywords
background noise
conference
noise
signal
collected
Prior art date
Legal status
Active
Application number
CN201910182463.7A
Other languages
Chinese (zh)
Other versions
CN109841227A (en)
Inventor
张晖
高财政
赵海涛
孙雁飞
朱洪波
Current Assignee
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201910182463.7A
Publication of CN109841227A
Application granted
Publication of CN109841227B

Abstract

The invention discloses a background noise removing method based on learning compensation, which comprises the following steps: step (1): dividing a conference scene background noise data set into small conference background noise, medium conference background noise and large conference background noise according to the conference scale; step (2): background noise estimation, which specifically includes: step (2.1): learning the characteristics of the background noise with a GMM model, and obtaining the background noise distributions of the small conference, medium conference and large conference background noise respectively; step (2.2): using the GMM to identify which scale of background noise the collected voice signal contains, and selecting the background noise distribution of the corresponding scale according to the identification result; step (3): according to the estimated background noise distribution, compensating the collected voice signal with a noise learning compensation algorithm to remove the background noise from the collected voice signal. The invention has the advantage of effectively removing background noise.

Description

Background noise removing method based on learning compensation
Technical Field
The invention relates to the field of intelligent conferences, in particular to a background noise removal method based on learning compensation.
Background
Conference-scene background noise is time-varying, non-stationary and complex, so its distribution and characteristics are difficult to describe and it often cannot be assigned to any single class; moreover, the background noise differs greatly between conference scenes of different scales, which makes removing conference background noise very difficult. Therefore, it is desirable to provide a method for effectively removing the conference background noise from the voice signal.
Disclosure of Invention
The invention aims to provide a background noise removing method based on learning compensation, which can effectively remove background noise in a voice signal.
To achieve the above purpose, the invention adopts the following technical scheme: a background noise removing method based on learning compensation, comprising the following steps:
step (1): scene-based noise classification: according to the conference scale, dividing the conference scene background noise data set into small conference background noise, medium conference background noise and large conference background noise;
step (2): the background noise estimation method specifically comprises the following steps:
step (2.1): learning the characteristics of the background noise with a GMM model, and obtaining the background noise distributions of the small conference, medium conference and large conference background noise respectively;
step (2.2): using the GMM to identify which scale of background noise the collected voice signal contains, and selecting the background noise distribution of the corresponding scale according to the identification result;
step (3): according to the background noise distribution estimated for the collected voice signal, compensating the collected voice signal with a noise learning compensation algorithm, thereby removing the background noise from the collected voice signal.
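By way of illustration of steps (2.1) and (2.2) above, the following minimal sketch fits one Gaussian mixture model per conference scale and then assigns a collected signal to the scale whose model scores it highest. The patent does not name a toolkit, a feature type or a number of mixture components; scikit-learn's GaussianMixture, MFCC-style frame features and eight diagonal-covariance components are assumptions made here purely for illustration.

```python
# Minimal sketch of steps (2.1)-(2.2), assuming frame-level features (e.g. MFCCs)
# have already been extracted from the labelled noise files of each conference scale.
# Toolkit, feature type and component count are assumptions, not taken from the patent.
from sklearn.mixture import GaussianMixture

SCALES = ("small", "medium", "large")

def train_scale_models(features_by_scale, n_components=8):
    """Step (2.1): fit one GMM per conference scale on its pooled noise frames."""
    models = {}
    for scale in SCALES:
        frames = features_by_scale[scale]                 # array of shape (n_frames, n_dims)
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmm.fit(frames)
        models[scale] = gmm
    return models

def identify_scale(models, observed_frames):
    """Step (2.2): pick the scale whose GMM gives the highest average log-likelihood."""
    scores = {scale: gmm.score(observed_frames) for scale, gmm in models.items()}
    return max(scores, key=scores.get)                    # e.g. "medium"
```

The distribution selected for the identified scale (here, the fitted mixture itself) then serves as the background noise estimate used in step (3).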
Further, in the foregoing background noise removing method based on learning compensation, in step (1) the scene-based noise classification specifically includes: firstly screening out representative samples whose background noise distribution is uniform and easy to extract from the conference scene background noise data set, then dividing the samples into small conference background noise, medium conference background noise and large conference background noise according to the conference scale, then performing data cleaning on each class of background noise, then separating the background noise signals from the sample voice data and splicing them into a number of noise files of equal length, and finally manually labelling the noise files to complete the classification.
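As a rough illustration of the data preparation just described, the sketch below separates noise-only samples from labelled recordings and splices them into fixed-length noise files. The soundfile library, the five-second segment length, the file layout and the externally supplied voice-activity mask are all assumptions made for illustration; the patent specifies none of these details.

```python
# Illustrative sketch of the step (1) pipeline: separate the background noise from the
# sample voice data and splice it into equal-length noise files for one conference scale.
# Paths, segment length and the speech/noise mask are assumed, not taken from the patent.
import soundfile as sf

SEGMENT_SECONDS = 5.0   # assumed common duration of the spliced noise files

def separate_noise(audio, speech_mask):
    """Keep only the samples that an external VAD marks as non-speech (boolean numpy mask)."""
    return audio[~speech_mask]

def splice_fixed_length(noise, sample_rate, seconds=SEGMENT_SECONDS):
    """Split a long noise signal into equal-length segments, dropping the remainder."""
    n = int(sample_rate * seconds)
    return [noise[i:i + n] for i in range(0, len(noise) - n + 1, n)]

def build_noise_files(wav_paths, speech_masks, scale_label, out_dir):
    """Write manually labelled, equal-length noise files for one conference scale."""
    for idx, (path, mask) in enumerate(zip(wav_paths, speech_masks)):
        audio, sr = sf.read(path)                          # mono recording assumed
        for j, segment in enumerate(splice_fixed_length(separate_noise(audio, mask), sr)):
            sf.write(f"{out_dir}/{scale_label}_{idx:03d}_{j:03d}.wav", segment, sr)
```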
Further, in the foregoing background noise removing method based on learning compensation, in step (3) the noise learning compensation algorithm calculates the speaker signal in the collected voice signal as:
x(t)=y(t)-n(t)·w
wherein y(t) is the collected voice signal, x(t) is the speaker signal, n(t) is the background noise estimated in step (2) for the collected voice signal, w is the adaptive background noise adjustment parameter determined from the k samples preceding time t, and k is the adjustment parameter, which is an experimental value.
Through the implementation of the above technical scheme, the invention achieves the beneficial effect that the background noise in the voice signal can be effectively removed.
Drawings
Fig. 1 is a schematic flow chart of a background noise removing method based on learning compensation according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples.
As shown in fig. 1, the background noise removing method based on learning compensation includes the following steps:
step (1): scene-based noise classification: firstly screening out representative samples whose background noise distribution is uniform and easy to extract from the conference scene background noise data set, then dividing the samples into small conference background noise, medium conference background noise and large conference background noise according to the conference scale, then performing data cleaning on each class of background noise, then separating the background noise signals from the sample voice data and splicing them into a number of noise files of equal length, and finally manually labelling the noise files, so that the conference scene background noise data set is divided into the small conference background noise, the medium conference background noise and the large conference background noise;
step (2): the background noise estimation method specifically comprises the following steps:
step (2.1): learning the characteristics of the background noise with a GMM model, and obtaining the background noise distributions of the small conference, medium conference and large conference background noise respectively; each distribution describes the characteristics and regularities of the corresponding conference background noise, and the amplitude of the background noise signal at a given moment can be predicted from it;
step (2.2): using the GMM to identify which scale of background noise the collected voice signal contains, and selecting the background noise distribution of the corresponding scale according to the identification result; the selected background noise distribution is the background noise estimate for the collected voice signal;
step (3): according to the background noise distribution estimated for the collected voice signal, compensating the collected voice signal with a noise learning compensation algorithm so as to remove the background noise from the collected voice signal;
in the noise learning compensation algorithm, the voice signal collected by the collecting device is composed of a speaker signal and a background noise signal, and their relationship is shown in formula (1.1), wherein y(t) is the collected voice signal, x(t) is the speaker signal, n(t) is the background noise estimated in step (2), and w is an adaptive background noise adjustment parameter; previous compensation algorithms do not consider adaptive adjustment and generally obtain the speaker signal directly as y(t)-n(t), which can lead to over-compensation or under-compensation, so that either the background noise is not removed completely or part of the speaker's own signal is removed as well; to improve this situation, the invention designs adaptive adjustment of the background noise parameter;
y(t)=x(t)+n(t)·w (1.1)
Research shows that the background noise estimate n(t) is a signal unrelated to y(t): it does not take the distribution of y(t) into account and only represents the average distribution of the conference scene noise, whereas the amplitude of the background noise actually picked up varies with the position of the collecting device in the conference scene; on this basis, w is solved as shown in formula (1.2); the selection of w fully considers the time-domain distribution of the k moments preceding time t and compensates according to the principle that the larger the amplitude, the more noise is compensated, so that the background noise parameter can be well adapted to different environments;
[Formula (1.2), available in the source only as an image: the adaptive adjustment parameter w, computed from the amplitudes of the k samples of y(t) preceding time t]
therefore, the specific calculation formula of the speaker signal in the collected voice signal is as follows:
x(t)=y(t)-n(t)·w (1.3)
wherein y(t) is the collected voice signal, x(t) is the speaker signal, n(t) is the background noise estimated in step (2) for the collected voice signal, w is given by formula (1.2), and k is the adjustment parameter, an experimental value chosen flexibly according to the characteristics of the specific conference scene;
as can be seen from equation (1.3), the speaker signal with background noise removed at any time can be obtained as long as n (t) and y (t) are known.
The invention has the advantage of effectively removing the background noise in the voice signal.

Claims (1)

1. A background noise removing method based on learning compensation is characterized in that: the method comprises the following steps:
step (1): scene-based noise classification: according to the conference scale, dividing the conference scene background noise data set into small conference background noise, medium conference background noise and large conference background noise;
wherein: the scene-based noise classification specifically includes: firstly screening out representative samples whose background noise distribution is uniform and easy to extract from the conference scene background noise data set, then dividing the samples into small conference background noise, medium conference background noise and large conference background noise according to the conference scale, then performing data cleaning on each class of background noise, then separating the background noise signals from the sample voice data and splicing them into a number of noise files of equal length, and finally manually labelling the noise files, so that the conference scene background noise data set is divided into the small conference background noise, the medium conference background noise and the large conference background noise;
step (2): the background noise estimation method specifically comprises the following steps:
step (2.1): learning the characteristics of the background noise with a GMM model, and obtaining the background noise distributions of the small conference, medium conference and large conference background noise respectively;
step (2.2): using the GMM to identify which scale of background noise the collected voice signal contains, and selecting the background noise distribution of the corresponding scale according to the identification result;
step (3): according to the background noise distribution estimated for the collected voice signal, compensating the collected voice signal with a noise learning compensation algorithm so as to remove the background noise from the collected voice signal;
wherein, in the noise learning compensation algorithm, the voice signal collected by the collecting device is composed of the speaker signal and the background noise signal, and the relation between them is shown in formula (1.1):
y(t)=x(t)+n(t)·w (1.1)
wherein y(t) is the collected voice signal, x(t) is the speaker signal, n(t) is the background noise estimated in step (2), and w is the adaptive background noise adjustment parameter;
w is solved as shown in formula (1.2), and the selection of w fully considers the time-domain distribution of the k moments preceding time t;
[Formula (1.2), available in the source only as an image: the adaptive adjustment parameter w, computed from the amplitudes of the k samples of y(t) preceding time t]
therefore, the speaker signal in the collected voice signal is calculated as shown in formula (1.3):
x(t)=y(t)-n(t)·w (1.3)
wherein y(t) is the collected voice signal, x(t) is the speaker signal, n(t) is the background noise estimated in step (2) for the collected voice signal, w is given by formula (1.2), and k is the adjustment parameter.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910182463.7A CN109841227B (en) 2019-03-11 2019-03-11 Background noise removing method based on learning compensation

Publications (2)

Publication Number Publication Date
CN109841227A CN109841227A (en) 2019-06-04
CN109841227B true CN109841227B (en) 2020-10-02

Family

ID=66885637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910182463.7A Active CN109841227B (en) 2019-03-11 2019-03-11 Background noise removing method based on learning compensation

Country Status (1)

Country Link
CN (1) CN109841227B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1296607A * 1998-02-04 2001-05-23 Qualcomm Incorporated System and method for noise-compensated speech recognition
JP2007279349A (en) * 2006-04-06 2007-10-25 Toshiba Corp Feature amount compensation apparatus, method, and program
CN101710490A * 2009-11-20 2010-05-19 Anhui USTC iFLYTEK Information Technology Co., Ltd. Method and device for compensating noise for voice assessment
WO2011159628A1 (en) * 2010-06-14 2011-12-22 Google Inc. Speech and noise models for speech recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Calculation of noise source identification by partial coherence analysis" (偏相干分析识别噪声源的计算); Zhao Hailan et al.; Noise and Vibration Control (噪声与振动控制); 2005-08-31 (No. 5); pp. 31-33 *

Also Published As

Publication number Publication date
CN109841227A (en) 2019-06-04

Similar Documents

Publication Publication Date Title
CN109949317A (en) Based on the semi-supervised image instance dividing method for gradually fighting study
CN109890043B (en) Wireless signal noise reduction method based on generative countermeasure network
CN107274915B (en) Digital audio tampering automatic detection method based on feature fusion
CN101526994B (en) Fingerprint image segmentation method irrelevant to collecting device
CN105225672B (en) Merge the system and method for the dual microphone orientation noise suppression of fundamental frequency information
CN110544482B (en) Single-channel voice separation system
Strauss et al. A flow-based neural network for time domain speech enhancement
CA3136870A1 (en) Method and apparatus for determining a deep filter
CN114963030A (en) Water supply pipeline monitoring method
CN107403618B (en) Audio event classification method based on stacking base sparse representation and computer equipment
CN109841227B (en) Background noise removing method based on learning compensation
CN112151067B (en) Digital audio tampering passive detection method based on convolutional neural network
CN112037813B (en) Voice extraction method for high-power target signal
DE102015221764A1 (en) Method for adjusting microphone sensitivities
CN110503967A (en) A kind of sound enhancement method, device, medium and equipment
CN112466276A (en) Speech synthesis system training method and device and readable storage medium
CN111402918A (en) Audio processing method, device, equipment and storage medium
CN101533642B (en) Method for processing voice signal and device
CN108510996B (en) Fast iteration adaptive filtering method
CN103903631B (en) Voice signal blind separating method based on Variable Step Size Natural Gradient Algorithm
CN110299133A (en) The method for determining illegally to broadcast based on keyword
CN115691535A (en) RNN-based high signal-to-noise ratio voice noise reduction method, device, equipment and medium
CN109272987A (en) A kind of sound identification method sorting coal and spoil
KR20120051441A (en) Method for classifying a weed from a weed image, and apparatus thereof
Sonz et al. Feature extraction and classification of ship targets based on Gammatone filter bank

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant