CN112331225A

CN112331225A - Method and device for assisting hearing in high-noise environment

Info

Publication number: CN112331225A
Application number: CN202011159182.9A
Authority: CN
Inventors: 周宇阳
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2020-10-26
Filing date: 2020-10-26
Publication date: 2021-02-05
Anticipated expiration: 2040-10-26
Also published as: CN112331225B

Abstract

The invention provides a method and a device for assisting hearing in a high-noise environment. The method comprises: acquiring noise information in the environment and establishing a noise sample database; acquiring a plurality of pieces of human voice information and establishing a human voice sample database; acquiring the voice information exchanged by workers in the high-noise environment; processing that voice information based on the noise sample database and the human voice sample database to obtain clean voice; and outputting the clean voice.

Description

Method and device for assisting hearing in high-noise environment

Technical Field

The invention relates to the technical field of voice separation, and in particular to a method and a device for assisting hearing in a high-noise environment.

Background

At present, voice is one of the most direct means of human information exchange, yet people listening to speech are constantly disturbed by other sounds. People working in high-noise environments are especially susceptible: the noise prevents the two parties from exchanging information effectively and seriously reduces work efficiency. Voice separation, used as a preprocessing step, is an effective way of suppressing such interference;

speech separation refers to the task of separating target speech from background interference. At present, speech separation mainly relies on methods such as auditory scene analysis and non-negative matrix factorization. These methods are simple to implement, but they have significant limitations and few applicable scenarios: their performance degrades rapidly in the presence of noise, they do not take the characteristics of speech into account, they can damage the speech, and they do not address high-noise speech environments;

therefore, the invention provides a method and a device for assisting hearing in a high-noise environment, to solve the problem that workers have difficulty communicating with each other in such an environment.

Disclosure of Invention

The invention provides a method and a device for assisting hearing in a high-noise environment, which are used to solve the problem that workers have difficulty communicating with each other in a high-noise environment.

A method of assisting hearing in a high noise environment, comprising:

acquiring noise information in an environment, and establishing a noise sample database;

acquiring a plurality of pieces of human voice information, and establishing a human voice sample database;

acquiring the voice information exchanged by workers in a high-noise environment;

processing the voice information based on the noise sample database and the human voice sample database to obtain clean voice;

and outputting the clean voice.

As an embodiment of the present invention, the acquiring of the human voice information and establishing of the human voice sample database includes:

performing analog-to-digital conversion on the human voice information to obtain digital signals of the human voice information;

processing the digital signals by the fast Fourier transform (FFT) technique to obtain a plurality of frequency spectra of the human voice information;

obtaining the human voice frequency information at each time point from the frequency spectra;

and establishing the human voice sample database according to the human voice frequency information at each time point.
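The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it assumes the recordings are already digitized (the ADC step is out of scope), splits each recording into non-overlapping frames, takes the strongest non-DC DFT bin of each frame as that time point's "human voice frequency information", and stores those values with their overall mean. The frame length, the dominant-frequency rule, the function names and the dictionary layout of the database are all hypothetical. A naive DFT is used for readability; a real device would use an FFT.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency bins of an N-point DFT (naive O(N^2))."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * p * t / n) for t in range(n)))
            for p in range(n // 2 + 1)]


def dominant_frequency(frame, fs):
    """Frequency in Hz of the strongest non-DC bin; bin p represents p * fs / N."""
    mags = dft_magnitudes(frame)
    p = max(range(1, len(mags)), key=lambda i: mags[i])
    return p * fs / len(frame)


def build_voice_database(samples, fs, frame_len):
    """Hypothetical layout: per-sample frequency at each time point, plus the overall mean."""
    per_time = {}
    for name, signal in samples.items():
        starts = range(0, len(signal) - frame_len + 1, frame_len)
        per_time[name] = [dominant_frequency(signal[s:s + frame_len], fs) for s in starts]
    all_freqs = [f for freqs in per_time.values() for f in freqs]
    return {"per_time": per_time, "mean_hz": sum(all_freqs) / len(all_freqs)}


# Usage: two synthetic "voices" at 200 Hz and 400 Hz, sampled at 8 kHz.
fs = 8000
voice_a = [math.sin(2 * math.pi * 200 * t / fs) for t in range(160)]
voice_b = [math.sin(2 * math.pi * 400 * t / fs) for t in range(160)]
db = build_voice_database({"a": voice_a, "b": voice_b}, fs, frame_len=80)
print(db["per_time"], db["mean_hz"])
```

With 80-sample frames at 8 kHz the bin spacing is 100 Hz, so both test tones fall exactly on a bin and each frame reports its tone frequency.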

As an embodiment of the present invention, the processing of the digital signals by the fast Fourier transform technique to obtain a plurality of frequency spectra of the human voice information includes:

calculating the frequency spectra of the human voice information by FFT:

wherein,

e is the base of the natural logarithm, p = 0, 1, …, M-1, and x(n) is an N-point sequence;

T_p(θ) is a value in the FFT-computed frequency spectra of the human voice information, and

θ is an integer with 0 ≤ θ ≤ α-1.
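For orientation, the textbook N-point discrete Fourier transform that an FFT evaluates can be written in the notation above as follows. Writing the θ-th human voice sample as x_θ(n) and the imaginary unit as j are assumptions introduced here, and the exact expression in the original filing may differ.

```latex
T_p(\theta) = \sum_{n=0}^{N-1} x_\theta(n)\, e^{-j \frac{2\pi}{N} p n},
\qquad p = 0, 1, \ldots, M-1, \quad 0 \le \theta \le \alpha - 1
```

Because the DFT is periodic in p with period N, only p = 0, …, N-1 yields distinct values.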

As an embodiment of the present invention, the processing of the voice information based on the noise sample database and the human voice sample database to obtain clean voice includes:

obtaining a noise frequency threshold according to the noise sample database;

performing first processing on the voice information according to the noise frequency threshold to obtain first-time filtered voice information, where the first processing filters out the frequency signals in the voice information that are higher than the noise frequency threshold;

and matching the first-time filtered voice information with the human voice sample database, and filtering out of it the frequency signals whose difference from a preset mean value in the human voice sample database is larger than a preset difference value, to obtain clean voice information.
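The two filtering passes can be illustrated on a single frame's spectrum. The sketch assumes the spectrum is given as (frequency in Hz, magnitude) pairs and that "filtering out" a component means setting its magnitude to zero. The parameter names noise_threshold_hz, voice_mean_hz and max_diff_hz are hypothetical stand-ins for the noise frequency threshold, the preset mean value and the preset difference value.

```python
def two_stage_filter(spectrum, noise_threshold_hz, voice_mean_hz, max_diff_hz):
    """Return (first-time filtered, second-time filtered) spectra.

    spectrum: list of (frequency_hz, magnitude) pairs for one frame.
    Pass 1 zeroes components above the noise frequency threshold.
    Pass 2 zeroes components farther than max_diff_hz from the human voice mean.
    """
    first = [(f, m if f <= noise_threshold_hz else 0.0) for f, m in spectrum]
    second = [(f, m if abs(f - voice_mean_hz) <= max_diff_hz else 0.0) for f, m in first]
    return first, second


spectrum = [(100.0, 1.0), (200.0, 2.0), (300.0, 3.0), (4000.0, 5.0)]
first, second = two_stage_filter(spectrum, noise_threshold_hz=3000.0,
                                 voice_mean_hz=200.0, max_diff_hz=50.0)
print(first)   # the 4000 Hz component is zeroed
print(second)  # only the 200 Hz component survives
```

Zeroing bins keeps the spectrum's length intact, which is what a later inverse transform would need.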

As an embodiment of the present invention, the obtaining of a noise frequency threshold according to the noise sample database includes:

calculating a noise frequency threshold:

where v is the noise frequency threshold, F_i is the range of the sample frequency information in the noise sample database, N is the number of samples in the noise sample database, π is the mathematical constant pi, k is the stiffness coefficient, and m is the mass.

As an embodiment of the present invention, the noise frequency threshold is determined according to the high-noise environment;

the high-noise environment is determined from the noise information acquired in the high-noise environment within a preset time; wherein,

the high-noise environment includes traffic noise and industrial noise.

As an embodiment of the present invention, the determining of the high-noise environment from the noise information acquired in the high-noise environment within a preset time includes:

acquiring noise information in the high-noise environment within a preset time, and obtaining a digital signal of the noise information through analog-to-digital conversion;

obtaining a noise frequency waveform from the digital signal, and filtering out isolated waveforms in the noise frequency waveform to obtain a continuous section of noise frequency waveform;

taking the maximum value in the frequency range of the continuous noise frequency waveform, and comparing it with the noise frequency thresholds in the noise sample database to obtain the most similar noise frequency threshold;

determining the high-noise environment according to the most similar noise frequency threshold;

wherein the noise frequency thresholds correspond one-to-one to the high-noise environments.

An apparatus for assisting hearing in a high-noise environment, comprising:

the acquisition module, used for acquiring noise information, a plurality of pieces of human voice information, and the voice information in the environment;

the creating module, used for creating a noise sample database from the noise information acquired by the acquisition module to obtain a noise frequency threshold, and creating a human voice sample database from the human voice information acquired by the acquisition module;

the comparison module, used for comparing the voice information in the environment with the noise frequency threshold and determining the frequency information in that voice information whose frequency is greater than the noise frequency threshold;

the filtering module, used for filtering out, with a filter, the frequency information in the environment whose frequency is greater than the noise frequency threshold, to obtain first-time filtered voice information;

the matching module, used for matching the first-time filtered voice information with the human voice sample database, and filtering out of it the frequency signals whose difference from a preset mean value in the human voice sample database is larger than a preset difference value, to obtain clean voice;

and the transmission module, used for synthesizing the clean voice into voice segments and transmitting the voice segments to a receiver.

As an embodiment of the present invention, the creating module performs the following operations:

according to the acquired noise information, establishing an industrial noise sample database, a traffic noise sample database and a mixed noise sample database;

establishing at least 3 types of human voice sample databases according to the acquired human voice information, wherein the 3 types comprise a human voice sample database for adult males, one for adult females and one for elderly males.

As an embodiment of the present invention, the matching module performs operations including:

if the second-time filtered voice information contains the voices of multiple persons, the multi-voice segments in the clean voice are separated into single-voice segments by a multi-person voice separation technique within speech separation technology.

The beneficial effects of the invention are as follows: the influence of noise on voice communication is avoided, so that workers can quickly and clearly understand what the other party means even in a high-noise environment; this helps improve work efficiency, reduces the irritability caused by noise, strengthens the protection of workers' hearing in high-noise environments, and facilitates communication among workers.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

fig. 1 is a flowchart of a method and an apparatus for assisting hearing in a high noise environment according to an embodiment of the present invention;

fig. 2 is a flowchart of an apparatus and a method for assisting hearing in a high noise environment according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

Example 1:

as shown in fig. 1, an embodiment of the present invention provides a method for assisting hearing in a high-noise environment, including:

step S101: acquiring noise information in an environment, and establishing a noise sample database;

step S102: acquiring a plurality of pieces of human voice information, and establishing a human voice sample database;

step S103: acquiring the voice information exchanged by workers in a high-noise environment;

step S104: processing the voice information based on the noise sample database and the human voice sample database to obtain clean voice;

step S105: outputting the clean voice;

the working principle of the technical scheme is as follows. Noise that may occur in the working environment, such as traffic noise and industrial noise, is collected by a microphone to establish a noise sample database, and a preset noise frequency threshold is obtained by integrating the noise sample database. A plurality of pieces of human voice information, including but not limited to that of adult males, adult females and elderly males, is likewise collected by the microphone to establish a human voice sample database. The FFT (fast Fourier transform) converts these samples from the time domain to the frequency domain to produce their frequency spectra, and a human voice frequency mean is obtained by statistically integrating the spectra. The voice information in the environment is then acquired by a microphone and converted into an analog signal; based on voice separation technology, a filter removes the frequency signals higher than the preset noise frequency threshold to obtain first-time filtered voice information. The first-time filtered voice information is matched against the human voice frequency mean, and the frequency signals whose difference from that mean exceeds a preset difference value are filtered out to obtain second-time filtered voice information. If the second-time filtered voice information contains the voices of multiple persons, the target voice is separated by voice separation technology, converting the multi-voice segments into single-voice segments, which are then transmitted; if it does not, the second-time filtered voice information is transmitted directly;

the beneficial effects of the above technical scheme are: the noise information is integrated into a preset noise frequency threshold, which removes the influence of high noise on voice communication in a first pass, and matching the environmental voice information against the human voice frequency mean removes it again in a second pass. Noise is thus kept from disrupting voice communication, so that workers can quickly and clearly understand what the other party means in a high-noise environment; work efficiency is improved, the irritability caused by noise is reduced, hearing protection for workers in high-noise environments is strengthened, and communication among workers is facilitated.

Example 2:

in one embodiment, the acquiring of the human voice information and establishing of the human voice sample database includes:

establishing the human voice sample database according to the human voice frequency information at each time point;

the working principle of the technical scheme is as follows: the collected human voice information is converted into analog signals by a microphone, the analog signals are converted into digital signals by an ADC (analog-to-digital converter) module, and the digital signals are converted into continuous frequency spectra by the FFT (fast Fourier transform), which performs the conversion from the time domain to the frequency domain; the human voice frequency information at each time point is then obtained, the human voice sample database is established from it, and the human voice frequency mean is obtained;

the beneficial effects of the above technical scheme are: the fast Fourier transform efficiently integrates the frequency information of multiple human voice samples into a human voice frequency mean, which supports deeper filtering of the voice information in the environment and makes communication between workers more convenient.

Example 3:

in one embodiment, the obtaining of the frequency spectra of the human voice information by the fast Fourier transform technique includes:

calculating the frequency spectra of the human voice information by FFT:

wherein,

θ is an integer with 0 ≤ θ ≤ α-1;

the working principle of the technical scheme is as follows: assuming that the sampling frequency of the human voice signal is fs, when an N-point FFT is performed on the signal, the frequency interval between two adjacent points of the FFT result is fs/N, i.e., the frequency represented by any point p (p = 0, …, M-1) is p × fs/N; computing the spectrum for each θ, 0 ≤ θ ≤ α-1, produces a plurality of frequency spectra of the human voice information;
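The bin-to-frequency mapping p × fs/N, and the computation of one spectrum per human voice sample θ, can be sketched with a textbook recursive radix-2 Cooley-Tukey FFT. This is an illustrative assumption rather than the patent's own code: it requires each sample's length to be a power of two, and the names fft, bin_frequency and spectra are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    # Twiddle factors e^{-j 2 pi p / N} applied to the odd-index half.
    twiddled = [cmath.exp(-2j * cmath.pi * p / n) * odd[p] for p in range(n // 2)]
    return ([even[p] + twiddled[p] for p in range(n // 2)]
            + [even[p] - twiddled[p] for p in range(n // 2)])


def bin_frequency(p, fs, n):
    """Frequency in Hz represented by FFT point p: p * fs / N."""
    return p * fs / n


def spectra(samples):
    """T[theta][p]: one spectrum per human voice sample, theta = 0 .. alpha-1."""
    return [fft(s) for s in samples]


T = spectra([[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
print(T[0])                        # an impulse has a flat spectrum
print(T[1])                        # a constant has energy only at p = 0
print(bin_frequency(3, 8000, 64))  # 375.0
```

The recursion halves the problem at each level, giving the O(N log N) cost that makes the FFT practical for real-time use compared with the O(N^2) direct sum.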

the beneficial effects of the above technical scheme are: producing the frequency spectra of multiple pieces of human voice information makes the fluctuation of the human voice frequency directly visible and improves the accuracy of the human voice frequency mean.

Example 4:

in one embodiment, the processing of the voice information based on the noise sample database and the human voice sample database to obtain clean voice includes:

obtaining a noise frequency threshold according to the noise sample database;

matching the first-time filtered voice information with the human voice sample database, and filtering out of it the frequency signals whose difference from a preset mean value in the human voice sample database is larger than a preset difference value, to obtain clean voice information;

the working principle of the technical scheme is as follows: the voice information in the environment is collected by a microphone; using voice denoising from voice separation technology, a filter removes the frequency signals higher than the preset noise frequency threshold to obtain first-time filtered voice; the first-time filtered voice is compared with the human voice frequency mean, the frequency signals in it whose difference from that mean is larger than a preset difference value are filtered out, and the filtered frequency signals are converted into digital signals, i.e., the clean voice information;

the beneficial effects of the above technical scheme are: voice separation technology reduces the noise in the voice information and improves the clarity of the target voice, so that workers communicate faster and work efficiency is improved.

Example 5:

in one embodiment, the obtaining of a noise frequency threshold from the noise sample database comprises:

calculating a noise frequency threshold:

where v is the noise frequency threshold, F_i is the range of the sample frequency information in the noise sample database, N is the number of samples in the noise sample database, π is the mathematical constant pi, k is a stiffness coefficient, and m is a mass;

the working principle of the technical scheme is as follows: the maximum frequency of each sample in the noise sample database is obtained, these maxima are summed and averaged, and the influence of the sound frequency generated by the device's own natural vibration is removed from the collected data to obtain the noise frequency threshold, where k is the stiffness coefficient of the device and m is the mass of the device;
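Because the formula for v is not reproduced above, the sketch below encodes one plausible reading of this working principle, and it is an assumption throughout: the maximum frequency of each noise sample is averaged, and the natural frequency of an undamped mass-spring system, f0 = sqrt(k/m) / (2π), is subtracted as the contribution of the device's own vibration. The subtraction, the function names and the numbers in the usage example are hypothetical; only the ingredients (per-sample maxima, their mean, π, k and m) come from the text.

```python
import math


def natural_frequency(k, m):
    """Natural frequency (Hz) of an undamped mass-spring system: sqrt(k/m) / (2*pi)."""
    return math.sqrt(k / m) / (2 * math.pi)


def noise_frequency_threshold(noise_samples_hz, k, m):
    """ASSUMED form of v: mean of each sample's maximum frequency minus the device's
    natural frequency. noise_samples_hz holds one list of frequencies (F_i) per sample."""
    if not noise_samples_hz:
        raise ValueError("noise sample database is empty")
    maxima = [max(f_i) for f_i in noise_samples_hz]
    return sum(maxima) / len(maxima) - natural_frequency(k, m)


# Hypothetical numbers: two samples peaking at 2000 Hz and 4000 Hz, and a device
# whose stiffness (N/m) and mass (kg) give a 10 Hz natural frequency.
m = 1.0
k = (2 * math.pi * 10.0) ** 2 * m
print(noise_frequency_threshold([[100.0, 2000.0], [500.0, 4000.0]], k, m))
```

Under this reading the result is 3000 Hz (the mean of the maxima) minus 10 Hz.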

the beneficial effects of the above technical scheme are: the noise-filtering precision of the device is improved.

Example 6:

in one embodiment, the noise frequency threshold is determined according to the high-noise environment;

the high-noise environment includes traffic noise and industrial noise;

the working principle of the technical scheme is as follows: the corresponding noise frequency threshold is determined from the noise information acquired in the high-noise environment within a preset time, which in turn identifies the high-noise environment;

the beneficial effects of the above technical scheme are: different noise frequency thresholds are selected for different high-noise environments, which improves noise-filtering precision.

Example 7:

in one embodiment, the determining of the high-noise environment from the noise information acquired in the high-noise environment within the preset time includes:

wherein the noise frequency thresholds correspond one-to-one to the high-noise environments;

the working principle of the technical scheme is as follows: the acquired noise information is converted into a digital signal by an analog-to-digital converter, a noise frequency waveform is obtained from the digital signal, and isolated waveforms are filtered out of it to obtain a continuous section of noise frequency waveform. An isolated waveform is a waveform that is discontinuous or fluctuates sharply. The maximum value of the continuous noise frequency waveform is selected and compared with all the noise frequency thresholds in the noise sample database, and the threshold with the closest match determines the current high-noise environment;
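The environment-identification steps can be sketched as follows, assuming the "noise frequency waveform" is a per-frame sequence of noise frequencies in Hz. The rule for an isolated point (one that jumps by more than a hypothetical max_jump away from every neighbour) is an assumed reading of "discontinuous or fluctuates sharply", and "most similar" is taken to mean the smallest absolute difference. The environment labels and threshold values in the usage example are made up.

```python
def remove_isolated(track, max_jump):
    """Drop points that differ by more than max_jump from every neighbour (assumed rule)."""
    kept = []
    for i, value in enumerate(track):
        neighbours = [track[j] for j in (i - 1, i + 1) if 0 <= j < len(track)]
        if neighbours and all(abs(value - n) > max_jump for n in neighbours):
            continue  # isolated spike: jumps away from every neighbour
        kept.append(value)
    return kept


def identify_environment(track, thresholds, max_jump):
    """Return the environment whose noise frequency threshold is closest to the
    peak of the continuous noise frequency waveform."""
    continuous = remove_isolated(track, max_jump)
    if not continuous:
        raise ValueError("no continuous noise segment found")
    peak = max(continuous)
    return min(thresholds, key=lambda env: abs(thresholds[env] - peak))


track = [1000.0, 1050.0, 9000.0, 1100.0, 1080.0]        # 9000 Hz is an isolated spike
thresholds = {"traffic": 1200.0, "industrial": 3000.0}  # made-up one-to-one mapping
print(identify_environment(track, thresholds, max_jump=500.0))
```

Removing the spike first matters: without it the 9000 Hz outlier would become the peak and select the wrong environment.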

the beneficial effects of the above technical scheme are: filtering with a different noise frequency threshold in each high-noise environment improves the precision of noise filtering and strengthens the hearing-assistance effect.

Example 8:

as shown in fig. 2, an embodiment of the present invention provides a device for assisting hearing in a high-noise environment, including:

step S201: the acquisition module is used for acquiring noise information, a plurality of pieces of human voice information, and the voice information in the environment;

step S202: the creating module is used for creating a noise sample database from the noise information acquired by the acquisition module and a human voice sample database from the human voice information it acquires, to obtain a preset noise frequency threshold and a human voice frequency mean;

step S203: the comparison module is used for comparing the voice information in the environment with the preset noise frequency threshold and determining the frequency information in it whose frequency is greater than the preset noise frequency threshold;

step S204: the filtering module is used for filtering out, with a filter, the frequency information in the environment whose frequency is greater than the preset noise frequency threshold, to obtain first-time filtered voice information;

step S205: the matching module is used for matching the first-time filtered voice information with the human voice frequency mean, and filtering out of it the frequency signals whose difference from the human voice frequency mean is larger than a preset difference value, to obtain second-time filtered voice information;

step S206: the transmission module is used for synthesizing the second-time filtered voice information into voice segments and transmitting the voice segments to a receiver;
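The chain of modules S203 to S206 can be sketched end to end for one frame of speech. The sketch is hypothetical: it uses a naive DFT and inverse DFT in place of a real filter, zeroes (rather than attenuates) rejected bins, treats each bin's frequency as min(p, N-p) × fs/N so that mirrored negative-frequency bins are handled symmetrically and the output stays real, and stands in for "synthesizing voice segments" with a plain inverse transform. The function names and test signal are illustrative only.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * p * t / n) for t in range(n)) for p in range(n)]


def idft(X):
    n = len(X)
    return [(sum(X[p] * cmath.exp(2j * math.pi * p * t / n) for p in range(n)) / n).real
            for t in range(n)]


def process_frame(frame, fs, noise_threshold_hz, voice_mean_hz, max_diff_hz):
    """Hypothetical chain: comparison -> filtering -> matching -> synthesis for one frame."""
    n = len(frame)
    spectrum = dft(frame)
    for p in range(n):
        freq = min(p, n - p) * fs / n  # bins above N/2 mirror negative frequencies
        above_noise = freq > noise_threshold_hz                   # comparison/filtering modules
        far_from_voice = abs(freq - voice_mean_hz) > max_diff_hz  # matching module
        if above_noise or far_from_voice:
            spectrum[p] = 0j
    return idft(spectrum)  # back to the time domain for transmission


fs, n = 8000, 80
voice = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]
noisy = [v + 0.5 * math.sin(2 * math.pi * 3000 * t / fs) for t, v in enumerate(voice)]
clean = process_frame(noisy, fs, noise_threshold_hz=2500.0, voice_mean_hz=250.0, max_diff_hz=100.0)
print(max(abs(a - b) for a, b in zip(clean, voice)))  # close to zero
```

The 3000 Hz interferer lies above the 2500 Hz threshold and is removed, while the 200 Hz component lies within 100 Hz of the assumed voice mean and is kept.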

Example 9:

in one embodiment, the creation module performs the following operations:

establishing at least 3 types of human voice sample databases according to the acquired human voice information, wherein the 3 types comprise a human voice sample database for adult males, one for adult females and one for elderly males;

the working principle of the technical scheme is as follows: whistles, automobile rumbling, ship noise, mechanical noise, aerodynamic noise and electromagnetic noise are collected by a microphone, which converts the collected sound into an analog signal; an ADC (analog-to-digital converter) module converts the analog signal into a digital signal, and the digital signals are integrated through simulation to obtain a traffic noise sample database, an industrial noise sample database and a mixed noise sample database. The voice information of young and middle-aged males in the working environment is collected by a microphone to establish the adult male human voice sample database; the voice information of adult females in the working environment is collected to establish the adult female human voice sample database; and the voice information of elderly males in the working environment is collected to establish the elderly male human voice sample database;

the beneficial effects of the above technical scheme are: different noise frequency thresholds are set for different noise information, which improves noise-filtering accuracy; and separate human voice sample databases are established for different groups of people, so that the device can accurately assist the hearing of each group.

Example 10:

in one embodiment, the matching module performs operations comprising:

if the second-time filtered voice information contains the voices of multiple persons, the multi-voice segments in it are separated into single-voice segments by a multi-person voice separation technique within speech separation technology;

the working principle of the technical scheme is as follows: the multi-person voice segments in the second-time filtered speech are separated into single-voice segments by the multi-person voice separation technique;

the beneficial effects of the above technical scheme are: the clarity of the target voice is improved, simultaneous communication among multiple persons is facilitated, and work efficiency is improved.

Example 11:

the invention also provides a method for assisting hearing in a high-noise environment, which comprises the following steps:

step S301: establishing a plurality of noise sample databases based on first positioning information of a high-noise environment, equipment operation information in the high-noise environment and noise information in the high-noise environment;

the main source of noise in a high-noise environment is the operation of the devices running in it, so a different noise sample database is established for each distinct set of device operation information; that is, each noise sample database uses its positioning information and device operation information as its first calling tag, so that the accurate noise sample database can be called;

step S302: acquiring a plurality of voice information, and establishing a voice sample database;

for example, the workers in the same working environment are generally fixed or change little, so an individual human voice sample database can be established in advance for each worker, and a second calling tag is then established according to that worker's sound characteristics;

step S303: acquiring second positioning information of a worker, matching the second positioning information with the first positioning information, and acquiring voice information communicated by the worker in a high-noise environment when the second positioning information is matched with the first positioning information;

the second positioning information confirms whether the worker is in a high-noise environment. Noise sample databases are established in advance for the high-noise environments where the worker is likely to be; when the position represented by the second positioning information is not a position for which a noise sample database was established (that is, it does not match the first positioning information of any noise sample database), the worker is considered not to be in a high-noise environment, and no voice data is acquired.

Step S304: acquiring equipment operation information of the working environment of the current worker based on the second positioning information;

the server obtains the operation information of each device by querying the devices at the position indicated by the second positioning information.

Step S305: calling a corresponding noise sample database based on the second positioning information and the equipment operation information of the current working environment of the working personnel;

and matching the second positioning information with the first positioning information in the first calling tag, and matching the equipment operation information of the current working environment of the working personnel with the equipment operation information in the first calling tag so as to match a noise sample database corresponding to the first calling tag.
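Steps S303 to S305 amount to a lookup keyed by the first calling tag. The sketch assumes positioning information is reduced to a zone identifier and equipment operation information to the set of running devices; both encodings, the FirstCallingTag type, the function name and the example names are hypothetical. A missing key covers both the case in which the worker is not in a registered high-noise environment (so no voice is acquired) and the case in which no database exists for the current device state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstCallingTag:
    """Hypothetical first calling tag: positioning info plus device operation info."""
    location: str               # zone identifier standing in for positioning information
    running_devices: frozenset  # set of devices currently operating in that zone


def select_noise_database(databases, worker_location, running_devices):
    """Return the noise sample database whose tag matches the worker's second positioning
    information and the current device operation information, or None if none matches."""
    return databases.get(FirstCallingTag(worker_location, frozenset(running_devices)))


databases = {
    FirstCallingTag("workshop-A", frozenset({"press"})): "db_press_only",
    FirstCallingTag("workshop-A", frozenset({"press", "compressor"})): "db_press_compressor",
}
print(select_noise_database(databases, "workshop-A", ["compressor", "press"]))
print(select_noise_database(databases, "office", []))
```

Using a frozen dataclass with a frozenset field makes the tag hashable and order-independent, so the device list can arrive in any order.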

Step S306: denoising the voice information based on a noise sample database to obtain denoised voice;

the initial denoising of the current speaking voice information of the working personnel is realized through the pre-established noise sample database, and the denoising effect is improved by adopting the accurate noise sample database.

Step S307: performing feature extraction on the denoised voice to obtain a first sound feature; calling the corresponding human voice sample database based on the first sound feature;

the first sound feature is a sound feature that distinguishes the persons to whom the speech belongs, and includes timbre, loudness, pitch and the like.

Step S308: matching the denoised voice with the called sample voice in the human voice sample database to obtain clean voice;

the matching method is as follows: a secondary feature extraction is performed on the denoised voice to extract a second sound feature, the similarity between this second sound feature and the second sound feature of each sample voice is calculated, and the sample voice with the greatest similarity is called as the clean voice. The second sound feature includes the short-time energy spectrum, formant frequencies, amplitude spectrum and the like.
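The similarity matching can be sketched with a single second sound feature, the short-time energy contour, and cosine similarity as the similarity measure. Both choices are assumptions (the text also lists formant frequencies and the amplitude spectrum, which are omitted here), as are the non-overlapping frames, the truncation of both contours to a common length, and the hypothetical function and worker names.

```python
import math


def short_time_energy(signal, frame_len):
    """Second sound feature (assumed): sum of squared samples in each non-overlapping frame."""
    return [sum(s * s for s in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def cosine_similarity(a, b):
    """Cosine similarity of two feature vectors, truncated to their common length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_matching_sample(denoised, sample_voices, frame_len):
    """Key of the sample voice whose energy contour is most similar to the denoised voice."""
    target = short_time_energy(denoised, frame_len)
    return max(sample_voices,
               key=lambda k: cosine_similarity(target, short_time_energy(sample_voices[k], frame_len)))


denoised = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
samples = {"worker_1": [0.9, 1.1, 0.0, 0.1, 1.0, 0.9],
           "worker_2": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]}
print(best_matching_sample(denoised, samples, frame_len=2))
```

Cosine similarity compares the shape of the energy contours rather than their absolute level, so a quieter recording of the same speaker can still match.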

Step S309: outputting the clean voice;

the clean voice is played to the other worker through a voice playback device, enabling two workers to communicate and cooperate in a high-noise environment and preventing instructions or other information from being miscommunicated under high-noise interference, which could cause unexpected losses or accidents.

Where the denoising effect is not a concern, this embodiment also admits another feasible scheme: denoising directly with an existing denoising method, and then matching the denoised voice directly with the sample voices in the human voice sample database to obtain the clean voice.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method of assisting hearing in a high noise environment, comprising:

and outputting the clean voice.

2. The method for assisting hearing in a high noise environment according to claim 1, wherein the acquiring a plurality of voice information and establishing a voice sample database comprises:

3. The method for assisting hearing in a high noise environment according to claim 2, wherein the processing of the digital signals by a fast Fourier transform technique to obtain a plurality of frequency spectra of the human voice information comprises:

calculating the frequency spectra of the human voice information by FFT:

wherein,

θ is an integer with 0 ≤ θ ≤ α-1.

4. The method for assisting hearing in a high noise environment according to claim 1, wherein the processing of the speech information based on a noise sample database and a human voice sample database to obtain clean speech comprises:

obtaining a noise frequency threshold according to the noise sample database;

5. The method for assisting hearing in a high noise environment according to claim 4, wherein deriving a noise frequency threshold from the noise sample database comprises:

calculating a noise frequency threshold:

6. The method for assisting hearing in a high noise environment according to claim 5, wherein

the noise frequency threshold is determined according to a high noise environment;

and the high noise environment includes traffic noise and industrial noise.

7. The method for assisting hearing in a high noise environment according to claim 6, wherein determining the high noise environment from noise information acquired in the high noise environment within a preset time comprises:

8. An apparatus for assisting hearing in a high noise environment, comprising:

9. The device for assisting hearing in a high noise environment of claim 8, wherein the creating module performs the following operations:

10. The device for assisting hearing in a high noise environment of claim 8, wherein the matching module performs the following operations: