CN116597856B - Voice quality enhancement method based on frogman intercom

Info

Publication number
CN116597856B
CN116597856B
Authority
CN
China
Prior art keywords
voice
noise
frogman
signal
definition
Prior art date
Legal status
Active
Application number
CN202310876048.8A
Other languages
Chinese (zh)
Other versions
CN116597856A (en)
Inventor
王银畦
王涛
Current Assignee
Shandong Benin Electronic Technology Development Co ltd
Original Assignee
Shandong Benin Electronic Technology Development Co ltd
Priority date
Filing date
Publication date
Application filed by Shandong Benin Electronic Technology Development Co ltd
Priority to CN202310876048.8A
Publication of CN116597856A
Application granted
Publication of CN116597856B


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04Q SELECTING
    • H04Q5/00 Selecting arrangements wherein two or more subscriber stations are connected by the same line to the exchange
    • H04Q5/24 Selecting arrangements wherein two or more subscriber stations are connected by the same line to the exchange for two-party-line systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G10L2021/03643 Diver speech
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Abstract

A voice quality enhancement method based on frogman intercom, relating to the technical field of voice communication. The method acquires the voice features of each frogman and records them in a voice library, establishes a bidirectional information transmission channel according to the frogman's voice recognition information, and determines the proportion of the noise signal in the noisy voice signal at different voice transmission distances according to the distance of the bidirectional information transmission channel; an evaluation index of voice intelligibility is then set. An expected voice intelligibility evaluation index is preset for the frogman's communication target, and the average window function length is dynamically adjusted according to the comparison between the voice intelligibility during communication and the expected voice intelligibility evaluation index. A frequency threshold is set for the noisy voice based on the frequency masking characteristics of the human ear, the voice features of the noisy voice filtered by the frequency threshold are obtained and input to a voice broadcasting terminal, and the voice broadcasting terminal generates, from the voice features, voice information that meets the voice intelligibility expected by the frogman's communication target, thereby significantly improving voice quality in the frogman intercom scenario.

Description

Voice quality enhancement method based on frogman intercom
Technical Field
The application relates to the technical field of voice communication, and in particular to a voice quality enhancement method based on frogman intercom.
Background
Frogman intercom refers to voice communication carried out under water. Because of the high density of water and the characteristics of underwater sound propagation, underwater communication is subject to many kinds of interference, so voice quality is poor and the communication effect is affected. Voice enhancement is an effective way to deal with noise pollution: it extracts the original voice, as pure as possible, from a noisy voice signal. In general, voice enhancement has two main goals: improving voice quality by suppressing background noise, so that the listener accepts the speech willingly and does not feel fatigued; and improving speech intelligibility, so that the listener can understand more easily.
In the prior art, when the noisy voice transmitted by a frogman is converted into voice features during voice enhancement, the conversion is performed with a fixed window function length. This ignores the complexity of the underwater communication process and does not consider that the noise signal becomes more and more complex as the communication distance between frogmen gradually increases. If the conversion is still performed with a fixed window function length, the converted voice features cannot fully express the characteristics of the noisy voice. How to select different window function lengths so that the voice features fully express the characteristics of the noisy voice as the communication distance between frogmen increases is therefore a problem that needs to be solved.
Disclosure of Invention
In order to solve the above technical problems, the application aims to provide a voice quality enhancement method based on frogman intercom, which comprises the following steps:
step S1: acquiring clean voice sample data of multiple groups of frogmen and the noisy voice sample data transmitted by those groups of frogmen in a frogman intercom scenario, and recording the data in a voice library;
step S2: generating the voice features of the frogman from the clean voice sample data and the noisy voice sample data, acquiring the frogman's identity information, and binding the frogman's voice features to the identity information to generate voice recognition information;
step S3: when a frogman makes a voice communication request, establishing a bidirectional information transmission channel according to the frogman's voice recognition information, and determining the proportion of the noise signal in the noisy voice signal at different voice transmission distances according to the distance of the bidirectional information transmission channel; setting an evaluation index of voice intelligibility according to the proportion of the noise signal in the noisy voice signal;
step S4: presetting an expected voice intelligibility evaluation index for the frogman's communication target; dynamically adjusting the average window function length used for windowing when the noisy voice is converted into voice features, according to the comparison between the voice intelligibility during communication and the expected voice intelligibility evaluation index; setting a frequency threshold for the noisy voice based on the frequency masking characteristics of the human ear and filtering the frequency signal of the noisy voice; obtaining the voice features of the noisy voice filtered by the frequency threshold and inputting them to a voice broadcasting terminal; and generating, by the voice broadcasting terminal from the voice features, voice information that meets the voice intelligibility expected by the frogman's communication target.
Further, the process of generating the voice features of the frogman from the clean voice sample data and the noisy voice sample data comprises:
converting the sample data into digital voice signals; obtaining, by means of big data, the average framing parameters, the window function type and the average window function length of sample data in the frogman intercom scenario; dividing the digital voice signals into several frames according to the average framing parameters and windowing each frame with the window function; converting each frame of the digital voice signal into a frequency signal by Fourier transform; converting the frequency signal of each frame back into a time-domain waveform by inverse Fourier transform; and marking the time-domain waveforms of the sample data as voice features.
Further, when a frogman makes a voice communication request, the process of establishing the bidirectional information transmission channel according to the frogman's voice recognition information comprises:
when the frogman communication platform receives a frogman's voice communication request, determining the frogman's communication target according to the request, and establishing a bidirectional information transmission channel between the frogman and the communication target according to the frogman's voice recognition information, wherein the frogman and the communication target transmit voice through the bidirectional information transmission channel.
Further, the process of obtaining the voice transmission distance of the bidirectional information transmission channel comprises:
each frogman carries a position signal generating device; real-time distance values between the frogmen are determined from the position information generated by the position signal generating devices; and the voice transmission distance of the bidirectional information transmission channel between the frogman and the communication target is determined from the real-time distance values between the frogmen and recorded in the voice library in real time.
Further, the process of determining the proportion of the noise signal in the noisy voice signal at different voice transmission distances according to the distance of the bidirectional information transmission channel comprises:
acquiring, from the historical intercom scenarios of several frogmen recorded in the voice library, the noisy voice sample data transmitted by multiple groups of frogmen over bidirectional information transmission channels, together with the voice transmission distances of those channels; the noisy voice comprises a clean voice signal and a noise signal;
performing data mining on the noise signals of the noisy voice at different channel voice transmission distances, and constructing a noisy-voice probability distribution function model that takes the noise signal and the channel voice transmission distance as input features and the distribution probability of the noise signal at different channel voice transmission distances as the output label; and obtaining, from the noisy-voice probability distribution function model, the proportion of the noise signal in the noisy voice at different channel voice transmission distances.
Further, the process of setting the evaluation index of voice intelligibility according to the proportion of the noise signal in the noisy voice signal comprises:
using a big-data method, obtaining the voice intelligibility evaluation grade corresponding to each proportion of the noise signal in the noisy voice, the grades being pure, clear, average and poor; setting index weights for the evaluation indexes; establishing an evaluation index matrix for voice intelligibility according to the grade corresponding to each noise proportion; establishing a noise proportion matrix according to the proportion of the noise signal in the noisy voice; and obtaining, by fuzzy comprehensive evaluation, the membership matrix of the noise proportions with respect to voice intelligibility;
and obtaining, from the membership matrix and the index weights, the voice intelligibility corresponding to different proportions of the noise signal in the noisy voice.
Further, the quantization process of dynamically adjusting the average window function length used for windowing when the noisy voice is converted into voice features, according to the comparison between the voice intelligibility during communication and the expected voice intelligibility evaluation index, comprises:
presetting an expected voice intelligibility evaluation index for the frogman's communication target, and acquiring the frogman's noisy voice signal received by the communication target over the bidirectional information transmission channel between the frogman and the communication target, together with the voice transmission distance of that channel; inputting the channel voice transmission distance and the received noisy voice signal as input features into the noisy-voice probability distribution function model to obtain the proportion of the noise signal in the noisy voice signal at the current channel voice transmission distance, and obtaining the voice intelligibility evaluation index of the frogman's noisy voice signal from that noise proportion;
comparing the expected voice intelligibility evaluation index of the communication target with the voice intelligibility evaluation index of the noisy voice signal it receives; when the two are inconsistent, obtaining the noise proportion corresponding to the expected voice intelligibility evaluation index and the noise proportion in the received noisy voice signal, and obtaining, from the deviation between these noise proportions, the average window function length used for windowing when the received noisy voice signal is converted into voice features; the larger the proportion of the noise signal in the noisy voice signal, the shorter the average window function length.
Further, the process of setting the frequency threshold for the noisy voice based on the frequency masking characteristics of the human ear comprises:
obtaining the frequency masking characteristics of the human ear by means of big data, the characteristics comprising the highest and lowest sound frequencies perceivable by the human ear; when the frequency signals are generated in step S2, filtering them with the highest and lowest sound frequencies as the upper and lower thresholds, removing frequency signals above the upper threshold and below the lower threshold; converting the filtered frequency signals into time-domain waveforms by inverse Fourier transform and marking them as voice features; inputting the voice features to the voice broadcasting terminal; and having the voice broadcasting terminal generate, from the voice features, voice information that meets the voice intelligibility expected by the frogman's communication target.
Compared with the prior art, the application has the following beneficial effects. In the prior art, noisy voice is converted with a fixed window function length, which ignores the complexity of the underwater communication process and the fact that the noise signal becomes more and more complex as the communication distance between frogmen increases; with a fixed window function length, the converted voice features cannot fully express the characteristics of the noisy voice. In the application, data mining is performed on the noise signals of the noisy voice in the voice library at different channel voice transmission distances, a noisy-voice probability distribution function model is constructed, and the proportion of the noise signal in the noisy voice at different channel voice transmission distances is obtained. The average window function length used when the noisy voice transmitted by the frogman is converted into voice features is then dynamically adjusted according to that proportion, so that the converted voice features fully express the characteristics of the noisy voice and the quality of frogman intercom is significantly improved.
Drawings
Fig. 1 is a schematic diagram of a voice quality enhancement method based on frogman intercom according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
As shown in fig. 1, the voice quality enhancement method based on frogman intercom comprises the following steps:
step S1: acquiring clean voice sample data of multiple groups of frogmen and the noisy voice sample data transmitted by those groups of frogmen in a frogman intercom scenario, and recording the data in a voice library;
step S2: generating the voice features of the frogman from the clean voice sample data and the noisy voice sample data, acquiring the frogman's identity information, and binding the frogman's voice features to the identity information to generate voice recognition information;
step S3: when a frogman makes a voice communication request, establishing a bidirectional information transmission channel according to the frogman's voice recognition information, and determining the proportion of the noise signal in the noisy voice signal at different voice transmission distances according to the distance of the bidirectional information transmission channel; setting an evaluation index of voice intelligibility according to the proportion of the noise signal in the noisy voice signal;
step S4: presetting an expected voice intelligibility evaluation index for the frogman's communication target; dynamically adjusting the average window function length used for windowing when the noisy voice is converted into voice features, according to the comparison between the voice intelligibility during communication and the expected voice intelligibility evaluation index; setting a frequency threshold for the noisy voice based on the frequency masking characteristics of the human ear and filtering the frequency signal of the noisy voice; obtaining the voice features of the noisy voice filtered by the frequency threshold and inputting them to a voice broadcasting terminal; and generating, by the voice broadcasting terminal from the voice features, voice information that meets the voice intelligibility expected by the frogman's communication target.
It should be further noted that, in the implementation process, the process of generating the voice features of the frogman from the clean voice sample data and the noisy voice sample data includes:
converting the sample data into digital voice signals; obtaining, by means of big data, the average framing parameters, the window function type and the average window function length of sample data in the frogman intercom scenario; dividing the digital voice signals into several frames according to the average framing parameters and windowing each frame with the window function; converting each frame of the digital voice signal into a frequency signal by Fourier transform; converting the frequency signal of each frame back into a time-domain waveform by inverse Fourier transform; and marking the time-domain waveforms of the sample data as voice features.
It should be further noted that, in the implementation process, the process of constructing a speech enhancement model based on a deep neural network to represent the mapping relationship between noisy speech and clean speech includes:
constructing a voice enhancement model based on an RBF neural network, and using the voice features of the clean voice sample data of multiple groups of frogmen in the voice library and the voice features of the noisy voice sample data of those groups in the frogman intercom scenario as the training set and the test set to train the voice enhancement model in real time. Each audio file contains single-segment and multi-segment voice; during recording, data are acquired at a sampling rate of 16000 Hz, and on this basis CoolEditPro software is used as an auxiliary tool to manually mark the start and end points of the clean voice samples as the voice detection standard. To obtain noisy voice, the noisy voice and the bidirectional-channel voice transmission distances of 50 target persons during frogman operations are acquired, giving 1000 groups of voice sample data in total, of which 950 groups are used as the training set and 50 groups as the test set. The voice enhancement model is trained until its loss function is stable, and the model parameters are saved.
It should be further noted that, in the implementation process, a frogman communication platform is established; when a frogman makes a voice communication request, the process of establishing the bidirectional information transmission channel according to the frogman's voice recognition information includes:
when the frogman communication platform receives a frogman's voice communication request, determining the frogman's communication target according to the request, and establishing a bidirectional information transmission channel between the frogman and the communication target according to the frogman's voice recognition information, wherein the frogman and the communication target transmit voice through the bidirectional information transmission channel.
It should be further noted that, in the implementation process, the process of obtaining the voice transmission distance of the bidirectional information transmission channel includes:
each frogman carries a position signal generating device; real-time distance values between the frogmen are determined from the position information generated by the position signal generating devices; and the voice transmission distance of the bidirectional information transmission channel between the frogman and the communication target is determined from the real-time distance values between the frogmen and recorded in the voice library in real time.
It should be further noted that, in the implementation process, the process of determining the proportion of the noise signal in the noisy voice signal at different voice transmission distances according to the distance of the bidirectional information transmission channel includes:
acquiring, from the voice library, the noisy voice sample data transmitted by multiple groups of frogmen over bidirectional information transmission channels in the frogman intercom scenario, together with the voice transmission distances of those channels; the noisy voice comprises a clean voice signal and a noise signal;
performing data mining on the noise signals of the noisy voice at different channel voice transmission distances, and constructing a noisy-voice probability distribution function model that takes the noise signal and the channel voice transmission distance as input features and the distribution probability of the noise signal at different channel voice transmission distances as the output label; and obtaining, from the noisy-voice probability distribution function model, the proportion of the noise signal in the noisy voice at different channel voice transmission distances.
further, the process of setting the evaluation index of the speech intelligibility according to the proportion of the noise signal in the noise-containing speech signal includes:
acquiring an evaluation index of voice definition corresponding to the proportion of the noise signal in the noise-containing voice by using a big data method, wherein the evaluation index comprises purity, definition, general and poor; setting index weight of an evaluation index, establishing an evaluation index matrix about voice definition according to an evaluation index of voice definition corresponding to the proportion of a noise signal in noise-containing voice, establishing a noise proportion matrix according to the proportion of the noise signal in the noise-containing voice, and acquiring a membership matrix of the proportion of the noise signal in the noise-containing voice to the voice definition through fuzzy comprehensive evaluation;
and obtaining the voice definition corresponding to different proportions of the noise signals in the noise-containing voice according to the membership degree matrix and the index weight.
It should be further noted that, in the implementation process, the quantization process of dynamically adjusting the average window function length used for windowing when the noisy voice is converted into voice features, according to the comparison between the voice intelligibility during communication and the expected voice intelligibility evaluation index, includes:
presetting an expected voice intelligibility evaluation index for the frogman's communication target, and acquiring the frogman's noisy voice signal received by the communication target over the bidirectional information transmission channel between the frogman and the communication target, together with the voice transmission distance of that channel; inputting the channel voice transmission distance and the received noisy voice signal as input features into the noisy-voice probability distribution function model to obtain the proportion of the noise signal in the noisy voice signal at the current channel voice transmission distance, and obtaining the voice intelligibility evaluation index of the frogman's noisy voice signal from that noise proportion;
comparing the expected voice intelligibility evaluation index of the communication target with the voice intelligibility evaluation index of the noisy voice signal it receives; when the two are inconsistent, obtaining the noise proportion corresponding to the expected voice intelligibility evaluation index and the noise proportion in the received noisy voice signal, and obtaining, from the deviation between these noise proportions, the average window function length used for windowing when the received noisy voice signal is converted into voice features; the larger the proportion of the noise signal in the noisy voice signal, the shorter the average window function length.
By dynamically adjusting, according to the proportion of the noise signal in the noisy voice at different channel voice transmission distances, the average window function length used when the noisy voice transmitted by the frogman is converted into voice features, the converted voice features fully express the characteristics of the noisy voice, and voice quality in the frogman intercom scenario is significantly improved.
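A minimal sketch of the adjustment rule described above, assuming a base window length and a linear shrink with the noise-proportion deviation; both the base length and the mapping are illustrative, since the patent does not fix the quantitative relationship.

```python
def adjusted_window_length(expected_ratio, observed_ratio,
                           base_len=512, min_len=128, max_len=1024):
    """Shorter analysis window when the observed noise proportion exceeds the
    proportion corresponding to the expected intelligibility grade."""
    deviation = observed_ratio - expected_ratio          # noise-proportion deviation value
    scale = max(0.0, 1.0 - deviation)                    # larger noise share -> smaller scale
    return int(min(max_len, max(min_len, base_len * scale)))

# e.g. expecting 'clear' speech (about 0.15 noise) but receiving 0.45 noise -> shorter window
length = adjusted_window_length(0.15, 0.45)              # 512 * 0.7 -> 358 samples
```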
It should be further noted that, in the implementation process, the process of setting the frequency threshold for the noisy voice based on the frequency masking characteristics of the human ear includes:
obtaining the frequency masking characteristics of the human ear by means of big data, the characteristics comprising the highest and lowest sound frequencies perceivable by the human ear; when the frequency signals are generated in step S2, filtering them with the highest and lowest sound frequencies as the upper and lower thresholds, removing frequency signals above the upper threshold and below the lower threshold; converting the filtered frequency signals into time-domain waveforms by inverse Fourier transform and marking them as voice features; inputting the voice features to the voice enhancement model; and having the voice broadcasting terminal generate, from the voice features, voice information that meets the voice intelligibility expected by the frogman's communication target.
The above embodiments are only intended to illustrate the technical method of the present application and not to limit it. It should be understood by those skilled in the art that the technical method of the present application may be modified or equivalently substituted without departing from the spirit and scope of the technical method of the present application.

Claims (8)

1. A voice quality enhancement method based on frogman intercom, characterized by comprising the following steps:
step S1: acquiring clean voice sample data of multiple groups of frogmen and the noisy voice sample data transmitted by those groups of frogmen in a frogman intercom scenario, and recording the data in a voice library;
step S2: generating the voice features of the frogman from the clean voice sample data and the noisy voice sample data, acquiring the frogman's identity information, and binding the frogman's voice features to the identity information to generate voice recognition information;
step S3: when a frogman makes a voice communication request, establishing a bidirectional information transmission channel according to the frogman's voice recognition information, and determining the proportion of the noise signal in the noisy voice signal at different voice transmission distances according to the distance of the bidirectional information transmission channel; setting an evaluation index of voice intelligibility according to the proportion of the noise signal in the noisy voice signal;
step S4: presetting an expected voice intelligibility evaluation index for the frogman's communication target; dynamically adjusting the average window function length used for windowing when the noisy voice is converted into voice features, according to the comparison between the voice intelligibility during communication and the expected voice intelligibility evaluation index; setting a frequency threshold for the noisy voice based on the frequency masking characteristics of the human ear and filtering the frequency signal of the noisy voice; obtaining the voice features of the noisy voice filtered by the frequency threshold and inputting them to a voice broadcasting terminal; and generating, by the voice broadcasting terminal from the voice features, voice information that meets the voice intelligibility expected by the frogman's communication target.
2. The voice quality enhancement method based on frogman intercom according to claim 1, wherein the process of generating the voice features of the frogman from the clean voice sample data and the noisy voice sample data comprises:
converting the sample data into digital voice signals; obtaining, by means of big data, the average framing parameters, the window function type and the average window function length of sample data in the frogman intercom scenario; dividing the digital voice signals into several frames according to the average framing parameters and windowing each frame with the window function; converting each frame of the digital voice signal into a frequency signal by Fourier transform; converting the frequency signal of each frame back into a time-domain waveform by inverse Fourier transform; and marking the time-domain waveforms of the sample data as voice features.
3. The voice quality enhancement method based on frogman intercom according to claim 2, wherein, when a frogman makes a voice communication request, the process of establishing the bidirectional information transmission channel according to the frogman's voice recognition information comprises:
when the frogman communication platform receives a frogman's voice communication request, determining the frogman's communication target according to the request, and establishing a bidirectional information transmission channel between the frogman and the communication target according to the frogman's voice recognition information, wherein the frogman and the communication target transmit voice through the bidirectional information transmission channel.
4. The voice quality enhancement method based on frogman intercom according to claim 3, wherein the process of obtaining the voice transmission distance of the bidirectional information transmission channel comprises:
each frogman carries a position signal generating device; real-time distance values between the frogmen are determined from the position information generated by the position signal generating devices; and the voice transmission distance of the bidirectional information transmission channel between the frogman and the communication target is determined from the real-time distance values between the frogmen and recorded in the voice library in real time.
5. The voice quality enhancement method based on frogman intercom according to claim 4, wherein the process of determining the proportion of the noise signal in the noisy voice signal at different voice transmission distances according to the distance of the bidirectional information transmission channel comprises:
acquiring, from the historical intercom scenarios of several frogmen recorded in the voice library, the noisy voice sample data transmitted by multiple groups of frogmen over bidirectional information transmission channels, together with the voice transmission distances of those channels; the noisy voice comprises a clean voice signal and a noise signal;
performing data mining on the noise signals of the noisy voice at different channel voice transmission distances, and constructing a noisy-voice probability distribution function model that takes the noise signal and the channel voice transmission distance as input features and the distribution probability of the noise signal at different channel voice transmission distances as the output label; and obtaining, from the noisy-voice probability distribution function model, the proportion of the noise signal in the noisy voice at different channel voice transmission distances.
6. The voice quality enhancement method based on frogman intercom according to claim 5, wherein the process of setting the evaluation index of voice intelligibility according to the proportion of the noise signal in the noisy voice signal comprises:
using a big-data method, obtaining the voice intelligibility evaluation grade corresponding to each proportion of the noise signal in the noisy voice, the grades being pure, clear, average and poor; setting index weights for the evaluation indexes; establishing an evaluation index matrix for voice intelligibility according to the grade corresponding to each noise proportion; establishing a noise proportion matrix according to the proportion of the noise signal in the noisy voice; and obtaining, by fuzzy comprehensive evaluation, the membership matrix of the noise proportions with respect to voice intelligibility;
and obtaining, from the membership matrix and the index weights, the voice intelligibility corresponding to different proportions of the noise signal in the noisy voice.
7. The voice quality enhancement method based on frogman intercom according to claim 6, wherein the process of dynamically adjusting the average window function length used for windowing when the noisy voice is converted into voice features, according to the comparison between the voice intelligibility during communication and the expected voice intelligibility evaluation index, comprises:
presetting an expected voice intelligibility evaluation index for the frogman's communication target, and acquiring the frogman's noisy voice signal received by the communication target over the bidirectional information transmission channel between the frogman and the communication target, together with the voice transmission distance of that channel; inputting the channel voice transmission distance and the received noisy voice signal as input features into the noisy-voice probability distribution function model to obtain the proportion of the noise signal in the noisy voice signal at the current channel voice transmission distance, and obtaining the voice intelligibility evaluation index of the frogman's noisy voice signal from that noise proportion;
comparing the expected voice intelligibility evaluation index of the communication target with the voice intelligibility evaluation index of the noisy voice signal it receives; when the two are inconsistent, obtaining the noise proportion corresponding to the expected voice intelligibility evaluation index and the noise proportion in the received noisy voice signal, and obtaining, from the deviation between these noise proportions, the average window function length used for windowing when the received noisy voice signal is converted into voice features; the larger the proportion of the noise signal in the noisy voice signal, the shorter the average window function length.
8. The voice quality enhancement method based on frogman intercom according to claim 7, wherein the process of setting the frequency threshold for the noisy voice based on the frequency masking characteristics of the human ear comprises:
obtaining the frequency masking characteristics of the human ear by means of big data, the characteristics comprising the highest and lowest sound frequencies perceivable by the human ear; when the frequency signals are generated in step S2, filtering them with the highest and lowest sound frequencies as the upper and lower thresholds, removing frequency signals above the upper threshold and below the lower threshold; converting the filtered frequency signals into time-domain waveforms by inverse Fourier transform; and marking the time-domain waveforms as voice features to be input to the voice broadcasting terminal.
CN202310876048.8A 2023-07-18 2023-07-18 Voice quality enhancement method based on frogman intercom Active CN116597856B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310876048.8A CN116597856B (en) 2023-07-18 2023-07-18 Voice quality enhancement method based on frogman intercom

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310876048.8A CN116597856B (en) 2023-07-18 2023-07-18 Voice quality enhancement method based on frogman intercom

Publications (2)

Publication Number Publication Date
CN116597856A CN116597856A (en) 2023-08-15
CN116597856B (en) 2023-09-22

Family

ID=87599531

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310876048.8A Active CN116597856B (en) 2023-07-18 2023-07-18 Voice quality enhancement method based on frogman intercom

Country Status (1)

Country Link
CN (1) CN116597856B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117496953B (en) * 2023-12-29 2024-03-12 山东贝宁电子科技开发有限公司 Frog voice processing method based on voice enhancement technology

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104704560A (en) * 2012-09-04 2015-06-10 纽昂斯通讯公司 Formant dependent speech signal enhancement
CN104050971A (en) * 2013-03-15 2014-09-17 杜比实验室特许公司 Acoustic echo mitigating apparatus and method, audio processing apparatus, and voice communication terminal
WO2015005914A1 (en) * 2013-07-10 2015-01-15 Nuance Communications, Inc. Methods and apparatus for dynamic low frequency noise suppression
CN111968630A (en) * 2019-05-20 2020-11-20 北京字节跳动网络技术有限公司 Information processing method and device and electronic equipment
WO2021258832A1 (en) * 2020-06-23 2021-12-30 青岛科技大学 Method for denoising underwater acoustic signal on the basis of adaptive window filtering and wavelet threshold optimization
CN112102846A (en) * 2020-09-04 2020-12-18 腾讯科技(深圳)有限公司 Audio processing method and device, electronic equipment and storage medium
CN114822584A (en) * 2022-04-25 2022-07-29 东北大学 Transmission device signal separation method based on integral improved generalized cross-correlation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Performance study of M-sequence signals in active sonar; Wu Yifei; Li Yuwei; Ship Electronic Engineering (10); full text *
Trainable Adaptive Window Switching for Speech Enhancement; Yuma Koizumi et al.; ICASSP 2019; full text *

Also Published As

Publication number Publication date
CN116597856A (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN108922538B (en) Conference information recording method, conference information recording device, computer equipment and storage medium
CN107393542B (en) Bird species identification method based on two-channel neural network
CN116597856B (en) Voice quality enhancement method based on frogman intercom
DE602004003443T2 (en) Speech period detection based on electromyography
CN105469785A (en) Voice activity detection method in communication-terminal double-microphone denoising system and apparatus thereof
CN108346434B (en) Voice quality assessment method and device
DE112017007005B4 (en) ACOUSTIC SIGNAL PROCESSING DEVICE, ACOUSTIC SIGNAL PROCESSING METHOD AND HANDS-FREE COMMUNICATION DEVICE
WO2021147237A1 (en) Voice signal processing method and apparatus, and electronic device and storage medium
CN112767963A (en) Voice enhancement method, device and system and computer readable storage medium
CN108597505A (en) Audio recognition method, device and terminal device
DE60127550T2 (en) METHOD AND SYSTEM FOR ADAPTIVE DISTRIBUTED LANGUAGE RECOGNITION
DE69635141T2 (en) Method for generating speech feature signals and apparatus for carrying it out
CN111710344A (en) Signal processing method, device, equipment and computer readable storage medium
CN114338623B (en) Audio processing method, device, equipment and medium
CN108133712A (en) A kind of method and apparatus for handling audio data
DE60300267T2 (en) Method and device for multi-reference correction of the spectral speech distortions caused by a communication network
CN111341331B (en) Voice enhancement method, device and medium based on local attention mechanism
DE102012102882A1 (en) An electrical device and method for receiving voiced voice signals therefor
CN105635453A (en) Conversation volume automatic adjusting method and system, vehicle-mounted device, and automobile
CN103474067A (en) Voice signal transmission method and system
EP0658874A1 (en) Process and circuit for producing from a speech signal with small bandwidth a speech signal with great bandwidth
US20230186943A1 (en) Voice activity detection method and apparatus, and storage medium
CN111341351A (en) Voice activity detection method and device based on self-attention mechanism and storage medium
CN113411456B (en) Voice quality assessment method and device based on voice recognition
CN114827363A (en) Method, device and readable storage medium for eliminating echo in call process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant