CN112235462A

CN112235462A - Voice adjusting method, system, electronic equipment and computer readable storage medium

Info

Publication number: CN112235462A
Application number: CN202011098893.XA
Authority: CN
Inventors: 倪卫峰
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Spreadtrum Communications Shanghai Co Ltd
Priority date: 2020-10-14
Filing date: 2020-10-14
Publication date: 2021-01-15

Abstract

The invention discloses a voice adjusting method, a system, electronic equipment and a computer readable storage medium. The voice adjusting method comprises the following steps: collecting voice information during a call; detecting the voice information, and if the voice information is noise voice, acquiring the noise energy of the noise voice; adjusting the voice parameters of the downlink voice information received by the downlink communication link according to the noise energy; and playing the adjusted downlink voice information. The method and the device adapt automatically to different scenes, and the loudness of the downlink voice played by the equipment follows the ambient noise level: when the noise is high, the loudness of the downlink voice is increased; when the noise is low, it is reduced. A suitable signal-to-noise ratio is thereby maintained, so that the listener can clearly understand the downlink voice.

Description

Voice adjusting method, system, electronic equipment and computer readable storage medium

Technical Field

The invention belongs to the technical field of mobile communication, and particularly relates to a voice adjusting method, a voice adjusting system, electronic equipment and a computer readable storage medium.

Background

In noisy environments such as supermarkets, subways, intersections and KTVs, the ambient noise is loud. During a voice call in such a scene, the received speech sounds indistinct and the user cannot make out what the other party is saying.

Disclosure of Invention

The invention provides a voice adjusting method, a system, electronic equipment and a computer readable storage medium, aiming at overcoming the defect that in the prior art, even if the volume of a mobile phone is adjusted to the maximum, the call experience is still very poor in a noisy environment.

The invention solves the technical problems through the following technical scheme:

a voice adjusting method, comprising:

collecting voice information in the communication process;

detecting the voice information, and if the voice information is noise voice, acquiring noise energy of the noise voice;

adjusting the voice parameters of the downlink voice information received by the downlink communication link according to the noise energy;

and playing the adjusted downlink voice information.

Preferably, the step of detecting the voice information specifically includes:

detecting the voice information based on a voice activity detection algorithm.
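By way of illustration only, the detection step may be sketched as an energy comparison. The invention does not fix a particular voice activity detection algorithm; the function names `frame_energy_db` and `is_call_voice`, the tracked noise floor, and the 6 dB margin below are assumptions made for this sketch, and samples are assumed to be floating-point values in the range [-1, 1].

```python
import math
from typing import Sequence


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB relative to full scale."""
    if not frame:
        return -120.0  # treat an empty frame as silence
    mean_square = sum(s * s for s in frame) / len(frame)
    # Clamp to avoid log10(0) on an all-zero frame.
    return 10.0 * math.log10(max(mean_square, 1e-12))


def is_call_voice(frame: Sequence[float],
                  noise_floor_db: float,
                  margin_db: float = 6.0) -> bool:
    """Hypothetical rule: a frame is call voice when its energy exceeds
    the tracked noise floor by at least margin_db; otherwise it is noise."""
    return frame_energy_db(frame) > noise_floor_db + margin_db
```

Under this sketch, a frame for which `is_call_voice` returns `False` would be treated as noise voice, and its `frame_energy_db` value would serve as the noise energy.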

Preferably, the step of collecting the voice information during the call specifically includes:

during the call, collecting one frame of voice information at every preset time interval;

the step of detecting the voice information and, if the voice information is noise voice, acquiring the noise energy of the noise voice specifically includes:

detecting whether the current frame voice information is noise voice;

if so, acquiring the noise energy of the noise voice of the current frame;

the step of adjusting the voice parameter of the downlink voice information received by the downlink communication link according to the noise energy specifically includes:

and adjusting the voice parameters of the downlink voice information in the same time period as the current frame noise voice according to the noise energy of the current frame noise voice.
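The frame-by-frame pairing described above may be sketched as follows. This is a minimal illustration, not the claimed implementation: `adjust_aligned` and its callable parameters are hypothetical names, and the choice to reuse the most recent noise energy while the current uplink frame is call voice is an assumption, since the text does not say how the downlink is treated during such frames.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence

Frame = Sequence[float]


def adjust_aligned(uplink_frames: Iterable[Frame],
                   downlink_frames: Iterable[Frame],
                   is_noise: Callable[[Frame], bool],
                   noise_energy: Callable[[Frame], float],
                   adjust: Callable[[Frame, float], List[float]]
                   ) -> Iterator[List[float]]:
    """Yield downlink frames adjusted with the noise energy measured on
    the uplink frame that covers the same time period."""
    last_energy: Optional[float] = None
    for up, down in zip(uplink_frames, downlink_frames):
        if is_noise(up):
            last_energy = noise_energy(up)
        if last_energy is None:
            # No noise frame seen yet: pass the downlink frame through.
            yield list(down)
        else:
            # Assumption: during call voice, keep the last noise estimate.
            yield adjust(down, last_energy)
```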

Preferably, after the step of detecting the voice information, the voice adjusting method further includes:

and if the voice information is the call voice, sending the call voice through an uplink communication link.

Preferably, the voice parameters include volume and frequency response.

A voice adjusting system comprises an acquisition module, a detection module, an adjusting module and a playing module;

the acquisition module is used for acquiring voice information in the call process;

the detection module is used for detecting the voice information, and if the voice information is noise voice, noise energy of the noise voice is obtained;

the adjusting module is used for adjusting the voice parameters of the downlink voice information received by the downlink communication link according to the noise energy;

the playing module is used for playing the adjusted downlink voice information.

Preferably, the detection module is configured to detect the voice information based on a voice activity detection algorithm.

Preferably, the acquisition module is configured to acquire one frame of voice information at every preset time interval during a call;

the detection module comprises a detection unit and a noise energy acquisition unit;

the detection unit is used for detecting whether the current frame voice information is noise voice, and if so, the noise energy acquisition unit is called;

the noise energy acquisition unit acquires the noise energy of the noise voice of the current frame;

the adjusting module is used for adjusting the voice parameters of the downlink voice information in the same time period as the current frame noise voice according to the noise energy of the current frame noise voice.

Preferably, the voice adjusting system further comprises a sending module;

the detection module is used for calling the sending module when the voice information is a call voice;

and the sending module is used for sending the call voice out through an uplink communication link.

Preferably, the voice parameters include volume and frequency response.

An electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice adjusting method when executing the computer program.

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the above-mentioned voice adjusting method.

The positive effects of the invention are as follows: the method and the device adapt automatically to different scenes, and the loudness of the downlink voice played by the equipment follows the ambient noise level. When the noise is high, the loudness of the downlink voice is increased; when the noise is low, it is reduced. A suitable signal-to-noise ratio is thereby maintained, so that the listener can clearly understand the downlink voice.

Drawings

Fig. 1 is a flowchart of a voice adjusting method according to embodiment 1 of the present invention.

Fig. 2 is a flowchart illustrating a voice call process according to embodiment 1 of the present invention.

Fig. 3 is a flowchart of a voice adjusting method according to embodiment 2 of the present invention.

Fig. 4 is a block diagram of a voice adjustment system according to embodiment 3 of the present invention.

Fig. 5 is a schematic structural diagram of an electronic device according to embodiment 4 of the present invention.

Detailed Description

The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.

Example 1

A voice adjusting method, as shown in Fig. 1, comprising:

step 10, collecting voice information in the communication process;

step 20, detecting the voice information based on a voice activity detection algorithm; if the voice information is noise voice, executing step 30, and if the voice information is call voice, executing step 60;

step 30, obtaining the noise energy of the noise voice;

step 40, adjusting the voice parameters of the downlink voice information received by the downlink communication link according to the noise energy, the voice parameters including volume and frequency response;

step 50, playing the adjusted downlink voice information.

After the step of detecting the voice information, the voice adjusting method further includes:

and step 60, sending the call voice through an uplink communication link.

Taking a mobile phone call as an example, Fig. 2 shows a flow diagram of the voice call process, which is divided into an uplink path and a downlink path:

Uplink path: voice information (noise or speech) enters the Audio Codec through the MIC (microphone), is amplified by the PGA (programmable gain amplifier), and is converted by the ADC (analog-to-digital converter). It then enters the DSP (digital signal processing module) in frames of 20 ms each, where the VAD (voice activity detector) determines whether each frame is a noise frame or a voice frame.

A voice frame carries the speech to be transmitted to the other party, generally near-field speech. It passes straight through Noise Estimate, then through Tx Process (uplink signal processing), which includes noise suppression, echo suppression, automatic volume adjustment and the like, and is finally encoded by the Voice Encoder and transmitted.

A noise frame carries ambient noise, generally far-field noise. In this case Noise Estimate measures the noise energy and passes the value to the MBDRC (multi-band dynamic range controller) and EQ (equalizer) on the downlink path, whose voice parameters are adjusted automatically to raise or lower the downlink volume and frequency response. The modified signal passes through the DAC (digital-to-analog converter) and the PGA (programmable gain amplifier) of the Audio Codec and is then emitted by the loudspeaker.
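The relationship between the measured noise energy and the downlink adjustment is not quantified in this embodiment. One possible reading is a bounded, monotonic mapping from noise level to gain, applied either to the whole signal (volume) or per frequency band (frequency response). The sketch below assumes such a mapping; the function names and every threshold value (`quiet_db`, `loud_db`, `min_gain_db`, `max_gain_db`) are hypothetical and are not taken from the MBDRC or EQ of any real device.

```python
from typing import List, Sequence


def downlink_gain_db(noise_db: float,
                     quiet_db: float = -60.0,
                     loud_db: float = -20.0,
                     min_gain_db: float = -6.0,
                     max_gain_db: float = 12.0) -> float:
    """Piecewise-linear map: more noise gives more downlink gain,
    clamped between min_gain_db and max_gain_db."""
    if noise_db <= quiet_db:
        return min_gain_db
    if noise_db >= loud_db:
        return max_gain_db
    t = (noise_db - quiet_db) / (loud_db - quiet_db)
    return min_gain_db + t * (max_gain_db - min_gain_db)


def band_gains_db(noise_db_per_band: Sequence[float]) -> List[float]:
    """Frequency-response variant: one gain per band, each derived from
    the noise level measured in that band (assumed to be available)."""
    return [downlink_gain_db(n) for n in noise_db_per_band]


def apply_gain(frame: Sequence[float], gain_db: float) -> List[float]:
    """Scale the samples of one downlink frame by a gain given in dB."""
    scale = 10.0 ** (gain_db / 20.0)
    return [s * scale for s in frame]
```

With these assumed defaults, a noise level halfway between `quiet_db` and `loud_db` yields a gain halfway between the two gain limits.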

In the figure, Voice Decoder performs voice decoding and Rx Process performs downlink signal processing. In all cases the loudness emitted from the loudspeaker is limited so that it does not exceed the maximum value specified by 3GPP.
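The loudness ceiling may be pictured as a final limiting stage after the gain adjustment. The sketch below is an assumption-laden stand-in: the 3GPP limit is an acoustic loudness figure whose value is not given in this document, so the digital peak ceiling `max_abs` and the function name `limit_peak` are hypothetical.

```python
from typing import List, Sequence


def limit_peak(frame: Sequence[float], max_abs: float = 0.9) -> List[float]:
    """Scale a frame down so that no sample magnitude exceeds max_abs.
    Frames already within the ceiling are returned unchanged."""
    peak = max((abs(s) for s in frame), default=0.0)
    if peak <= max_abs:
        return list(frame)
    scale = max_abs / peak
    return [s * scale for s in frame]
```

Scaling the whole frame, rather than clipping individual samples, keeps the waveform shape intact at the cost of lowering the entire frame.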

This embodiment adapts automatically to different scenes, and the loudness of the downlink voice played by the equipment follows the ambient noise level. When the noise is high, the loudness of the downlink voice is increased; when the noise is low, it is reduced. A suitable signal-to-noise ratio is thereby maintained, so that the listener can clearly understand the downlink voice.

Example 2

This embodiment is a further improvement on embodiment 1. As shown in Fig. 3, step 10 specifically includes:

step 101, during the call, acquiring one frame of voice information at every preset time interval;

further, step 20 specifically includes: step 201, detecting whether the current frame voice information is noise voice, and if yes, executing step 301;

step 301, acquiring the noise energy of the noise voice of the current frame;

step 401, adjusting the voice parameters of the downlink voice information in the same time period as the current frame noise voice according to the noise energy of the current frame noise voice, and then executing step 50.

In this embodiment, the voice information is processed by reading one frame of data at every preset time interval (for example, a frame of 20 ms as in embodiment 1), and the downlink voice information in the same time period as the noise voice is then adjusted.
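As an illustration of the framing described here, a sample stream may be cut into fixed-length frames as sketched below. The function name `split_frames` and the sample rates used with it are assumptions; only the 20 ms frame length comes from embodiment 1. Dropping a trailing partial frame is a simplification made for this sketch.

```python
from typing import List, Sequence


def split_frames(samples: Sequence[float],
                 sample_rate_hz: int,
                 frame_ms: int = 20) -> List[List[float]]:
    """Cut a sample stream into consecutive frames of frame_ms each.
    A trailing partial frame, if any, is dropped."""
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    last_start = len(samples) - frame_len
    return [list(samples[i:i + frame_len])
            for i in range(0, last_start + 1, frame_len)]
```

At an assumed sample rate of 8000 Hz, each 20 ms frame holds 160 samples.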

Example 3

A voice adjusting system, as shown in Fig. 4, includes an acquisition module 1, a detection module 2, an adjusting module 3 and a playing module 4;

the acquisition module 1 is used for acquiring voice information in the call process;

the detection module 2 is configured to detect the voice information based on a voice activity detection algorithm and, if the voice information is noise voice, obtain the noise energy of the noise voice;

the adjusting module 3 is used for adjusting the voice parameters of the downlink voice information received by the downlink communication link according to the noise energy, the voice parameters including volume and frequency response;

the playing module 4 is used for playing the adjusted downlink voice information.

The voice adjusting system also comprises a sending module 5;

the detection module 2 is used for calling the sending module 5 when the voice information is a call voice;

the sending module 5 is configured to send the call voice through an uplink communication link.

In this embodiment, the detection module 2 includes a detection unit 21 and a noise energy obtaining unit 22; the acquisition module 1 is used for acquiring one frame of voice information at every preset time interval during the call;

the detecting unit 21 is configured to detect whether the current frame speech information is a noise speech, and if so, invoke the noise energy obtaining unit 22;

the noise energy obtaining unit 22 obtains the noise energy of the noise voice of the current frame;

the adjusting module 3 is configured to adjust a speech parameter of the downlink speech information in the same time period as the current frame noise speech according to the noise energy of the current frame noise speech.
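The module division of this embodiment may be pictured as a set of interchangeable callables wired together, as sketched below. The class name `VoiceAdjustingSystem`, its field names and the `step` method are hypothetical; the comments map each field to the reference numeral used above. The call-voice branch follows the flow of Fig. 1 only, and how the downlink frame is handled in that branch is left open here because the text does not specify it.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = Sequence[float]


@dataclass
class VoiceAdjustingSystem:
    acquire: Callable[[], Frame]                   # acquisition module 1
    is_noise: Callable[[Frame], bool]              # detection unit 21
    noise_energy: Callable[[Frame], float]         # noise energy obtaining unit 22
    adjust: Callable[[Frame, float], List[float]]  # adjusting module 3
    play: Callable[[List[float]], None]            # playing module 4
    send: Callable[[Frame], None]                  # sending module 5

    def step(self, downlink_frame: Frame) -> str:
        """Process one acquired frame against one downlink frame."""
        frame = self.acquire()
        if self.is_noise(frame):
            energy = self.noise_energy(frame)
            self.play(self.adjust(downlink_frame, energy))
            return "played"
        # Call voice: forwarded over the uplink communication link.
        self.send(frame)
        return "sent"
```

Passing the modules in as callables keeps the sketch independent of any particular detector or gain rule, which matches the statement that the division into units and modules is exemplary.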

This embodiment adapts automatically to different scenes, and the loudness of the downlink voice played by the equipment follows the ambient noise level. When the noise is high, the loudness of the downlink voice is increased; when the noise is low, it is reduced. A suitable signal-to-noise ratio is thereby maintained, so that the listener can clearly understand the downlink voice.

Example 4

An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the voice adjusting method of embodiment 1 or 2 when executing the computer program.

Fig. 5 is a schematic structural diagram of an electronic device provided in this embodiment. Fig. 5 illustrates a block diagram of an exemplary electronic device 90 suitable for use in implementing embodiments of the present invention. The electronic device 90 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.

As shown in fig. 5, the electronic device 90 may take the form of a general purpose computing device, which may be a server device, for example. The components of the electronic device 90 may include, but are not limited to: at least one processor 91, at least one memory 92, and a bus 93 that connects the various system components (including the memory 92 and the processor 91).

The bus 93 includes a data bus, an address bus, and a control bus.

Memory 92 may include volatile memory, such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.

Memory 92 may also include a program tool 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

The processor 91 executes various functional applications and data processing by running a computer program stored in the memory 92.

The electronic device 90 may also communicate with one or more external devices 94 (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 90 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via a network adapter 96. The network adapter 96 communicates with the other modules of the electronic device 90 via the bus 93. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 90, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.

It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided among and embodied by a plurality of units/modules.

Example 5

A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the voice adjusting method of embodiment 1 or 2.

More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, random access memory, read only memory, erasable programmable read only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In a possible implementation, the invention can also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps of the voice adjusting method described in embodiment 1 or 2.

The program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.

While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims

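1. A voice adjusting method, comprising:

collecting voice information in the communication process;

detecting the voice information, and if the voice information is noise voice, acquiring noise energy of the noise voice;

adjusting the voice parameters of the downlink voice information received by the downlink communication link according to the noise energy;

and playing the adjusted downlink voice information.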

2. The voice adjusting method of claim 1, wherein the step of detecting the voice information specifically comprises:

detecting the voice information based on a voice activity detection algorithm.

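3. The voice adjusting method according to claim 1, wherein the step of collecting voice information during the call specifically comprises:

during the call, collecting one frame of voice information at every preset time interval;

the step of detecting the voice information and, if the voice information is noise voice, acquiring the noise energy of the noise voice specifically comprises:

detecting whether the current frame voice information is noise voice;

if so, acquiring the noise energy of the noise voice of the current frame;

the step of adjusting the voice parameters of the downlink voice information received by the downlink communication link according to the noise energy specifically comprises:

adjusting the voice parameters of the downlink voice information in the same time period as the current frame noise voice according to the noise energy of the current frame noise voice.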

4. The voice adjusting method of claim 2, wherein after the step of detecting the voice information, the voice adjusting method further comprises:

if the voice information is call voice, sending the call voice through an uplink communication link.

5. The voice adjusting method of claim 1, wherein the voice parameters include volume and frequency response.

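6. A voice adjusting system, comprising an acquisition module, a detection module, an adjusting module and a playing module;

the acquisition module is used for acquiring voice information in the call process;

the detection module is used for detecting the voice information, and if the voice information is noise voice, obtaining noise energy of the noise voice;

the adjusting module is used for adjusting the voice parameters of the downlink voice information received by the downlink communication link according to the noise energy;

the playing module is used for playing the adjusted downlink voice information.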

7. The voice adjusting system of claim 6, wherein the detection module is configured to detect the voice information based on a voice activity detection algorithm.

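8. The voice adjusting system of claim 6, wherein the acquisition module is configured to acquire one frame of voice information at every preset time interval during a call;

the detection module comprises a detection unit and a noise energy acquisition unit;

the detection unit is used for detecting whether the current frame voice information is noise voice, and if so, calling the noise energy acquisition unit;

the noise energy acquisition unit acquires the noise energy of the noise voice of the current frame;

the adjusting module is used for adjusting the voice parameters of the downlink voice information in the same time period as the current frame noise voice according to the noise energy of the current frame noise voice.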

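9. The voice adjusting system of claim 7, wherein the voice adjusting system further comprises a sending module;

the detection module is used for calling the sending module when the voice information is call voice;

the sending module is used for sending the call voice through an uplink communication link.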

10. The voice adjusting system of claim 6, wherein the voice parameters include volume and frequency response.

11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice adjusting method of any one of claims 1 to 5 when executing the computer program.

12. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, carries out the steps of the voice adjusting method of any one of claims 1 to 5.