
CN115910078A - Adaptive speech coding and decoding adjusting method, device, equipment and medium

Info

Publication number: CN115910078A
Application number: CN202211285610.1A
Authority: CN
Inventors: 邱春毓; 刘海; 廖蓉晖; 李鉴; 姜永广; 杨宏; 杨龙剑; 张鹤鸣; 康敏
Original assignee: CETC 30 Research Institute
Current assignee: CETC 30 Research Institute
Priority date: 2022-10-20
Filing date: 2022-10-20
Publication date: 2023-04-04

Abstract

The invention discloses an adaptive speech coding and decoding adjusting method, device, equipment and medium. A real-time intelligent voice measurement algorithm detects the voice quality under the current channel condition, and the voice coding and decoding type is adaptively adjusted based on the voice quality detection result, thereby improving call voice quality and user experience.

Description

Adaptive speech coding and decoding adjusting method, device, equipment and medium

Technical Field

The invention belongs to the technical field of voice call coding and decoding, and particularly relates to a self-adaptive voice coding and decoding adjusting method, device, equipment and medium.

Background

Voice communication is an important means of communication. In voice communication, the analog voice signal is sampled, quantized and coded, then transmitted to the network through a communication module; the opposite-end voice terminal decodes the received data and finally converts it back into an analog voice signal for playback.

Speech signals contain a large amount of redundancy: some parts of the signal carry no information, and some parts can be estimated from the correlation between the amplitudes of adjacent samples. Digitally coding the speech signal compresses its spectrum and thereby reduces the system bandwidth occupied by speech data during transmission. There are many coding types, and they differ in coding quality and anti-interference capability. They fall mainly into three categories: waveform coding, parametric coding and hybrid coding. Waveform coding obtains a digital speech signal by sampling, quantizing and coding the analog speech waveform in the time domain. Parametric coding is based on the mechanism of human speech production: it extracts the characteristic parameters that characterize the speech and codes those parameters. Hybrid coding combines the high quality of waveform coding with the high efficiency of parametric coding by adding certain waveform coding features to parametric coding, so that naturalness is improved on top of intelligibility. Each of the three categories is further subdivided into a number of subclasses.

In current voice communication, channel quality (especially on short-wave, satellite and other wireless channels) changes dynamically. When the channel quality changes, the voice device keeps using the voice coding and decoding type specified during call setup and cannot adapt in real time, which degrades the call experience. When the channel quality deteriorates (network congestion increases, packet loss increases, the transmission rate drops and delay increases), a voice coding and decoding algorithm with good anti-interference performance cannot be selected; when the channel quality improves (network congestion decreases, packet loss decreases, the transmission rate rises and delay decreases), a voice coding and decoding algorithm with better voice quality cannot be selected.

Current voice devices cannot detect call voice quality intelligently and in real time during voice communication, nor can they adaptively adjust the voice coding and decoding type in real time; as a result, call voice quality deteriorates and user experience suffers.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a self-adaptive voice coding and decoding adjusting method, a device, equipment and a medium, which utilize a real-time intelligent voice measurement algorithm to detect the voice quality under the current channel condition and self-adaptively adjust the type of voice coding and decoding based on the voice quality detection result, thereby improving the quality of conversation voice and improving the user experience.

The purpose of the invention is realized by the following technical scheme:

an adaptive speech codec adaptation method, applied to a talk phase of a voice communication system, comprising:

step S100: acquiring a coding sample of a voice transmitting end and a corresponding decoding sample of a voice receiving end;

step S200: judging whether voice quality detection is finished for all voice decoding types supported by the voice receiving end, if so, entering a step S400, and if not, performing a step S300;

step S300: carrying out voice quality detection on the voice decoding type which is supported by the voice receiving end but is not subjected to voice quality detection, and judging whether the voice quality of the voice decoding type which is not subjected to voice quality detection under the condition of the current channel meets a preset requirement or not;

step S400: constructing an available coding and decoding type set of the voice transmitting end and an available coding and decoding type set of the voice receiving end;

step S500: judging whether the available coding and decoding type set of the voice sending end and the available coding and decoding type set of the voice receiving end have intersection, if not, keeping the original voice coding and decoding types unchanged, and if so, entering the step S600;

step S600: selecting the voice coding and decoding type with the highest priority in the intersection as the optimal voice coding and decoding type;

step S700: and judging whether the optimal voice coding and decoding type is the same as the voice coding and decoding type used in the current communication, if so, keeping the original voice coding and decoding type unchanged, and if not, interactively updating the voice coding and decoding type by the voice transmitting end and the voice receiving end, and using the optimal voice coding and decoding type in the voice communication.
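The decision logic of steps S200 to S700 can be sketched roughly as follows. This is only an illustrative outline, not the patented implementation: the function names `available_set` and `choose_codec`, the use of strings as codec type identifiers, and the `quality_ok` callback that stands in for the voice quality detection of step S300 are all assumptions introduced here.

```python
from typing import Callable, Iterable, Optional, Sequence, Set


def available_set(supported: Iterable[str],
                  quality_ok: Callable[[str], bool]) -> Set[str]:
    """Steps S200-S400: test every supported type once, keep those that pass."""
    return {codec for codec in supported if quality_ok(codec)}


def choose_codec(tx_available: Set[str],
                 rx_available: Set[str],
                 priority: Sequence[str],
                 current: str) -> Optional[str]:
    """Steps S500-S700: return the type to switch to, or None to keep `current`."""
    common = tx_available & rx_available           # S500: intersection of both sets
    if not common:
        return None                                # no common type: keep the original
    ranked = [c for c in priority if c in common]  # S600: highest priority first
    if not ranked:
        return None  # assumption: types absent from the priority list are ignored
    best = ranked[0]
    return None if best == current else best       # S700: switch only if different


# Hypothetical codec names, listed from highest to lowest priority.
priority = ["codec_a", "codec_b", "codec_c"]
tx = available_set(priority, lambda c: c != "codec_a")
rx = available_set(priority, lambda c: c != "codec_c")
print(choose_codec(tx, rx, priority, current="codec_c"))
```

Returning `None` for every "keep the original type" branch keeps the three no-change cases (empty intersection, best type already in use, nothing ranked) indistinguishable to the caller, which matches the method: in each of them the call simply continues unchanged.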

Further, the acquiring the coded samples at the voice transmitting end and the corresponding decoded samples at the voice receiving end specifically includes:

recording for a period of time with a microphone, and converting the analog voice signal into pulse-code modulation (PCM) voice data through sampling and quantization;

and carrying out voice coding to obtain coded samples and directly decoding the coded samples in the processor to obtain decoded samples.

Further, the determining whether the voice quality of the speech decoding type without voice quality detection under the current channel condition meets a preset requirement specifically includes:

measuring the similarity between the pulse-code modulation (PCM) voice data and the decoded samples;

and judging whether the voice quality of the voice decoding type without voice quality detection under the current channel condition meets the preset requirement or not according to the similarity and a preset threshold.

Further, the priority of the voice codec type is self-defined by the user.

Further, the interactively updating the voice encoding and decoding type by the voice sending end and the voice receiving end, and the using the optimal voice encoding and decoding type in the voice communication specifically includes:

the voice sending end adopts the optimal voice coding and decoding type to code in the next frame of the current communication and sends a coding and decoding type updating identification to the voice receiving end;

and after receiving the coding and decoding type updating identification, the voice receiving end performs decoding operation on the current frame and subsequent voice data by using the new coding and decoding type.

On the other hand, the present invention further provides an adaptive speech codec adjusting apparatus, where the apparatus is configured to implement the foregoing method, and the apparatus specifically includes:

a coding and decoding sample obtaining module, configured to implement step S100: acquiring a coding sample of a voice transmitting end and a corresponding decoding sample of a voice receiving end;

a voice quality detection completion judging module, configured to implement step S200: judging whether voice quality detection is finished for all voice decoding types supported by the voice receiving end, if so, entering a step S400, and if not, executing a step S300;

a voice quality detection module, configured to implement step S300 to perform voice quality detection on the speech decoding type that is supported by the voice receiving end but is not subjected to voice quality detection, and determine whether the voice quality of the speech decoding type that is not subjected to voice quality detection under the current channel condition meets a preset requirement;

a codec type set construction module, configured to implement step S400 to construct an available codec type set of the voice transmitting end and an available codec type set of the voice receiving end;

a codec type intersection judgment module, configured to implement step S500: judging whether the available coding and decoding type set of the voice sending end and the available coding and decoding type set of the voice receiving end have intersection, if not, keeping the original voice coding and decoding types unchanged, and if so, entering the step S600;

a codec type selection module, configured to implement step S600: selecting the voice coding and decoding type with the highest priority in the intersection as the optimal voice coding and decoding type;

a codec type switching module, configured to implement step S700: and judging whether the optimal voice coding and decoding type is the same as the voice coding and decoding type used in the current communication, if so, keeping the original voice coding and decoding type unchanged, if not, interactively updating the voice coding and decoding type by the voice sending end and the voice receiving end, and using the optimal voice coding and decoding type in the voice communication.

In another aspect, the present invention further provides a computer device, where the computer device includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded by the processor and executed to implement any one of the above adaptive speech codec adjusting methods.

In another aspect, the present invention further provides a computer-readable storage medium, where a computer program is stored, and the computer program is loaded and executed by a processor to implement any one of the above adaptive speech codec adjusting methods.

The invention has the beneficial effects that:

the self-adaptive speech coding and decoding adjusting method, the device, the equipment and the medium can solve the problem of poor call speech quality caused by mismatching of speech coding and decoding due to the fact that channel quality changes in a speech call, detect the speech quality under the current channel condition by using a real-time speech quality detection algorithm, and perform self-adaptive adjustment on the type of speech coding and decoding by using an intelligent self-adaptive speech coding and decoding adjusting algorithm based on a speech quality detection result, so that the call speech quality is improved, and the call experience is improved.

Drawings

FIG. 1 is a flow chart of an adaptive speech codec adjusting method according to an embodiment of the present invention;

FIG. 2 is a block diagram of a voice communication system according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a voice transmit frame data structure according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a voice data flow according to an embodiment of the present invention;

FIG. 5 is a block diagram of an adaptive speech codec adjusting apparatus according to an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.

All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Current voice devices cannot detect call voice quality intelligently and in real time during voice communication, nor can they adaptively adjust the voice coding and decoding type in real time; as a result, call voice quality deteriorates and user experience suffers.

In order to solve the above technical problem, the following embodiments of the adaptive speech codec adjusting method, apparatus, device and medium of the present invention are provided.

Example 1

The embodiment provides a method for adjusting adaptive speech coding and decoding, which has the following basic working principle:

In voice communication, the analog voice signal is sampled, quantized and coded, then transmitted to the network through a communication module; the opposite-end voice terminal decodes the received data and finally converts it back into an analog voice signal for playback.

Speech signals contain a large amount of redundancy: some parts of the signal carry no information, and some parts can be estimated from the correlation between the amplitudes of adjacent samples. Digitally coding the speech signal compresses its spectrum and thereby reduces the system bandwidth occupied by speech data during transmission. There are many coding types, and they differ in coding quality and anti-interference capability. They fall mainly into three categories: waveform coding, parametric coding and hybrid coding. Waveform coding obtains a digital speech signal by sampling, quantizing and coding the analog speech waveform in the time domain. Parametric coding is based on the mechanism of human speech production: it extracts the characteristic parameters that characterize the speech and codes those parameters. Hybrid coding combines the high quality of waveform coding with the high efficiency of parametric coding by adding certain waveform coding features to parametric coding, so that naturalness is improved on top of intelligibility.

In this embodiment, the voice coding and decoding type is adaptively adjusted based on the real-time voice quality detection result.

The present embodiment provides a typical application example. FIG. 2 is a block diagram of the voice communication system provided by this embodiment. The real-time voice quality detection algorithm module adds coding samples to the coded data of the local voice terminal and detects the voice quality using the data decoded at the opposite end. The intelligent adaptive speech coding and decoding adjusting algorithm adaptively adjusts the voice coding and decoding type according to the voice quality detection result. In the call phase after the voice terminal has dialed, the intelligent adjustment of the voice coding and decoding type is carried out only once per call.

FIG. 1 is a schematic flow chart of the adaptive speech codec adjusting method provided in this embodiment. The method specifically includes the following steps:

step S100: acquiring a coding sample of a voice transmitting end and a corresponding decoding sample of a voice receiving end;

step S200: judging whether voice quality detection is finished for all voice decoding types supported by the voice receiving end, if so, entering step S400, and if not, executing step S300;

step S300: carrying out voice quality detection on the voice decoding types which are supported by the voice receiving end but have not been subjected to voice quality detection, and judging whether the voice quality of those voice decoding types under the current channel condition meets the preset requirement;

step S400: constructing an available coding and decoding type set of the voice transmitting end and an available coding and decoding type set of the voice receiving end;

step S500: judging whether the available coding and decoding type set of the voice transmitting end and the available coding and decoding type set of the voice receiving end have an intersection, if not, keeping the original voice coding and decoding type unchanged, and if so, entering step S600;

step S600: selecting the voice coding and decoding type with the highest priority in the intersection as the optimal voice coding and decoding type;

step S700: judging whether the optimal voice coding and decoding type is the same as the voice coding and decoding type used in the current communication, if so, keeping the original voice coding and decoding type unchanged, and if not, interactively updating the voice coding and decoding type by the voice transmitting end and the voice receiving end, so that the optimal voice coding and decoding type is used in the voice communication.

FIG. 3 is a schematic diagram of the voice transmission frame data structure of this embodiment. In this embodiment, the coding sample is attached to the tail of the coded voice data and transmitted to the opposite-end voice terminal through the network. The opposite-end voice terminal decodes the coding sample and compares the decoded data with the decoding sample in terms of correlation to obtain the current voice quality detection result.
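Attaching the coding sample to the tail of the coded voice data can be illustrated with a small packing routine. The byte layout used here (a 2-byte big-endian length of the voice payload, then the voice payload, then the sample) is purely an assumption made so that the receiver can separate the two parts; the actual layout of FIG. 3 is not reproduced in the text, and the names `pack_frame` and `unpack_frame` are hypothetical.

```python
import struct
from typing import Tuple


def pack_frame(voice: bytes, sample: bytes) -> bytes:
    """Build one frame: coded voice data first, coding sample at the tail."""
    # Hypothetical layout: 2-byte big-endian voice length | voice | sample.
    if len(voice) > 0xFFFF:
        raise ValueError("voice payload too long for a 2-byte length field")
    return struct.pack(">H", len(voice)) + voice + sample


def unpack_frame(frame: bytes) -> Tuple[bytes, bytes]:
    """Separate a received frame into (coded voice data, coding sample)."""
    (voice_len,) = struct.unpack_from(">H", frame, 0)
    if len(frame) < 2 + voice_len:
        raise ValueError("truncated frame")
    voice = frame[2:2 + voice_len]
    sample = frame[2 + voice_len:]   # everything after the voice data
    return voice, sample


frame = pack_frame(b"\x01\x02\x03", b"\xaa\xbb")
print(unpack_frame(frame))
```

A length prefix is only one way to mark the boundary; a fixed sample size agreed in advance would work equally well and would avoid the extra two bytes.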

FIG. 4 is a schematic diagram of the voice data flow of this embodiment. PCM stands for pulse-code modulation; PCM voice data is voice data coded by pulse-code modulation, and PCM is a method of representing a sampled analog signal digitally.
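As a minimal illustration of how sampling and quantization turn an analog waveform into PCM voice data, the sketch below quantizes a sine tone into signed 13-bit linear PCM codes. The 8 kHz sampling rate, the 1 kHz test tone and the function names are assumptions chosen for the example; the 13-bit width follows the example value used later in this description.

```python
import math
from typing import List


def quantize_13bit(x: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed 13-bit linear PCM code."""
    x = max(-1.0, min(1.0, x))           # clip out-of-range input
    code = round(x * 4096)               # scale to the 13-bit range
    return max(-4096, min(4095, code))   # +1.0 saturates at 4095


def sample_tone(freq_hz: float, rate_hz: int, count: int) -> List[int]:
    """Sample a unit-amplitude sine tone and quantize each sample."""
    return [quantize_13bit(math.sin(2 * math.pi * freq_hz * n / rate_hz))
            for n in range(count)]


# One period of a 1 kHz tone sampled at 8 kHz (both values are assumptions).
print(sample_tone(1000, 8000, 8))
```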

Every coding and decoding type supported by the voice terminal must be traversed and tested for voice quality. The supported coding and decoding type currently undergoing voice quality detection is referred to as the CODEC_TEST voice coding and decoding type.

The analog voice signal is converted into PCM voice data through sampling and quantization, and CODEC_TEST voice coding of the PCM voice data generates CODEC_TEST coded voice data. The CODEC_TEST coded voice data is then sent to the network through the communication module. After receiving it, the opposite-end voice terminal performs CODEC_TEST decoding to obtain a PCM voice signal, which is finally converted into an analog voice signal for playback.

To generate a CODEC_TEST coding sample and a CODEC_TEST decoding sample, the microphone records for k seconds (the value of k is set by the user according to the communication channel), and the analog voice signal is sampled and quantized into PCM voice data. CODEC_TEST voice coding is then applied to obtain the CODEC_TEST coding samples (A0 to An), and the CODEC_TEST coded voice data is decoded directly in the processor, without passing through the channel, to obtain the CODEC_TEST decoded PCM voice data samples (G0 to Gn), namely the CODEC_TEST decoding samples. Coding and decoding samples of every type need to be preset in the voice terminal.
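The sample generation just described (code the recorded PCM data, then decode it locally without the channel) might look like the sketch below. The codec here is a deliberately trivial stand-in that keeps the top 8 of 13 bits per sample; it is an assumption used only to make the example runnable and is not one of the coding types the terminal would actually support. The names `toy_encode`, `toy_decode` and `make_samples` are likewise hypothetical.

```python
from typing import Callable, List, Tuple


def toy_encode(pcm: List[int]) -> bytes:
    # Stand-in codec (assumption): keep the top 8 of 13 bits of each sample.
    return bytes((s >> 5) & 0xFF for s in pcm)


def toy_decode(data: bytes) -> List[int]:
    out = []
    for b in data:
        signed = b - 256 if b >= 128 else b   # restore the sign of the 8-bit code
        out.append(signed << 5)               # back to the 13-bit scale
    return out


def make_samples(pcm: List[int],
                 encode: Callable[[List[int]], bytes],
                 decode: Callable[[bytes], List[int]]) -> Tuple[bytes, List[int]]:
    """Return (coding sample A0..An, decoding sample G0..Gn), bypassing the channel."""
    coded = encode(pcm)
    return coded, decode(coded)


recorded = [0, 2896, 4095, -2896, -4096]   # illustrative 13-bit PCM values
coded_sample, decoded_sample = make_samples(recorded, toy_encode, toy_decode)
print(list(coded_sample), decoded_sample)
```

Because the decoding sample is produced from the same coded bytes that are later sent over the channel, any difference the receiver observes can be attributed to the channel rather than to the codec's own loss.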

In this embodiment, a specific manner for determining whether the voice quality of the speech decoding type without voice quality detection under the current channel condition meets the preset requirement is as follows:

After passing through the communication network, the data received by the opposite-end voice terminal is separated to obtain the coding sample data (a0, a1, a2, …, an), and the coding sample data is decoded by CODEC_TEST to obtain the PCM voice data (g0, g1, g2, …, gn).

The similarity between the PCM voice data (g0 to gn) and the CODEC_TEST decoding samples (G0 to Gn) is measured by "Psum", where

Dx = |Gx - gx|, where x = 0 to n;

Psum = D0 + D1 + … + Dn;

The maximum value of the PCM voice data (4095 if 13-bit linear PCM is used) is denoted "V_max" and is a positive value; the minimum value of the PCM voice data (-4096 if 13-bit linear PCM is used) is denoted "V_min".
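The two 13-bit example values follow from the range of a signed sample. As a short worked check, assuming the PCM samples are stored as B-bit two's-complement integers (the symbol B is introduced here for illustration):

```latex
V_{\max} = 2^{B-1} - 1, \qquad V_{\min} = -2^{B-1}
\quad\Longrightarrow\quad
B = 13:\; V_{\max} = 2^{12} - 1 = 4095, \; V_{\min} = -2^{12} = -4096 .
```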

When Dx/V_max < 1% (x = 0 to n) and Psum/V_max < 5%, the CODEC_TEST voice coding and decoding type is judged to have good voice quality under the current channel quality and is an available voice coding and decoding type; otherwise, it is unavailable.

All voice coding and decoding types supported by the voice terminal are traversed, and each is judged as available or unavailable under the current channel quality.
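The availability test can be written as a short function. This is a sketch under stated assumptions: the function name and parameter names are hypothetical, the default V_max of 4095 is the 13-bit example value from the text, and treating an empty or length-mismatched sample as a failure is an assumption the description does not address.

```python
from typing import Sequence


def codec_is_available(reference: Sequence[int],
                       received: Sequence[int],
                       v_max: int = 4095,
                       per_sample_limit: float = 0.01,
                       sum_limit: float = 0.05) -> bool:
    """Apply the Dx/V_max < 1% and Psum/V_max < 5% checks."""
    if not reference or len(reference) != len(received):
        return False  # assumption: a missing or truncated sample counts as a failure
    diffs = [abs(G - g) for G, g in zip(reference, received)]   # Dx = |Gx - gx|
    psum = sum(diffs)                                           # Psum = D0 + ... + Dn
    return (all(d / v_max < per_sample_limit for d in diffs)
            and psum / v_max < sum_limit)


reference = [0, 2880, 4064, -2912, -4096]   # decoding sample G0..Gn
print(codec_is_available(reference, [0, 2890, 4064, -2912, -4096]))   # small error
print(codec_is_available(reference, [0, 2980, 4064, -2912, -4096]))   # one large error
```

Note that Psum is compared against V_max without normalizing by the number of samples, as in the text, so a longer sample leaves less room for accumulated error.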

In this embodiment, the user sets the priority of the voice coding and decoding types supported by the voice terminal.

After the real-time voice quality detection algorithm module has traversed, detected and judged all the voice coding and decoding types supported by the calling voice terminal and the called voice terminal, each terminal obtains on its own processor the set of types available under the current channel environment: the calling voice terminal obtains Host_Codec_Available_Set = {CodecHost_1, CodecHost_2, …}, and the called voice terminal obtains Client_Codec_Available_Set = {CodecClient_1, CodecClient_2, …}. The called voice terminal sends its currently available set Client_Codec_Available_Set to the calling voice terminal. The calling voice terminal then uses Host_Codec_Available_Set and Client_Codec_Available_Set to carry out the intelligent voice coding and decoding type adjustment.

The following two scenarios may occur in specific applications:

scene one:

when the calling voice terminal and the called voice terminal have no available voice coding and decoding type in common (namely, Host_Codec_Available_Set and Client_Codec_Available_Set do not intersect), the current call keeps its original voice coding and decoding type, the calling voice terminal sends a "no-update identification" to the called voice terminal, and the intelligent voice coding and decoding adjustment process ends.

Scene two:

when the calling voice terminal and the called voice terminal have one or more available voice coding and decoding types in common (namely, Host_Codec_Available_Set intersects with Client_Codec_Available_Set), the common available voice coding and decoding set is Common_Codec_Available_Set = {CodecHost_1, …}.

If the highest-priority voice coding and decoding type in Common_Codec_Available_Set is the same as the type currently in use, the current call keeps its original voice coding and decoding type, the calling voice terminal sends a "no-update identification" to the called voice terminal, and the intelligent voice coding and decoding adjustment process ends.

If the highest-priority voice coding and decoding type in Common_Codec_Available_Set differs from the type currently in use, it is selected as the voice coding and decoding type of the current call. The calling voice terminal codes the next frame with the updated type and sends a coding and decoding type update identification to the called voice terminal. After receiving the update identification, the called voice terminal decodes the current frame and subsequent voice data with the new type, codes its own next frame with the updated type, and feeds the update identification back to the calling voice terminal. After receiving the fed-back update identification, the calling voice terminal decodes the current frame and subsequent voice data with the new type, and the intelligent voice coding and decoding adjustment process ends.
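The two-way update exchange can be modelled with a small state machine. The sketch below is illustrative only: the `Frame` and `Terminal` classes are hypothetical, the update identification is assumed to carry the name of the new type (the text does not say how the receiver learns which type to use), and the actual coding of payloads is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    payload: bytes
    update_to: Optional[str] = None   # None plays the role of "no update"


class Terminal:
    def __init__(self, codec: str) -> None:
        self.tx_codec = codec         # type used to code outgoing frames
        self.rx_codec = codec         # type used to decode incoming frames
        self._pending: Optional[str] = None

    def request_switch(self, new_codec: str) -> None:
        """Code the next frame with the new type and flag it with the update mark."""
        self.tx_codec = new_codec
        self._pending = new_codec

    def send(self, payload: bytes) -> Frame:
        frame = Frame(payload, self._pending)
        self._pending = None          # the mark is sent once
        return frame

    def receive(self, frame: Frame) -> str:
        """Return the type used to decode this frame."""
        if frame.update_to is not None and frame.update_to != self.rx_codec:
            self.rx_codec = frame.update_to           # applies from this frame on
            if self.tx_codec != frame.update_to:
                self.request_switch(frame.update_to)  # feed the mark back
        return self.rx_codec


caller, callee = Terminal("codec_c"), Terminal("codec_c")
caller.request_switch("codec_b")
print(callee.receive(caller.send(b"frame 1")))   # callee switches its decoder
print(caller.receive(callee.send(b"frame 2")))   # caller switches on the feedback
```

Keeping separate `tx_codec` and `rx_codec` fields reflects that each direction of the call switches at a different moment: a terminal starts coding with the new type one frame before it starts decoding with it.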

The adaptive speech codec adjusting method provided by the embodiment can solve the problem of poor call speech quality caused by mismatching of speech codecs due to the change of channel quality in a speech call, detects the speech quality under the current channel condition by using a real-time speech quality detection algorithm, and adaptively adjusts the type of speech codec by using an intelligent adaptive speech codec adjusting algorithm based on the speech quality detection result, thereby improving the call speech quality and improving the call experience.

Example 2

FIG. 5 is a block diagram of the structure of the adaptive speech codec adjusting apparatus provided in this embodiment. The apparatus specifically includes:

a coding and decoding sample obtaining module 10, configured to implement step S100: acquiring a coding sample of a voice transmitting end and a corresponding decoding sample of a voice receiving end;

a voice quality detection completion judging module 20, configured to implement step S200: judging whether voice quality detection is finished for all voice decoding types supported by the voice receiving end, if so, entering a step S400, and if not, executing a step S300;

a voice quality detection module 30, configured to implement step S300 to perform voice quality detection on the speech decoding type that is supported by the voice receiving end but is not subjected to voice quality detection, and determine whether the voice quality of the speech decoding type that is not subjected to voice quality detection under the current channel condition meets a preset requirement;

a codec type set constructing module 40, configured to implement step S400 to construct an available codec type set of the voice transmitting end and an available codec type set of the voice receiving end;

a codec type intersection determining module 50, configured to implement step S500: judging whether the available coding and decoding type set of the voice sending end and the available coding and decoding type set of the voice receiving end have intersection, if not, keeping the original voice coding and decoding types unchanged, and if so, entering the step S600;

a codec type selection module 60, configured to implement step S600: selecting the voice coding and decoding type with the highest priority in the intersection as the optimal voice coding and decoding type;

a codec type switching module 70, configured to implement step S700: and judging whether the optimal voice coding and decoding type is the same as the voice coding and decoding type used in the current communication, if so, keeping the original voice coding and decoding type unchanged, and if not, interactively updating the voice coding and decoding type by the voice transmitting end and the voice receiving end, and using the optimal voice coding and decoding type in the voice communication.

The beneficial effects of the adaptive speech codec adjusting apparatus provided in this embodiment are detailed in the foregoing embodiments, and are not described herein again.

Example 3

The preferred embodiment provides a computer device, which can implement the steps in any embodiment of the adaptive speech codec adjusting method provided in the embodiment of the present application, and therefore, the beneficial effects of the adaptive speech codec adjusting method provided in the embodiment of the present application can be achieved, for details, see the foregoing embodiments, and are not described herein again.

Example 4

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor. To this end, an embodiment of the present invention provides a storage medium, in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps of any embodiment of the adaptive speech codec adjusting method provided in the embodiment of the present invention.

The storage medium may include read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and the like.

Since the instructions stored in the storage medium can execute the steps in any adaptive speech codec adjusting method provided in the embodiments of the present invention, the beneficial effects that can be achieved by any adaptive speech codec adjusting method provided in the embodiments of the present invention can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. An adaptive speech codec adaptation method for use in a talk phase of a voice communication system, the method comprising:

step S300: carrying out voice quality detection on the voice decoding types which are supported by the voice receiving end but are not subjected to voice quality detection, and judging whether the voice quality of the voice decoding types which are not subjected to voice quality detection under the current channel condition meets the preset requirement or not;

step S700: and judging whether the optimal voice coding and decoding type is the same as the voice coding and decoding type used in the current communication, if so, keeping the original voice coding and decoding type unchanged, if not, interactively updating the voice coding and decoding type by the voice sending end and the voice receiving end, and using the optimal voice coding and decoding type in the voice communication.

2. The adaptive speech codec adjusting method of claim 1, wherein the obtaining of the encoded samples at the speech transmitting end and the corresponding decoded samples at the speech receiving end specifically comprises:

3. The adaptive speech codec adjusting method of claim 1, wherein the determining whether the speech quality of the speech decoding type without speech quality detection under the current channel condition meets a preset requirement specifically comprises:

4. The adaptive speech codec adaptation method of claim 1, wherein the priority of the speech codec type is self-defined by a user.

5. The adaptive speech codec adaptation method of claim 1, wherein the voice transmitting end and the voice receiving end interactively update a speech codec type, and wherein using the optimal speech codec type in speech communication specifically comprises:

6. An adaptive speech codec adaptation apparatus, characterized in that, the apparatus is configured to implement the method of claim 1, and the apparatus specifically includes:

a codec type set constructing module, configured to implement step S400 to construct an available codec type set of the voice transmitting end and an available codec type set of the voice receiving end;

7. A computer device, characterized in that the computer device comprises a processor and a memory, in which a computer program is stored, which computer program is loaded and executed by the processor to implement the adaptive speech codec adaptation method according to any one of claims 1-5.

8. A computer-readable storage medium, in which a computer program is stored, which is loaded and executed by a processor to implement the adaptive speech codec adaptation method according to any one of claims 1 to 5.