CN109473116B - Voice coding method, voice decoding method and device - Google Patents

Voice coding method, voice decoding method and device

Info

Publication number
CN109473116B
Authority
CN
China
Prior art keywords: coding, decoding, encoding, difference value, result
Prior art date
Legal status
Active
Application number
CN201811518677.9A
Other languages
Chinese (zh)
Other versions
CN109473116A (en)
Inventor
牛坤
姜友海
Current Assignee
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Sipic Technology Co Ltd
Priority to CN201811518677.9A
Publication of CN109473116A
Application granted
Publication of CN109473116B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques

Abstract

The invention discloses a speech encoding/decoding method and apparatus. The speech signal encoding method comprises the following steps: predetermining a fixed step value for encoding the speech signal; calculating the difference between the speech signal at the current time and the predicted speech signal of the previous time; and encoding the difference based on the fixed step value to obtain an encoding result. By using a fixed step value, the embodiments of the invention reduce nonlinear distortion of the original speech signal, so that subsequent front-end signal processing (such as echo cancellation and beamforming) is not degraded by amplified nonlinear distortion, thereby avoiding harm to back-end speech recognition.

Description

Voice coding method, voice decoding method and device
Technical Field
The present invention relates to the field of speech signal processing technologies, and in particular to a speech encoding method and apparatus, and a speech decoding method and apparatus.
Background
Existing audio codec techniques fall into two categories. One is lossless compression, e.g., FLAC and APE; the other is lossy compression, e.g., Opus, OGG, and MP3.
Lossless compression is the ideal case and has no effect on signal processing, but its algorithms are computationally complex, which does not suit current low-power requirements and narrows the choice of CPU; moreover, the compression ratio is small, usually below 2:1. The lossy compression common on the market does not meet the needs of signal processing, and its computational cost is usually also high.
For lossless compression, the inherent algorithmic complexity makes it unsuitable for real-time signal processing tasks such as speech recognition. For lossy compression, front-end stages such as echo cancellation and beamforming rely on adaptive linear filtering, while lossy codecs usually introduce nonlinear distortion; this distortion is amplified during signal processing, degrading its output and seriously harming back-end speech recognition.
Front-end signal processing depends heavily on the original data, and intermediate nonlinear distortion causes it to behave abnormally. When hardware processing capacity is insufficient, other companies on the market typically upgrade the hardware configuration to keep the data fed to front-end signal processing stable. At present, no low-cost codec algorithm designed for front-end signal processing exists on the market.
Disclosure of Invention
While implementing the invention, the inventors first considered the Speex codec of the OGG format, because when pure speech is encoded, decoded, and then used for wake-up and speech recognition, overall performance drops only slightly and remains usable. However, when the original speech signal is encoded, decoded, and then passed through front-end signal processing, the signal is severely distorted, and wake-up and recognition performance drops sharply.
The ADPCM algorithm was tried next. Compared with DPCM, ADPCM is an adaptive differential coding technique whose advantage is its adaptivity: it can adaptively change the step value described earlier, using small step values for small differences and large step values for large ones. The inventors found in implementing the invention that it performs well in most cases, but sudden signal changes, such as a frame lost during transmission, cause heavy nonlinear distortion of the signal.
The audio is therefore compressed with DPCM, a low-loss differential coding scheme. DPCM keeps audio loss extremely low and spectral distortion extremely small; verification against a large number of test sets shows that its effect on wake-up and recognition is negligible. The computational complexity of the algorithm is also extremely low: on a Cortex-M4 class CPU, for example, occupancy stays below 10%. In addition, DPCM achieves a compression ratio of 2 to 4 times, which satisfies most transmission requirements.
The principle of DPCM is to exploit the correlation between adjacent sampling points: each sample differs only slightly from its neighbor, and this property enables compression. In general, the value of the first sampling point is stored, and then the difference between each sampling point and the previous one is stored as the compressed data. During restoration, adding the first difference to the first sample yields the second sample, and successive additions recover the values of all sampling points, completing the speech restoration. Because the differences between adjacent samples are small, they can be stored with fewer bits, which achieves the compression.
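The store-first-sample-then-differences scheme just described can be sketched as follows (a minimal illustration of the general DPCM idea; the function names and sample values are hypothetical, not from the patent):

```python
def dpcm_compress(samples):
    """Store the first sample, then each sample's difference from its predecessor."""
    diffs = [samples[i] - samples[i - 1] for i in range(1, len(samples))]
    return samples[0], diffs

def dpcm_restore(first, diffs):
    """Accumulate the differences to recover every sample."""
    out = [first]
    for d in diffs:
        out.append(out[-1] + d)
    return out

# Adjacent speech samples are strongly correlated, so the differences are small
pcm = [1000, 1003, 1001, 998, 1002]
first, diffs = dpcm_compress(pcm)
print(diffs)  # [3, -2, -3, 4]
assert dpcm_restore(first, diffs) == pcm
```

Because the differences occupy a much smaller range than the raw 16-bit samples, they can be stored in fewer bits, which is where the compression comes from.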
An embodiment of the present invention provides a speech encoding and decoding method and apparatus, which are used to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a speech coding method, including:
predetermining a fixed step value for encoding a speech signal;
calculating a difference between the speech signal at the current time and the predicted speech signal at the previous time;
and encoding the difference based on the fixed step value to obtain an encoding result.
In a second aspect, an embodiment of the present invention provides a speech decoding method, including:
receiving an encoding result obtained by encoding the speech signal at the current time using the speech encoding method of an embodiment of the invention;
decoding the encoding result based on the fixed step value to obtain a decoded difference;
and adding the decoded difference to the predicted speech signal of the previous time to obtain a decoding result.
In a third aspect, the present invention provides a speech signal encoding and decoding method, including:
the encoding step performed at the first terminal:
predetermining a fixed step value for encoding a speech signal;
calculating a difference between the speech signal at the current time and the predicted speech signal at the previous time;
encoding the difference based on the fixed step value to obtain an encoding result;
decoding step performed at the second terminal:
receiving the encoding result;
decoding the encoding result based on the fixed step value to obtain a decoded difference;
and adding the decoded difference to the predicted speech signal of the previous time to obtain a decoding result.
In a fourth aspect, the present invention provides a speech encoding apparatus, comprising: a fixed step value determining module, configured to predetermine a fixed step value for encoding a speech signal; a difference calculation module, configured to calculate the difference between the speech signal at the current time and the predicted speech signal of the previous time; and an encoding logic module, configured to encode the difference based on the fixed step value to obtain an encoding result.
In a fifth aspect, the present invention provides a speech decoding apparatus, comprising: a signal receiving module, configured to receive an encoding result obtained by encoding the speech signal at the current time using the speech encoding method of an embodiment of the invention; a decoding logic module, configured to decode the encoding result based on the fixed step value to obtain a decoded difference; and an adder module, configured to add the decoded difference to the predicted speech signal of the previous time to obtain a decoding result.
In a sixth aspect, an embodiment of the present invention provides a storage medium, where one or more programs including execution instructions are stored, and the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above-described speech signal encoding methods or speech signal decoding methods of the present invention.
In a seventh aspect, an electronic device is provided, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform any one of the speech signal encoding method or the speech signal decoding method of the present invention.
In an eighth aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a storage medium, and the computer program includes program instructions, which, when executed by a computer, cause the computer to execute any one of the speech signal encoding method or the speech signal decoding method described above.
The beneficial effect of the embodiments of the invention is that by using a fixed step value, nonlinear distortion of the original speech signal is reduced, so that subsequent front-end signal processing (such as echo cancellation and beamforming) is not degraded by amplified nonlinear distortion, thereby avoiding harm to back-end speech recognition.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a method for encoding a speech signal according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for decoding a speech signal according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a speech signal encoding/decoding method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an embodiment of a speech signal codec algorithm according to the invention;
FIG. 5 is a schematic block diagram of an embodiment of a speech signal encoding apparatus according to the present invention;
FIG. 6 is a schematic block diagram of an embodiment of a speech signal decoding apparatus according to the present invention;
fig. 7 is a schematic structural diagram of an embodiment of an electronic device according to the invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions are described below clearly and completely with reference to the drawings of the embodiments. The described embodiments are some, but not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from them without creative effort fall within the protection scope of the invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As shown in fig. 1, an embodiment of the present invention provides a speech signal encoding method, including:
S11, a fixed step value for encoding the speech signal is predetermined.
S12, the difference between the speech signal at the current time and the predicted speech signal of the previous time is calculated; the speech signal at the current time is PCM signal data based on the original speech signal.
S13, the difference is encoded based on the fixed step value to obtain an encoding result.
In the embodiments of the invention, using a fixed step value reduces nonlinear distortion of the original speech signal, so that subsequent front-end signal processing (such as echo cancellation and beamforming) is not degraded by amplified nonlinear distortion, thereby avoiding harm to back-end speech recognition.
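Steps S11 to S13 can be sketched as below (the step value of 256 and the helper name are illustrative assumptions; the patent text does not fix these values):

```python
STEP = 256  # S11: fixed step value chosen in advance (assumed for illustration)

def encode_sample(x, predicted_prev, step=STEP):
    """S12 + S13: difference from the predicted previous signal, then uniform quantization."""
    d = x - predicted_prev  # S12: difference from the prediction
    return round(d / step)  # S13: encode with the fixed step value

print(encode_sample(1030, 0))  # 4
```

Because the step value is fixed, the quantization is uniform, which is exactly what keeps the codec's distortion linear rather than signal-dependent.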
In some embodiments, the method further comprises: quantizing the difference with a preset quantization factor to compress it; in this case, encoding the difference based on the fixed step value means encoding the quantized difference based on the fixed step value to obtain the encoding result.
Since the difference may be too large, a quantization factor is typically set before storage; for example, with a quantization factor of 100, a difference of 400 maps to 4, so the compressed data can be stored with fewer bits. However, the inventors further found that quantization errors arise easily in this case. If every code is confined to a fixed bit width, e.g., 4 bits, the code range is -8 to 7, and the representable difference range is that range multiplied by the quantization factor. A difference that is too large or too small therefore falls out of range and can only be represented by the boundary value, which causes a quantization error.
If a quantization error occurs in one sample, subsequent samples are restored on top of the wrong data during restoration, so the earlier error keeps accumulating and corrupts all of the following restorations. The inventors therefore propose the following scheme: compress the current sample against the previously restored sample, so that even if a quantization error occurs, it affects only one sampling point and does not affect the restoration of subsequent points.
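The closed-loop scheme above can be illustrated as follows, reusing the quantization factor of 100 and the 4-bit code range from the example; the helper names are hypothetical and the parameters are assumptions, not values mandated by the patent:

```python
Q = 100  # assumed quantization factor, as in the example above

def quantize(d):
    """Map a difference to a 4-bit code, clamping to the -8..7 range."""
    return max(-8, min(7, round(d / Q)))

def encode_closed_loop(samples):
    """Predict from the encoder's own reconstruction, so errors do not accumulate."""
    codes, pred = [], 0
    for x in samples:
        c = quantize(x - pred)
        codes.append(c)
        pred += c * Q  # same reconstruction the decoder will compute
    return codes

def decode(codes):
    out, pred = [], 0
    for c in codes:
        pred += c * Q
        out.append(pred)
    return out

pcm = [0, 120, 190, 310]
restored = decode(encode_closed_loop(pcm))
# each sample is off by at most Q/2: the error of one quantization, not a running sum
assert all(abs(a - b) <= Q // 2 for a, b in zip(pcm, restored))
```

Had the encoder predicted from the true previous sample instead of its own reconstruction, the decoder's state would drift away from the encoder's and each rounding error would carry forward into every later sample.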
In response to the above-mentioned problem, in an embodiment of the present invention, the predicted speech signal at the previous time is obtained based on decoding the encoding result.
Fig. 2 is a flowchart of a speech signal decoding method according to an embodiment of the present invention, the method including:
S21, receiving an encoding result obtained by encoding the speech signal at the current time using the speech encoding method of any of the foregoing embodiments of the invention;
S22, decoding the encoding result based on the fixed step value to obtain a decoded difference;
S23, adding the decoded difference to the predicted speech signal of the previous time to obtain a decoding result.
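Mirroring the encoder, steps S21 to S23 reduce to an inverse quantization and an addition; in this sketch the step value of 256 is an assumed illustration and must match whatever the encoder used:

```python
STEP = 256  # must equal the encoder's fixed step value (assumed here)

def decode_sample(code, predicted_prev, step=STEP):
    """S22 + S23: invert the uniform quantization, then add the previous prediction."""
    d = code * step            # S22: decoded difference
    return predicted_prev + d  # S23: decoding result

print(decode_sample(4, 0))  # 1024
```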
Fig. 3 is a flowchart of a speech signal encoding/decoding method according to an embodiment of the present invention, the method including:
S31, encoding steps performed at the first terminal:
S311, predetermining a fixed step value for encoding the speech signal;
S312, calculating the difference between the speech signal at the current time and the predicted speech signal of the previous time; the speech signal at the current time is PCM signal data based on the original speech signal; the predicted speech signal of the previous time is obtained by decoding the encoding result.
S313, encoding the difference based on the fixed step value to obtain an encoding result;
S32, decoding steps performed at the second terminal:
S321, receiving the encoding result;
S322, decoding the encoding result based on the fixed step value to obtain a decoded difference;
S323, adding the decoded difference to the predicted speech signal of the previous time to obtain a decoding result.
Fig. 4 is a schematic diagram of an embodiment of the speech signal codec algorithm of the invention. The left end of the transmission is device 1 and the right end is device 2; here "device" is meant broadly and may refer to two different devices or to two CPUs.
X(n) is the original PCM signal; X̂(n-1) is the prediction estimate from the previous frame; D(n) is the difference X(n) minus X̂(n-1); S(n) is the step value used to quantize each code; Y(n) is the output encoded data.
Step 1: DPCM encoding is first performed on the original PCM signal to obtain encoded data for transmission. The detailed encoding flow is:
1) Subtract the prediction estimate stored for the previous frame from the original signal to obtain the signal difference.
2) Step-value calculation determines the quantization scale of the encoding. The algorithm allows both non-uniform and uniform quantization; to reduce nonlinear distortion of the original speech signal, uniform quantization is used here, with a fixed step value S(n).
The step value is chosen from the characteristics of the speech signal and the target compression ratio. The bit width of a speech signal is usually 16 bits; at a 2:1 compression ratio it must be compressed to 8 bits, so the information must be quantized down to 8-bit codes. The quantization step follows from this.
3) Quantize and encode the difference from step 1) using the quantization scale from step 2) to obtain the final encoded value Y(n).
4) The above steps complete one encoding pass. To encode the next frame, the predicted value of the current frame is needed; it is obtained by passing the code through the decoding logic unit, yielding the prediction X̂(n) (used for the next frame X(n+1)). This decoding operation is the inverse of step 3) and performs the same quantization, with the same quantization scale as step 3).
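One plausible reading of the step-sizing argument above can be checked with simple arithmetic (the patent gives no explicit formula, so the relationship below is an assumption used only for illustration):

```python
# Sizing sketch: 16-bit samples at a 2:1 compression ratio leave 8 bits per code,
# so a uniform step of 2**16 / 2**8 covers the sample range with 8-bit codes.
SAMPLE_BITS = 16
COMPRESSION_RATIO = 2
CODE_BITS = SAMPLE_BITS // COMPRESSION_RATIO  # 8-bit codes
step = 2 ** SAMPLE_BITS // 2 ** CODE_BITS     # assumed uniform step value
print(step)  # 256
```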
Step 2: device 2 receives the encoded data and performs the decoding operation, which mirrors the encoding operation:
1) The encoded data is fed to the step-length calculator to compute the step length; as in encoding, uniform quantization is used, so the step length is constant.
2) The step length and the encoded data are sent together to the decoding logic unit to obtain the difference.
3) The difference is added to the prediction signal of the previous frame to obtain the decoded original data. The whole process is the reverse of encoding.
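Under the same assumptions (uniform quantization with a fixed step, here 256, and an 8-bit code range; all names are illustrative), the whole encode/decode pipeline of Fig. 4 can be sketched as:

```python
STEP = 256  # assumed fixed step value S(n)

def device1_encode(pcm):
    """DPCM encoding with a local decoder supplying the next frame's prediction."""
    codes, pred = [], 0
    for x in pcm:
        d = x - pred                              # D(n) = X(n) - prediction
        c = max(-128, min(127, round(d / STEP)))  # Y(n): 8-bit code, uniform step
        codes.append(c)
        pred += c * STEP                          # local decode: prediction for next frame
    return codes

def device2_decode(codes):
    """Inverse operation: constant step, decode the difference, add the prediction."""
    out, pred = [], 0
    for c in codes:
        pred += c * STEP
        out.append(pred)
    return out

pcm = [0, 300, 700, 650, 400]
restored = device2_decode(device1_encode(pcm))
# low loss: every restored sample lies within half a step of the original
assert all(abs(a - b) <= STEP // 2 for a, b in zip(pcm, restored))
```

Because both ends use the same constant step, no nonlinear adaptation state has to stay synchronized between the devices, which is what makes the output safe for the adaptive linear filters in front-end processing.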
Step 3: the decoded data is sent to front-end signal processing (such as echo cancellation and beamforming) to obtain the processed speech signal.
Step 4: the speech signal obtained in step 3 is sent to the recognition engine or the wake-up engine.
The codec algorithm of the embodiments of the invention first guarantees low signal loss, so that signal processing is not affected. And because the reliability of signal processing is preserved, the applicability of the method is greatly extended. For example, on CPUs without DMA, data can be compressed before transmission, greatly reducing CPU occupancy. Likewise, when a low-end chip communicates with a high-end chip, the data can be compressed and sent to the high-end chip for signal processing, preserving the low cost and low power consumption of the low-end chip.
It should be noted that for simplicity of explanation, the foregoing method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention. In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
As shown in fig. 5, an embodiment of the present invention further provides a speech signal encoding apparatus 500, including:
a fixed step value determination module 510 for determining a fixed step value for encoding the speech signal in advance.
A difference calculation module 520, configured to calculate the difference between the speech signal at the current time and the predicted speech signal of the previous time; the speech signal at the current time is PCM signal data based on the original speech signal; the predicted speech signal of the previous time is obtained by decoding the encoding result.
An encoding logic module 530, configured to encode the difference value based on the fixed step value to obtain an encoding result.
As shown in fig. 6, an embodiment of the present invention further provides a speech signal decoding apparatus 600, including:
the signal receiving module 610 is configured to receive an encoding result obtained by encoding the speech signal at the current time by using the speech encoding method according to any of the foregoing embodiments of the present invention.
A decoding logic module 620, configured to decode the encoding result based on the fixed step value to obtain a decoding difference value.
An adder module 630, configured to add the decoded difference to the predicted speech signal at the previous time to obtain a decoded result.
In some embodiments, the present invention provides a non-transitory computer-readable storage medium, in which one or more programs including executable instructions are stored, and the executable instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device, etc.) to perform any one of the above-mentioned speech signal encoding methods or speech signal decoding methods of the present invention.
In some embodiments, the present invention further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any one of the above speech signal encoding method or speech signal decoding method.
In some embodiments, an embodiment of the present invention further provides an electronic device, which includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a speech signal encoding method or a speech signal decoding method.
In some embodiments, an embodiment of the present invention further provides a storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements a speech signal encoding method or a speech signal decoding method.
The speech signal encoding apparatus or the speech signal decoding apparatus according to the above-mentioned embodiment of the present invention may be configured to execute the speech signal encoding method or the speech signal decoding method according to the above-mentioned embodiment of the present invention, and accordingly achieve the technical effect achieved by the speech signal encoding method or the speech signal decoding method according to the above-mentioned embodiment of the present invention, which is not described herein again. In the embodiment of the present invention, the relevant functional module may be implemented by a hardware processor (hardware processor).
Fig. 7 is a schematic diagram of a hardware structure of an electronic device for performing a speech signal encoding/decoding method according to another embodiment of the present application, and as shown in fig. 7, the electronic device includes:
one or more processors 710 and a memory 720, one processor 710 being illustrated in fig. 7.
The apparatus for performing the voice signal encoding/decoding method may further include: an input device 730 and an output device 740.
The processor 710, the memory 720, the input device 730, and the output device 740 may be connected by a bus or other means, such as the bus connection in fig. 7.
The memory 720, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the speech signal encoding/decoding method in the embodiments of the present application. The processor 710 executes various functional applications of the server and data processing, i.e., implements the voice signal encoding/decoding method of the above-described method embodiment, by executing nonvolatile software programs, instructions, and modules stored in the memory 720.
The memory 720 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the voice signal encoding/decoding method apparatus, and the like. Further, the memory 720 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory 720 may optionally include a memory remotely located from the processor 710, and these remote memories may be connected to the speech signal encoding/decoding method apparatus through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 730 may receive input numeric or character information and generate signals related to user settings and function control of the voice signal encoding/decoding method device. The output device 740 may include a display device such as a display screen.
The one or more modules are stored in the memory 720 and, when executed by the one or more processors 710, perform a speech signal encoding/decoding method in any of the method embodiments described above.
The product can execute the methods provided by the embodiments of the present application, and has the corresponding functional modules and beneficial effects for executing those methods. For technical details not described in this embodiment, reference may be made to the methods provided in the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: these feature mobile communication capabilities and primarily provide voice and data communication. Such terminals include smart speakers, story machines, smartphones (e.g., iPhones), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also provide mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as iPads.
(3) Portable entertainment devices: these can display and play multimedia content. Such devices include audio and video players (e.g., iPods), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: similar in architecture to general-purpose computers, but with higher requirements on processing capability, stability, reliability, security, scalability, and manageability, because they must provide highly reliable services.
(5) Other electronic devices with data interaction functions.
The apparatus embodiments described above are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solutions, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods of the embodiments or parts thereof.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, without such modifications or substitutions departing from the spirit and scope of the corresponding technical solutions of the embodiments of the present application.

Claims (7)

1. A speech signal encoding method comprising:
predetermining a fixed step value for encoding a speech signal;
calculating a difference value between the speech signal at the current moment and the predicted speech signal at the previous moment; quantizing the difference value with a preset quantization factor to compress the difference value; wherein the speech signal at the current moment is PCM signal data based on an original speech signal;
encoding the difference value based on the fixed step value to obtain an encoding result; wherein the predicted speech signal at the previous moment is obtained by decoding the encoding result at the previous moment;
wherein encoding the difference value based on the fixed step value to obtain an encoding result comprises: encoding the quantized difference value based on the fixed step value to obtain the encoding result.
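The encoding steps of claim 1 resemble a DPCM-style scheme with a fixed step size. A minimal Python sketch follows; `STEP` and `QUANT_FACTOR` are illustrative assumptions, since the patent does not specify concrete values, and the function names are not from the patent:

```python
# Illustrative DPCM-style encoder with a fixed step value and a preset
# quantization factor. STEP and QUANT_FACTOR are assumed example values.
STEP = 64          # fixed step value, determined before encoding begins
QUANT_FACTOR = 4   # preset quantization factor that compresses the difference

def encode_sample(sample: int, predicted: int) -> tuple[int, int]:
    """Encode one PCM sample against the prediction from the previous moment.

    Returns the transmitted code and the new prediction; the prediction is
    obtained by decoding the encoder's own output, so the encoder and the
    decoder keep identical predictors.
    """
    diff = sample - predicted            # difference vs. predicted signal
    quantized = diff // QUANT_FACTOR     # quantize to compress the difference
    code = quantized // STEP             # fixed-step encoding
    new_predicted = predicted + code * STEP * QUANT_FACTOR
    return code, new_predicted
```

Only `code` is transmitted; `new_predicted` is carried forward to encode the next sample.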
2. A method of decoding a speech signal, comprising:
receiving an encoding result obtained by encoding the speech signal at the current moment with the speech signal encoding method according to claim 1;
decoding the encoding result based on the fixed step value to obtain a decoded difference value;
adding the decoded difference value to the predicted speech signal at the previous moment to obtain a decoding result.
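The decoding steps of claim 2 invert the fixed-step encoding and add the decoded difference to the running prediction. A hedged sketch, where `STEP` and `QUANT_FACTOR` are assumed values that must match those used by the encoder:

```python
# Illustrative decoder for the fixed-step scheme of claim 1. The constants
# are assumptions for the sketch, not values taken from the patent, and
# must equal the encoder's parameters.
STEP = 64
QUANT_FACTOR = 4

def decode_sample(code: int, predicted: int) -> int:
    """Undo the fixed-step encoding and quantization, then add the decoded
    difference value to the prediction from the previous moment."""
    decoded_diff = code * STEP * QUANT_FACTOR
    return predicted + decoded_diff
```

The returned value serves both as the decoding result for the current moment and as the prediction for the next one.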
3. A method of encoding and decoding a speech signal, comprising:
the encoding step performed at the first terminal:
predetermining a fixed step value for encoding a speech signal;
calculating a difference value between the speech signal at the current moment and the predicted speech signal at the previous moment; quantizing the difference value with a preset quantization factor to compress the difference value; wherein the speech signal at the current moment is PCM signal data based on an original speech signal;
encoding the difference value based on the fixed step value to obtain an encoding result; wherein the predicted speech signal at the previous moment is obtained by decoding the encoding result at the previous moment; and wherein encoding the difference value based on the fixed step value to obtain an encoding result comprises: encoding the quantized difference value based on the fixed step value to obtain the encoding result;
the decoding step performed at the second terminal:
receiving the encoding result;
decoding the encoding result based on the fixed step value to obtain a decoded difference value;
adding the decoded difference value to the predicted speech signal at the previous moment to obtain a decoding result.
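The two-terminal flow of claim 3 can be sketched as a round trip: the encoder on the first terminal and the decoder on the second terminal update identical predictors from the transmitted codes alone. `STEP` and `QUANT_FACTOR` are illustrative assumptions, not values from the patent:

```python
# End-to-end sketch of claim 3: fixed-step DPCM round trip between a
# "first terminal" (encoder) and a "second terminal" (decoder).
STEP = 64
QUANT_FACTOR = 4

def encode(pcm: list[int]) -> list[int]:
    """First terminal: encode a stream of PCM samples into codes."""
    predicted, codes = 0, []
    for sample in pcm:
        code = ((sample - predicted) // QUANT_FACTOR) // STEP
        # The predictor is updated by decoding the encoder's own output,
        # mirroring exactly what the second terminal will compute.
        predicted += code * STEP * QUANT_FACTOR
        codes.append(code)
    return codes

def decode(codes: list[int]) -> list[int]:
    """Second terminal: add each decoded difference to the prediction."""
    predicted, out = 0, []
    for code in codes:
        predicted += code * STEP * QUANT_FACTOR  # decoded difference value
        out.append(predicted)
    return out
```

Because both terminals start from the same initial prediction and update it only from the codes, the predictors never drift, and the per-sample reconstruction error stays bounded by `STEP * QUANT_FACTOR` rather than accumulating.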
4. A speech signal encoding apparatus comprising:
a fixed step value determination module for determining in advance a fixed step value for encoding the speech signal;
a difference value calculation module, configured to calculate a difference value between the speech signal at the current moment and the predicted speech signal at the previous moment, and to quantize the difference value with a preset quantization factor to compress the difference value; wherein the speech signal at the current moment is PCM signal data based on an original speech signal;
an encoding logic module, configured to encode the difference value based on the fixed step value to obtain an encoding result; wherein the predicted speech signal at the previous moment is obtained by decoding the encoding result at the previous moment; and wherein encoding the difference value based on the fixed step value to obtain an encoding result comprises: encoding the quantized difference value based on the fixed step value to obtain the encoding result.
5. A speech signal decoding apparatus comprising:
a signal receiving module, configured to receive an encoding result obtained by encoding the speech signal at the current moment with the speech signal encoding method according to claim 1;
a decoding logic module, configured to decode the encoding result based on the fixed step value to obtain a decoded difference value;
and an adder module, configured to add the decoded difference value to the predicted speech signal at the previous moment to obtain a decoding result.
6. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any of claims 1-2.
7. A storage medium on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1-2.
CN201811518677.9A 2018-12-12 2018-12-12 Voice coding method, voice decoding method and device Active CN109473116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811518677.9A CN109473116B (en) 2018-12-12 2018-12-12 Voice coding method, voice decoding method and device


Publications (2)

Publication Number Publication Date
CN109473116A CN109473116A (en) 2019-03-15
CN109473116B true CN109473116B (en) 2021-07-20

Family

ID=65674796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811518677.9A Active CN109473116B (en) 2018-12-12 2018-12-12 Voice coding method, voice decoding method and device

Country Status (1)

Country Link
CN (1) CN109473116B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111063347B (en) * 2019-12-12 2022-06-07 安徽听见科技有限公司 Real-time voice recognition method, server and client

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1432176A (en) * 2000-04-24 2003-07-23 高通股份有限公司 Method and appts. for predictively quantizing voice speech
CN105719654A (en) * 2011-04-21 2016-06-29 三星电子株式会社 Decoding device and method for sound signal or audio signal and quantizing device
CN106803425A (en) * 2011-06-01 2017-06-06 三星电子株式会社 Audio coding method and equipment, audio-frequency decoding method and equipment
CN107533847A (en) * 2015-03-09 2018-01-02 弗劳恩霍夫应用研究促进协会 Audio coder, audio decoder, the method for coded audio signal and the method for decoding encoded audio signal
US20180182403A1 (en) * 2016-12-27 2018-06-28 Fujitsu Limited Audio coding device and audio coding method



Similar Documents

Publication Publication Date Title
JP6125031B2 (en) Audio signal encoding and decoding method and audio signal encoding and decoding apparatus
US10089997B2 (en) Method for predicting high frequency band signal, encoding device, and decoding device
US9853659B2 (en) Split gain shape vector coding
JP6812504B2 (en) Voice coding method and related equipment
CN114550732B (en) Coding and decoding method and related device for high-frequency audio signal
CN109473116B (en) Voice coding method, voice decoding method and device
CN113299306B (en) Echo cancellation method, echo cancellation device, electronic equipment and computer-readable storage medium
CN115426075A (en) Encoding transmission method of semantic communication and related equipment
EP3079150B1 (en) Signal processing method and device
CN112968886B (en) Vibration signal compression method and device, storage medium and computer equipment
CN113409792A (en) Voice recognition method and related equipment thereof
CN109741756B (en) Method and system for transmitting operation signal based on USB external equipment
CN113903345A (en) Audio processing method and device and electronic device
CN108364657B (en) Method and decoder for processing lost frame
CN102419978B (en) Audio decoder and frequency spectrum reconstructing method and device for audio decoding
US20230298603A1 (en) Method for encoding and decoding audio signal using normalizing flow, and training method thereof
CN110033781B (en) Audio processing method, apparatus and non-transitory computer readable medium
US20230075562A1 (en) Audio Transcoding Method and Apparatus, Audio Transcoder, Device, and Storage Medium
KR20230069167A (en) Trained generative model voice coding
CN114613375A (en) Time domain noise shaping method and device for audio signal
CN115206330A (en) Audio processing method, audio processing apparatus, electronic device, and storage medium
CN117651081A (en) Data transmission method, device, equipment and storage medium
CN112908346A (en) Packet loss recovery method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
Applicant after: Sipic Technology Co.,Ltd.
Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province
Applicant before: AI SPEECH Co.,Ltd.
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
Denomination of invention: Speech coding method, speech decoding method and device
Effective date of registration: 20230726
Granted publication date: 20210720
Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch
Pledgor: Sipic Technology Co.,Ltd.
Registration number: Y2023980049433