CN111028848B - Compressed voice processing method and device and electronic equipment - Google Patents
- Publication number
- CN111028848B (application number CN201911162541.3A)
- Authority
- CN
- China
- Prior art keywords
- audio
- data
- decompressed
- compressed
- original
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Stereophonic System (AREA)
Abstract
The embodiment of the disclosure provides a compressed voice processing method, a device and an electronic device, belonging to the technical field of voice processing, wherein the method comprises the following steps: acquiring original audio needing decompression processing, wherein the original audio comprises one or more compressed audio channel data; performing decompression and frequency reduction processing on compressed data in the original audio to obtain decompressed audio data comprising a plurality of decompressed audio channels; generating output audio corresponding to the original audio based on the decompressed audio data. The scheme of the disclosure can improve the usability of the compressed voice.
Description
Technical Field
The present disclosure relates to the field of speech processing technologies, and in particular, to a compressed speech processing method and apparatus, and an electronic device.
Background
Recording is the process of capturing sound signals on a medium: sound is converted into electrical signals by a microphone and an amplifier, and the electrical signals are recorded using different materials and processes. Recording plays an important role in a smart speaker, since only correct, high-quality voice data allows the smart speaker to deliver a good subsequent user experience. The speech data commonly used in smart-phones is raw PCM (pulse code modulation) data. Sound heard by the human ear is an analog signal, and PCM is a technique for converting sound from an analog signal to a digital signal. The principle is that the analog signal is sampled at a fixed frequency; on the waveform, the sampled signal looks like a series of consecutive pulses of different amplitudes. The pulse amplitudes are quantized to a certain precision, and the quantized values are continuously output, transmitted, processed, or recorded in a storage medium. Generating digital audio therefore requires three processes: sampling, quantizing, and encoding. The sampling frequency is the number of times per second the device samples the analog signal. The number of sampling bits (for example 8, 16, or 24 bits) is the number of bits used to describe each sample of the digital signal. The channels are the mutually independent audio signals collected or played back at different spatial positions when sound is recorded or played. During recording, the audio driver continuously sends the sampled PCM data back to the upper-layer application so that the upper-layer application can complete other operations.
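The three PCM processes named above (sampling, quantizing, encoding) can be illustrated with a minimal sketch. The function name `pcm_encode`, the 16-bit little-endian signed format, and the clipping to the range [-1, 1] are assumptions made for illustration; the disclosure does not prescribe them.

```python
import math
import struct


def pcm_encode(signal, duration_s, sample_rate_hz):
    """Sample, quantize, and encode a signal as 16-bit little-endian PCM.

    `signal` is a hypothetical callable mapping time in seconds to an
    amplitude, which is clipped to [-1.0, 1.0] before quantization.
    """
    n = int(round(duration_s * sample_rate_hz))  # sampling: number of samples
    full_scale = 2 ** 15 - 1                     # largest 16-bit signed value
    samples = []
    for i in range(n):
        t = i / sample_rate_hz
        x = max(-1.0, min(1.0, signal(t)))
        samples.append(int(round(x * full_scale)))  # quantizing
    return struct.pack("<%dh" % n, *samples)        # encoding


# 16 ms of a 440 Hz tone sampled at 32 kHz gives 512 samples (1024 bytes).
tone = pcm_encode(lambda t: math.sin(2 * math.pi * 440 * t), 0.016, 32000)
```

A higher sampling frequency or a larger number of sampling bits increases the amount of data produced per second in proportion.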
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a compressed speech processing method, apparatus, and electronic device, which at least partially solve the problems in the prior art.
In a first aspect, an embodiment of the present disclosure provides a compressed speech processing method, including:
acquiring original audio needing decompression processing, wherein the original audio comprises one or more compressed audio channel data;
performing decompression and frequency reduction processing on compressed data in the original audio to obtain decompressed audio data comprising a plurality of decompressed audio channels;
generating output audio corresponding to the original audio based on the decompressed audio data.
According to a specific implementation manner of the embodiment of the present disclosure, the generating an output audio corresponding to the original audio based on the decompressed audio data includes:
and adding an extraction signal channel in the decompressed audio data, wherein the extraction signal channel is used for simulating an extraction signal.
According to a specific implementation manner of the embodiment of the present disclosure, the acquiring an original audio that needs to be decompressed includes:
sampling the audio signals of the N channels by adopting audio sampling equipment according to a preset sampling frequency;
and storing the sampling result into compressed audio channel data of M channels in a digital signal form, wherein N and M are natural numbers, and N is larger than M.
According to a specific implementation manner of the embodiment of the present disclosure, the acquiring the original audio that needs to be decompressed includes:
acquiring a storage position of a storage unit for storing the original audio;
and reading the original audio from the storage location.
According to a specific implementation manner of the embodiment of the present disclosure, the performing a decompression and frequency reduction process on the compressed data in the original audio includes:
acquiring the bit number L of the audio data in the compressed audio channel;
and splitting the audio data in the compressed audio channels based on the bit number L of the audio data to obtain two or more decompressed audio channels.
According to a specific implementation manner of the embodiment of the present disclosure, the splitting the audio data in the compressed audio channel includes:
and splitting the data in one compressed audio channel into the data in two decompressed audio channels in a mode of halving the bit number L of the audio data.
According to a specific implementation manner of the embodiment of the present disclosure, the performing a decompression and frequency reduction process on the compressed data in the original audio includes:
acquiring an original sampling frequency of the original audio;
and generating the decompressed audio data of the decompressed audio channel according to a decompressed sampling frequency, wherein the decompressed sampling frequency is less than the original sampling frequency.
According to a specific implementation manner of the embodiment of the present disclosure, the adding of an extraction signal channel in the decompressed audio data includes:
setting all values in the extraction signal channel to 0.
In a second aspect, an embodiment of the present disclosure provides a compressed speech processing apparatus, including:
an acquisition module, configured to acquire original audio that needs decompression processing, wherein the original audio comprises one or more compressed audio channel data;
a processing module, configured to perform decompression and frequency reduction processing on compressed data in the original audio to obtain decompressed audio data comprising a plurality of decompressed audio channels;
and a generating module, configured to generate output audio corresponding to the original audio based on the decompressed audio data.
According to a specific implementation manner of the embodiment of the present disclosure, the obtaining module is further configured to:
sampling the audio signals of the N channels by adopting audio sampling equipment according to a preset sampling frequency;
and storing the sampling result into compressed audio channel data of M channels in a digital signal form, wherein N and M are natural numbers, and N is larger than M.
According to a specific implementation manner of the embodiment of the present disclosure, the processing module is further configured to:
acquiring the bit number L of the audio data in the compressed audio channel;
and splitting the audio data in the compressed audio channels based on the bit number L of the audio data to obtain two or more decompressed audio channels.
According to a specific implementation manner of the embodiment of the present disclosure, the processing module is further configured to:
and splitting the data in one compressed audio channel into the data in two decompressed audio channels in a mode of halving the bit number L of the audio data.
According to a specific implementation manner of the embodiment of the present disclosure, the processing module is further configured to:
acquiring an original sampling frequency of the original audio;
and generating the decompressed audio data of the decompressed audio channel according to a decompressed sampling frequency, wherein the decompressed sampling frequency is less than the original sampling frequency.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the compressed speech processing method in the first aspect or any implementation manner of the first aspect.
In a fourth aspect, the disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the compressed speech processing method in the first aspect or any implementation manner of the first aspect.
In a fifth aspect, the disclosed embodiments also provide a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the compressed speech processing method in the foregoing first aspect or any implementation manner of the first aspect.
The compressed voice processing scheme in the embodiment of the disclosure comprises acquiring an original audio needing decompression processing, wherein the original audio comprises one or more compressed audio channel data; decompressing and frequency-reducing the compressed data in the original audio to obtain decompressed audio data comprising a plurality of decompressed audio channels; generating output audio corresponding to the original audio based on the decompressed audio data. According to the scheme, decompression and frequency reduction processing can be performed on the original audio, the audio data which accord with the actual application scene is generated, and the usability of the audio data is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present disclosure, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a compressed speech processing flow according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another compressed speech processing flow provided by the embodiment of the present disclosure;
FIG. 3 is a schematic diagram of another compressed speech processing flow provided by the embodiment of the present disclosure;
FIG. 4 is a schematic diagram of another compressed speech processing flow provided by the embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a compressed speech processing apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an electronic device provided in an embodiment of the disclosure.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without inventive step, are intended to be within the scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
The embodiment of the disclosure provides a compressed voice processing method. The compressed speech processing method provided by the embodiment can be executed by a computing device, which can be implemented as software or as a combination of software and hardware, and can be integrally arranged in a server, a terminal device and the like.
Referring to fig. 1, a method for processing compressed speech according to an embodiment of the present disclosure includes the following steps:
S101, acquiring original audio that needs decompression processing, wherein the original audio comprises one or more compressed audio channel data.
The original audio is an original audio file obtained by sampling with a voice device (e.g., a smart speaker), and it may be audio data of various types. For example, the original audio may use any of several sampling rates (e.g., 32 kHz, 64 kHz, 128 kHz, etc.), and it may include a plurality of compressed audio channels, each of which contains a predetermined number of bits (e.g., 512 bits) of audio data.
As an example, 16 ms of audio data is read from a device at a sampling frequency of 32 kHz (16 ms × 32 kHz = 512 data positions per channel, which matches the 512 entries per channel shown below). The voice data in the device is compressed: 4 channels of data are compressed into 2 channels of data, which cannot be used directly. The voice data format is arranged as follows:
Channel 1:
1_1 | 1_2 | ... | 1_256 | 1_257 | 1_258 | ... | 1_512 |
Channel 2:
2_1 | 2_2 | ... | 2_256 | 2_257 | 2_258 | ... | 2_512 |
S102, performing decompression and frequency reduction processing on the compressed data in the original audio to obtain decompressed audio data containing a plurality of decompressed audio channels.
The data in the original audio is compressed and cannot be used directly as audio data, so the compressed data in the original audio needs to be decompressed. During decompression, frequency reduction (down-conversion) is also performed on the compressed data so that the decompressed data suits different application scenarios. The result is decompressed audio data comprising a plurality of decompressed audio channels.
As an example, the compressed speech data in the original audio comprises 2 channels, each containing 512 bits of data at a sampling frequency of 32 kHz. During decompression and frequency reduction, the compressed voice data is decompressed and audio data with a sampling frequency of 16 kHz is output. Data 1 to 256 of the first compressed channel become the processed first-channel voice data, and data 257 to 512 of the first compressed channel become the processed second-channel voice data. Data 1 to 256 of the second compressed channel become the processed third-channel voice data, and data 257 to 512 of the second compressed channel become the processed fourth-channel voice data, as follows:
Channel 1:
1_1 | 1_2 | ... | 1_256 |
Channel 2:
1_257 | 1_258 | ... | 1_512 |
Channel 3:
2_1 | 2_2 | ... | 2_256 |
Channel 4:
2_257 | 2_258 | ... | 2_512 |
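The splitting shown in the layout above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the function names `split_channel` and `decompress_channels` are hypothetical, and each channel is modelled as a plain Python list of data values.

```python
def split_channel(channel, parts=2):
    """Split one compressed channel into `parts` equal, consecutive pieces."""
    if parts < 2 or len(channel) % parts != 0:
        raise ValueError("channel length must divide evenly into >= 2 parts")
    size = len(channel) // parts
    return [channel[i * size:(i + 1) * size] for i in range(parts)]


def decompress_channels(compressed_channels, parts=2):
    """Split every compressed channel in order; return the decompressed channels."""
    decompressed = []
    for channel in compressed_channels:
        decompressed.extend(split_channel(channel, parts))
    return decompressed


# Label the data as in the layout above: "1_1" ... "1_512" and "2_1" ... "2_512".
compressed = [["%d_%d" % (c, i) for i in range(1, 513)] for c in (1, 2)]
channels = decompress_channels(compressed)
```

With the two 512-entry compressed channels of the example, `channels` holds four lists of 256 entries each, in the order channel 1 to channel 4.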
S103, generating output audio corresponding to the original audio based on the decompressed audio data.
After the decompressed audio data is obtained, output audio that suits the actual application scenario may be generated from it. For example, the output audio may be used in scenarios such as wake-up, recognition, and translation.
Through the content in the embodiment, audio data conforming to an actual application scene can be generated.
According to a specific implementation manner of the embodiment of the present disclosure, in a process of generating an output audio corresponding to the original audio based on the decompressed audio data, an extraction signal channel may be added to the decompressed audio data, where the extraction signal channel is used to simulate an extraction signal. For example, after the audio data of 4 channels generated in step S102, an extraction signal channel is added, all values of which are 0, so as to simulate a normal extraction signal for voice data processing.
Channel 5:
0 | 0 | ... | 0 |
Adding the extraction signal channel makes the output audio easier to use in subsequent voice data processing.
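Adding the all-zero extraction signal channel can be sketched in a few lines. The function name `add_extraction_channel` is hypothetical, and the sketch assumes that every decompressed channel has the same length.

```python
def add_extraction_channel(channels):
    """Return a copy of `channels` with one extra all-zero channel appended."""
    if not channels:
        raise ValueError("at least one decompressed channel is required")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    # The appended channel simulates the extraction signal with constant zeros.
    return [list(ch) for ch in channels] + [[0] * length]


four_channels = [[1, 2], [3, 4], [5, 6], [7, 8]]
five_channels = add_extraction_channel(four_channels)
```

The input list is copied rather than modified, so the four decompressed channels remain available unchanged.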
Referring to fig. 2, according to a specific implementation manner of the embodiment of the present disclosure, the acquiring original audio that needs to be decompressed includes:
S201, sampling the audio signals of the N channels by adopting an audio sampling device according to a preset sampling frequency.
Sound may be collected directly by an audio sampling device (e.g., a microphone) to obtain the original audio information. A preset sampling frequency is set for the sampling process, and to preserve the fidelity of the sampled signal it is usually set to a relatively high value. The sampling operation is performed on the audio signals of N channels. Typically N is an even number, e.g., N may be 4, 8, etc.
S202, storing the sampling result into compressed audio channel data of M channels in a digital signal form, wherein N and M are natural numbers, and N is larger than M.
In obtaining original audio data, in order to reduce the size of the sampled data, the sampled data is usually subjected to compression processing, for example, the sampled result is stored in the form of a digital signal as compressed audio channel data of M channels, where N and M are natural numbers, and N is greater than M. For example, N is 4 and M is 2.
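The disclosure does not state how the device packs N channels into M channels. The sketch below assumes the packing is the inverse of the splitting example given for step S102, that is, consecutive groups of channels are concatenated end to end; the function name `pack_channels` and the `group` parameter are hypothetical.

```python
def pack_channels(channels, group=2):
    """Concatenate each run of `group` channels into one compressed channel.

    Assumption: packing is plain concatenation, inferred from the layout of
    the decompression example rather than stated by the source.
    """
    if group < 2 or len(channels) % group != 0:
        raise ValueError("number of channels must be a multiple of group")
    packed = []
    for start in range(0, len(channels), group):
        merged = []
        for channel in channels[start:start + group]:
            merged.extend(channel)
        packed.append(merged)
    return packed


# N = 4 sampled channels stored as M = 2 compressed channels.
packed = pack_channels([[1, 2], [3, 4], [5, 6], [7, 8]])
```

Under this assumption, N = 4 and a group size of 2 give M = 2, consistent with the example values of N and M above.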
Instead of sampling directly, the original audio may also be read from a preset location. According to a specific implementation manner of the embodiment of the present disclosure, the acquiring the original audio that needs to be decompressed includes: acquiring a storage position of a storage unit for storing the original audio; and reading the original audio from the storage location.
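Reading the original audio from a storage location might look like the following sketch. Everything about the file layout is an assumption made for illustration: the storage location is taken to be a file path, the values are taken to be 16-bit little-endian integers, and the compressed channels are taken to be stored one after another. The function name `read_original_audio` is hypothetical.

```python
import struct


def read_original_audio(path, num_channels, values_per_channel):
    """Read one frame of compressed channels from a raw file (assumed layout)."""
    count = num_channels * values_per_channel
    with open(path, "rb") as f:
        raw = f.read(count * 2)  # 2 bytes per assumed 16-bit value
    if len(raw) != count * 2:
        raise ValueError("file is shorter than one full frame")
    values = struct.unpack("<%dh" % count, raw)
    return [
        list(values[c * values_per_channel:(c + 1) * values_per_channel])
        for c in range(num_channels)
    ]
```

For the example above, the call would use `num_channels=2` and `values_per_channel=512`.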
Referring to fig. 3, according to a specific implementation manner of the embodiment of the present disclosure, the performing a decompression and frequency reduction process on the compressed data in the original audio includes:
S301, acquiring the bit number L of the audio data in the compressed audio channel.
A compressed audio channel typically stores audio data with a relatively large number of data bits. The number of bits L of the audio data in the compressed audio channel may therefore be obtained, e.g., L is 512.
S302, splitting the audio data in the compressed audio channel based on the bit number L of the audio data to obtain two or more decompressed audio channels.
As an example, the compressed speech data in the original audio comprises 2 channels, each containing 512 bits of data at a sampling frequency of 32 kHz. During decompression and frequency reduction, the compressed voice data is decompressed and audio data with a sampling frequency of 16 kHz is output. Data 1 to 256 of the first compressed channel become the processed first-channel voice data, and data 257 to 512 of the first compressed channel become the processed second-channel voice data. Data 1 to 256 of the second compressed channel become the processed third-channel voice data, and data 257 to 512 of the second compressed channel become the processed fourth-channel voice data.
According to a specific implementation manner of the embodiment of the present disclosure, the splitting the audio data in the compressed audio channel includes: and splitting the data in one compressed audio channel into the data in two decompressed audio channels in a mode of halving the bit number L of the audio data.
Referring to fig. 4, according to a specific implementation manner of the embodiment of the present disclosure, the performing a decompression and frequency reduction process on the compressed data in the original audio includes:
S401, acquiring an original sampling frequency of the original audio.
S402, generating the decompressed audio data of the decompressed audio channel according to a decompressed sampling frequency, wherein the decompressed sampling frequency is less than the original sampling frequency.
The original sampling frequency is usually relatively high, so frequency reduction needs to be performed during decompression. To this end, a decompressed sampling frequency smaller than the original sampling frequency may be selected for decompressing the original audio. As an example, the original sampling frequency may be 128 kHz and the decompressed sampling frequency may be 32 kHz.
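The relation between sampling frequency, frame duration, and the number of data positions per channel can be checked with a short calculation. The sketch assumes that the frequency reduction results solely from splitting one compressed channel into equal parts, as in the 32 kHz to 16 kHz example; the function names are hypothetical.

```python
def values_per_frame(sample_rate_hz, frame_ms):
    """Number of data positions one channel holds for a frame of `frame_ms`."""
    total = sample_rate_hz * frame_ms
    if total % 1000 != 0:
        raise ValueError("frame does not contain a whole number of values")
    return total // 1000


def decompressed_rate(original_rate_hz, parts):
    """Sampling frequency after splitting one channel into `parts` channels."""
    if parts < 1 or original_rate_hz % parts != 0:
        raise ValueError("original rate must divide evenly by parts")
    return original_rate_hz // parts


frame_before = values_per_frame(32000, 16)                      # 16 ms at 32 kHz
frame_after = values_per_frame(decompressed_rate(32000, 2), 16)  # after halving
```

Under the same assumption, reducing 128 kHz to 32 kHz corresponds to splitting each compressed channel into four parts.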
According to a specific implementation manner of the embodiment of the present disclosure, the adding of an extraction signal channel in the decompressed audio data includes: setting all values in the extraction signal channel to 0.
Corresponding to the above method embodiment, referring to fig. 5, the embodiment of the present disclosure further provides a compressed speech processing apparatus 50, which includes:
An obtaining module 501, configured to obtain an original audio that needs to be decompressed, where the original audio includes one or more compressed audio channel data.
The original audio is an original audio file obtained by sampling with a voice device (e.g., a smart speaker), and it may be audio data of various types. For example, the original audio may use any of several sampling rates (e.g., 32 kHz, 64 kHz, 128 kHz, etc.), and it may include a plurality of compressed audio channels, each of which contains a predetermined number of bits (e.g., 512 bits) of audio data.
As an example, 16 ms of audio data is read from a device at a sampling frequency of 32 kHz. The voice data in the device is compressed: 4 channels of data are compressed into 2 channels of data, which cannot be used directly. The voice data format is arranged as follows:
Channel 1:
1_1 | 1_2 | ... | 1_256 | 1_257 | 1_258 | ... | 1_512 |
Channel 2:
2_1 | 2_2 | ... | 2_256 | 2_257 | 2_258 | ... | 2_512 |
A processing module 502, configured to perform decompression and frequency reduction processing on the compressed data in the original audio to obtain decompressed audio data including multiple decompressed audio channels.
The data in the original audio is compressed and cannot be used directly as audio data, so the compressed data in the original audio needs to be decompressed. During decompression, frequency reduction (down-conversion) is also performed on the compressed data so that the decompressed data suits different application scenarios. The result is decompressed audio data comprising a plurality of decompressed audio channels.
As an example, the compressed speech data in the original audio comprises 2 channels, each containing 512 bits of data at a sampling frequency of 32 kHz. During decompression and frequency reduction, the compressed voice data is decompressed and audio data with a sampling frequency of 16 kHz is output. Data 1 to 256 of the first compressed channel become the processed first-channel voice data, and data 257 to 512 of the first compressed channel become the processed second-channel voice data. Data 1 to 256 of the second compressed channel become the processed third-channel voice data, and data 257 to 512 of the second compressed channel become the processed fourth-channel voice data, as follows:
Channel 1:
1_1 | 1_2 | ... | 1_256 |
Channel 2:
1_257 | 1_258 | ... | 1_512 |
Channel 3:
2_1 | 2_2 | ... | 2_256 |
Channel 4:
2_257 | 2_258 | ... | 2_512 |
A generating module 503, configured to generate an output audio corresponding to the original audio based on the decompressed audio data.
After the decompressed audio data is obtained, output audio that suits the actual application scenario may be generated from it. For example, the output audio may be used in scenarios such as wake-up, recognition, and translation.
By the content in the embodiment, audio data conforming to an actual application scene can be generated.
According to a specific implementation manner of the embodiment of the present disclosure, the obtaining module is further configured to:
sampling the audio signals of the N channels by adopting audio sampling equipment according to a preset sampling frequency;
and storing the sampled result in a digital signal form as compressed audio channel data of M channels, wherein N and M are natural numbers, and N is greater than M.
According to a specific implementation manner of the embodiment of the present disclosure, the processing module is further configured to:
acquiring the bit number L of the audio data in the compressed audio channel;
and splitting the audio data in the compressed audio channels based on the bit number L of the audio data to obtain two or more decompressed audio channels.
According to a specific implementation manner of the embodiment of the present disclosure, the processing module is further configured to:
and splitting the data in one compressed audio channel into the data in two decompressed audio channels in a mode of equally dividing the bit number L of the audio data.
According to a specific implementation manner of the embodiment of the present disclosure, the processing module is further configured to:
acquiring an original sampling frequency of the original audio;
and generating the decompressed audio data of the decompressed audio channel according to a decompressed sampling frequency, wherein the decompressed sampling frequency is less than the original sampling frequency.
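One way to read the relationship between the two sampling frequencies is sketched below. It assumes, consistently with the 32 kHz to 16 kHz example given earlier, that the decompressed sampling frequency equals the original sampling frequency divided by the number of decompressed channels carried in each compressed channel. That relationship and the function name `decompressed_sampling_rate` are assumptions; the text only requires the decompressed frequency to be less than the original.

```python
def decompressed_sampling_rate(original_rate_hz, parts):
    """Derive the decompressed sampling rate under the stated assumption."""
    if original_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    if parts < 2:
        raise ValueError("each compressed channel must carry at least two channels")
    if original_rate_hz % parts != 0:
        raise ValueError("sampling rate must be divisible by the number of parts")
    return original_rate_hz // parts
```

For an original rate of 32000 Hz with each compressed channel split in two, the sketch yields 16000 Hz.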
The apparatus shown in fig. 5 may correspondingly execute the content of the above method embodiment. For parts not described in detail in this embodiment, refer to the description of the above method embodiment, which is not repeated here.
Referring to fig. 6, an embodiment of the present disclosure also provides an electronic device 60, which includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of compressed speech processing of the method embodiments described above.
Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the foregoing method embodiments.
Embodiments of the present disclosure also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the compressed speech processing method in the foregoing method embodiments.
Referring now to FIG. 6, a schematic diagram of an electronic device 60 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, the electronic device 60 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 60 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, or the like; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 60 to communicate with other devices wirelessly or by wire to exchange data. While the figures illustrate an electronic device 60 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may be separate and not incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring at least two internet protocol addresses; sending a node evaluation request comprising the at least two internet protocol addresses to node evaluation equipment, wherein the node evaluation equipment selects the internet protocol addresses from the at least two internet protocol addresses and returns the internet protocol addresses; receiving an internet protocol address returned by the node evaluation equipment; wherein the obtained internet protocol address indicates an edge node in the content distribution network.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receiving a node evaluation request comprising at least two internet protocol addresses; selecting an internet protocol address from the at least two internet protocol addresses; returning the selected internet protocol address; wherein the received internet protocol address indicates an edge node in the content distribution network.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (13)
1. A method for compressed speech processing, comprising:
acquiring original audio needing decompression processing, wherein the original audio comprises one or more compressed audio channel data;
decompressing and frequency-reducing the compressed data in the original audio to obtain decompressed audio data comprising a plurality of decompressed audio channels;
generating output audio corresponding to the original audio based on the decompressed audio data;
wherein the generating of the output audio corresponding to the original audio based on the decompressed audio data comprises:
adding an extraction signal channel in the decompressed audio data, wherein the extraction signal channel is used for simulating an extraction signal;
wherein, the adding of an extraction signal channel in the decompressed audio data comprises:
setting all values in the extraction signal channel to 0.
2. The method according to claim 1, wherein the obtaining the original audio that needs to be decompressed comprises:
sampling the audio signals of the N channels by adopting audio sampling equipment according to a preset sampling frequency;
and storing the sampling result into compressed audio channel data of M channels in a digital signal form, wherein N and M are natural numbers, and N is larger than M.
3. The method according to claim 1, wherein the obtaining the original audio that needs to be decompressed comprises:
acquiring a storage position of a storage unit for storing the original audio;
and reading the original audio from the storage location.
4. The method of claim 1, wherein the decompressing and frequency-reducing of the compressed data in the original audio comprises:
acquiring the bit number L of the audio data in the compressed audio channel;
and splitting the audio data in the compressed audio channels based on the bit number L of the audio data to obtain two or more decompressed audio channels.
5. The method of claim 4, wherein the splitting the audio data in the compressed audio channels comprises:
and splitting the data in one compressed audio channel into the data in two decompressed audio channels in a mode of halving the bit number L of the audio data.
6. The method of claim 1, wherein the decompressing and frequency-reducing of the compressed data in the original audio comprises:
acquiring an original sampling frequency of the original audio;
and generating the decompressed audio data of the decompressed audio channel according to a decompressed sampling frequency, wherein the decompressed sampling frequency is less than the original sampling frequency.
7. A compressed speech processing apparatus, comprising:
an obtaining module, configured to acquire original audio needing decompression processing, wherein the original audio comprises one or more compressed audio channel data;
a processing module, configured to perform decompression and frequency-reduction processing on the compressed data in the original audio to obtain decompressed audio data comprising a plurality of decompressed audio channels;
a generating module, configured to generate an output audio corresponding to the original audio based on the decompressed audio data;
the generating module is further configured to add an extraction signal channel to the decompressed audio data, where the extraction signal channel is used to simulate an extraction signal;
the generating module is further configured to set all values in the extraction signal channel to 0.
8. The apparatus of claim 7, wherein the obtaining module is further configured to: sampling the audio signals of the N channels by adopting audio sampling equipment according to a preset sampling frequency;
and storing the sampled result in a digital signal form as compressed audio channel data of M channels, wherein N and M are natural numbers, and N is greater than M.
9. The apparatus of claim 7, wherein the processing module is further configured to: acquiring the bit number L of the audio data in the compressed audio channel;
and splitting the audio data in the compressed audio channels based on the bit number L of the audio data to obtain two or more decompressed audio channels.
10. The apparatus of claim 7, wherein the processing module is further configured to: and splitting the data in one compressed audio channel into the data in two decompressed audio channels in a mode of halving the bit number L of the audio data.
11. The apparatus of claim 7, wherein the processing module is further configured to: acquiring an original sampling frequency of the original audio;
and generating the decompressed audio data of the decompressed audio channel according to a decompressed sampling frequency, wherein the decompressed sampling frequency is less than the original sampling frequency.
12. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the compressed speech processing method of any of the preceding claims 1-6.
13. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the compressed speech processing method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911162541.3A CN111028848B (en) | 2019-11-25 | 2019-11-25 | Compressed voice processing method and device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111028848A (en) | 2020-04-17 |
CN111028848B (en) | 2022-10-11 |
Family
ID=70203379
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911162541.3A Active CN111028848B (en) | 2019-11-25 | 2019-11-25 | Compressed voice processing method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111028848B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101964188A (en) * | 2010-04-09 | 2011-02-02 | 华为技术有限公司 | Voice signal coding and decoding methods, devices and systems |
CN103037222A (en) * | 2012-12-04 | 2013-04-10 | 中国北方车辆研究所 | Compression transmission device and method of parallel digital video signal |
JP2018007126A (en) * | 2016-07-06 | 2018-01-11 | 株式会社日立製作所 | Signal compression/decompression method |
CN110335615A (en) * | 2019-05-05 | 2019-10-15 | 北京字节跳动网络技术有限公司 | Processing method, device, electronic equipment and the storage medium of audio data |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8020075B2 (en) * | 2007-03-16 | 2011-09-13 | Apple Inc. | Channel quality index feedback reduction for broadband systems |
US10382950B2 (en) * | 2016-09-16 | 2019-08-13 | Qualcomm Incorporated | Techniques and apparatuses for accessing a header-compressed broadcast transmission |
Also Published As
Publication number | Publication date |
---|---|
CN111028848A (en) | 2020-04-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||