WO2020238058A1 - Voice transmission method, device, computer device and storage medium - Google Patents

Voice transmission method, device, computer device and storage medium

Info

Publication number
WO2020238058A1
Authority
WO
WIPO (PCT)
Prior art keywords
transmission rate
voice
transmitted
voice information
information
Prior art date
Application number
PCT/CN2019/118022
Inventor
邹昆伦
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020238058A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0002 Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission rate

Definitions

  • This application relates to the field of communication technology, and in particular to a voice transmission method, device, computer device and storage medium.
  • Voice call products offer good call quality when the network condition is good.
  • When the network condition is poor, however, the sound may freeze during transmission because the transmission is discontinuous, which reduces the quality of voice calls and affects user experience.
  • This application provides a voice transmission method, the method includes:
  • if the transmission rate is lower than the preset transmission rate, performing voice recognition on the voice information to be transmitted to obtain a voice recognition result, where the voice recognition result includes text information corresponding to the voice information to be transmitted;
  • the present application also provides a voice transmission device, which includes:
  • a receiving module configured to receive a voice call transmission instruction sent by a first terminal, and to obtain, according to the voice call transmission instruction, the voice information to be transmitted and a second terminal that receives the voice information to be transmitted;
  • An acquiring module for acquiring the transmission rate when transmitting the voice information to be transmitted
  • a judging module for judging whether the transmission rate is lower than a preset transmission rate
  • a recognition module configured to perform voice recognition on the voice information to be transmitted if the transmission rate is lower than the preset transmission rate, to obtain a voice recognition result, where the voice recognition result includes the text information corresponding to the voice information to be transmitted;
  • An encoding module configured to perform voice encoding on the text information contained in the voice recognition result to obtain target voice information
  • the first transmission module is configured to transmit the target voice information to the second terminal.
  • the present application also provides a computer device.
  • the computer device includes a memory and a processor, the memory is used to store at least one instruction, and the processor is used to execute the at least one instruction to implement the voice transmission method described in any embodiment.
  • the present application also provides a non-volatile readable storage medium, the non-volatile readable storage medium stores at least one instruction, and when the at least one instruction is executed by a processor, the voice transmission method described in any of the embodiments is implemented.
  • this application receives the voice call transmission instruction sent by the first terminal, and obtains, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that receives the voice information to be transmitted;
  • it acquires the transmission rate when transmitting the voice information to be transmitted and determines whether the transmission rate is lower than the preset transmission rate; if the transmission rate is lower than the preset transmission rate, it performs voice recognition on the voice information to be transmitted to obtain a voice recognition result;
  • the voice recognition result includes the text information corresponding to the voice information to be transmitted; the text information contained in the voice recognition result is voice-encoded to obtain target voice information; and the target voice information is transmitted to the second terminal.
  • When the transmission rate is lower than the preset transmission rate, the text information contained in the voice recognition result is encoded, so the voice content of the voice information to be transmitted is retained while the amount of information encoded during voice encoding is reduced, which helps keep voice calls smooth.
  • FIG. 1 is a flowchart of a voice transmission method provided by an embodiment of the present application.
  • FIG. 2 is a functional block diagram of a voice transmission device provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a computer device implementing a preferred embodiment of the voice transmission method according to the present application.
  • FIG. 1 is a flowchart of a voice transmission method provided by an embodiment of this application. According to different needs, the order of the steps in the flowchart can be changed, and some steps can be omitted.
  • S11 Receive a voice call transmission instruction sent by a first terminal, and acquire voice information to be transmitted and a second terminal that receives the voice information to be transmitted according to the voice call transmission instruction.
  • the first terminal and the second terminal may be the same kind of electronic device or different kinds of electronic devices; for example, the first terminal and the second terminal are both mobile phones, or the first terminal is a mobile phone and the second terminal is a computer.
  • the voice call transmission instruction is an instruction used to send voice information between two terminals.
  • the first terminal is the sender of voice information, that is, the calling party
  • the second terminal is the receiver of voice information, that is, the called party.
  • obtaining the voice information to be transmitted and the second terminal that receives it according to the voice call transmission instruction includes: obtaining the voice information to be transmitted indicated by the voice call transmission instruction, and the second terminal that is to receive that voice information.
  • the voice call transmission instruction includes the voice information to be transmitted and the receiver of the voice information to be transmitted, that is, the second terminal that receives the transmitted voice information.
  • S12 Acquire a transmission rate when transmitting the voice information to be transmitted.
  • the transmission rate is the network transmission rate, which refers to the rate at which the host on the computer network transmits data on the digital channel.
  • for example, a transmission rate of 16 bit/s means that 16 bits of data are transmitted per second.
  • acquiring the transmission rate when transmitting the voice information to be transmitted includes: acquiring the sending rate of the first terminal or the receiving rate of the second terminal when the first terminal transmits the voice information to be transmitted to the second terminal.
  • Acquire the transmission rate of the calling party, which reflects the data transmission rate when the calling party sends voice information to the base station/server.
  • Acquire the transmission rate of the called party, which reflects the data transmission rate when the called party receives voice information.
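As a rough illustration of how a sending or receiving rate could be measured, the sketch below counts the bits that pass through a terminal and divides by the elapsed time. The class name `RateMeter` and its methods are hypothetical and do not come from the application, which does not say how the rate is obtained.

```python
import time


class RateMeter:
    """Hypothetical helper: average transmission rate in bit/s since creation."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable clock, useful for testing
        self._start = clock()
        self._bits = 0

    def record(self, payload: bytes) -> None:
        """Account for one sent (or received) chunk of data."""
        self._bits += 8 * len(payload)

    def rate_bps(self) -> float:
        """Bits transferred divided by elapsed seconds (0.0 if no time has passed)."""
        elapsed = self._clock() - self._start
        return self._bits / elapsed if elapsed > 0 else 0.0
```

With this definition, two bytes recorded over one second give 16 bit/s, which matches the example in the text.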
  • determining whether the transmission rate is lower than the preset transmission rate is used to determine whether the two parties in communication are in a poor network environment during voice transmission, and whether the call quality will be affected.
  • the specific value of the preset transmission rate can be preset according to needs.
  • the preset transmission rate is 8 kbit/s.
  • the method further includes:
  • if the transmission rate is lower than the first transmission rate, encode and transmit the voice information to be transmitted according to the GIA coding standard;
  • if the transmission rate is lower than the second transmission rate, encode and transmit the voice information to be transmitted according to the GSM coding standard;
  • if the transmission rate is lower than the third transmission rate, encode and transmit the voice information to be transmitted according to the G.728 coding standard;
  • if the transmission rate is lower than the fourth transmission rate, encode and transmit the voice information to be transmitted according to the G.721 coding standard;
  • if the transmission rate is lower than the fifth transmission rate, encode and transmit the voice information to be transmitted according to the G.722 coding standard;
  • the voice information to be transmitted is encoded and transmitted using the MPE encoding standard.
  • Encoding is the process of expressing information with codes.
  • the frequency value of the sound at a certain point and the energy value of the frequency are extracted and digitally quantized.
  • any digital audio coding scheme is lossy.
  • the highest fidelity encoding method is PCM encoding.
  • PCM encoding can approximate the original sound arbitrarily closely.
  • PCM data is bulky and is not conducive to transmission. Therefore, during audio transmission, the audio is encoded in other forms and compressed to improve the smoothness of transmission.
  • different encoding algorithms are used to encode voice information based on different encoding standards.
  • encoding based on the G.722 coding standard is achieved through the SB-ADPCM algorithm, encoding based on the G.721 coding standard through the ADPCM algorithm, encoding based on the G.728 coding standard through the LD-CELP algorithm, encoding based on the GSM coding standard through the RPE-LTP algorithm, and encoding based on the GIA coding standard through the VSELPC algorithm.
  • the first transmission rate is 13.2 kbit/s
  • the second transmission rate is 16 kbit/s
  • the third transmission rate is 32 kbit/s
  • the fourth transmission rate is 64 kbit/s
  • the fifth transmission rate is 128 kbit/s.
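The tiered choice of coding standard can be sketched as a threshold lookup. This is only one reading of the tiers: it assumes the thresholds are checked in ascending order, so that each standard applies between two adjacent thresholds, that rates below the preset rate take the text-encoding path, and that rates at or above the fifth rate fall through to the MPE standard. The names `PRESET_KBPS`, `TIERS` and `select_encoding` are hypothetical.

```python
PRESET_KBPS = 8.0  # preset transmission rate given in the text

# (upper bound in kbit/s, coding standard), in ascending order of threshold
TIERS = [
    (13.2, "GIA"),
    (16.0, "GSM"),
    (32.0, "G.728"),
    (64.0, "G.721"),
    (128.0, "G.722"),
]


def select_encoding(rate_kbps: float) -> str:
    """Return the coding path for a measured rate (assumed tier semantics)."""
    if rate_kbps < PRESET_KBPS:
        return "text"  # speech recognition followed by text encoding
    for upper_bound, standard in TIERS:
        if rate_kbps < upper_bound:
            return standard
    return "MPE"  # assumed fallback when no threshold applies
```

For example, `select_encoding(20.0)` returns `"G.728"` under these assumptions, because 20 kbit/s is at least the second rate but below the third.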
  • voice recognition refers to converting a voice signal into corresponding text information.
  • voice recognition is performed on the voice information to be transmitted through voice recognition technology.
  • the performing voice recognition on the voice information to be transmitted includes:
  • the preset acoustic model and the preset language model can be selected as required.
  • S15 Perform voice encoding on the text information contained in the voice recognition result to obtain target voice information.
  • the voice encoding of the text information contained in the voice recognition result is to encode the text information, which is different from the traditional encoding of sound samples, which can greatly reduce the amount of data during transmission.
  • the traditional encoding method is to sample and encode the frequency and amplitude of the sound.
  • the calculation method of the amount of data transmitted during traditional encoding is as follows:
  • Different characters, such as Chinese characters, have corresponding character encoding sizes, which can be determined according to the preset correspondence between characters and character encoding sizes.
  • the amount of data transmitted per second is greatly reduced.
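The size of the reduction can be illustrated with a back-of-the-envelope comparison. The application's own formula is not reproduced in this text, so the sketch below uses the standard relationship for uncompressed PCM (sample rate × bits per sample × channels). The concrete figures are assumptions chosen for illustration: 8 kHz, 16-bit mono audio, and speech of about 5 Chinese characters per second at 3 bytes per character in UTF-8.

```python
def pcm_bits_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Data rate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def text_bits_per_second(chars_per_second: float, bytes_per_char: int) -> float:
    """Data rate of transmitting recognized text in bit/s."""
    return chars_per_second * bytes_per_char * 8


# Assumed figures, for illustration only.
pcm_rate = pcm_bits_per_second(8000, 16)   # 128000 bit/s
text_rate = text_bits_per_second(5, 3)     # 120 bit/s
```

Under these assumed figures the text stream needs roughly a thousandth of the PCM data rate, and it also stays well below the 8 kbit/s preset rate.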
  • the voice recognition result further includes a voice feature of the voice information to be transmitted, and the voice feature includes a pitch frequency;
  • the voice encoding of the text information contained in the voice recognition result includes:
  • Voice features refer to information that reflects the characteristics of a voice, for example, the intensity, loudness, or pitch of speech.
  • the air flow through the glottis causes the vocal cords to produce relaxation and oscillating vibrations, resulting in a quasi-periodic pulsed air flow.
  • This air flow excites the vocal tract to produce voiced sounds, also known as voiced speech, which carry most of the energy in the speech. The frequency of this vocal cord vibration is called the pitch frequency.
  • the pitch frequency is related to the length, thickness, toughness, stiffness and pronunciation habits of the vocal cords, which can reflect personal characteristics to a large extent. Therefore, in this embodiment, encoding is performed in combination with the pitch frequency, which can ensure the accurate delivery of the content while retaining the sound characteristics to the greatest extent.
  • the pitch frequency of the voice information can be obtained by the cepstrum method.
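A minimal sketch of cepstrum-based pitch estimation follows, assuming the usual definition of the real cepstrum as the inverse transform of the log-magnitude spectrum. It uses a plain DFT from the standard library for self-containment, so it is slow; a practical implementation would use an FFT and window the frame. The function name, the 60 to 400 Hz search range, and the small constant added before the logarithm are assumptions, not details from the application.

```python
import cmath
import math


def cepstrum_pitch(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from the peak of its real cepstrum.

    The frame should be longer than sample_rate / fmin samples.
    """
    n = len(frame)
    # Twiddle factors exp(-2*pi*i*j/n), shared by both transforms.
    w = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    # Log-magnitude spectrum; the small constant avoids log(0).
    log_mag = []
    for k in range(n):
        xk = sum(frame[t] * w[(k * t) % n] for t in range(n))
        log_mag.append(math.log(abs(xk) + 1e-9))
    # Search only quefrencies (lags in samples) inside the plausible pitch range.
    lo = int(sample_rate / fmax)
    hi = int(sample_rate / fmin)
    best_lag, best_val = lo, float("-inf")
    for q in range(lo, hi + 1):
        # Real part of the inverse DFT of the log-magnitude spectrum at lag q.
        c = sum(log_mag[k] * w[(-k * q) % n].real for k in range(n)) / n
        if c > best_val:
            best_lag, best_val = q, c
    return sample_rate / best_lag
```

The pitch is reported as the sample rate divided by the lag of the strongest cepstral peak, so a pulse train that repeats every 80 samples at 8 kHz corresponds to 100 Hz.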
  • the text information corresponding to the voice information to be transmitted and the voice features of the voice information to be transmitted are encoded together, that is, text information is combined with voice features. This also differs from the traditional encoding of sound samples and greatly reduces the amount of data during transmission.
  • the encoding size can be determined according to the corresponding relationship between the preset character and the character encoding size.
  • when this embodiment performs voice information transmission, the amount of data transmitted per second is greatly reduced.
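One way to picture encoding text together with a pitch frequency is a small binary packet. The layout below is purely an assumption for illustration (a 32-bit float for the pitch, a 16-bit length field, then UTF-8 text); the application does not specify a wire format, and `encode_packet` and `decode_packet` are hypothetical names.

```python
import struct

HEADER = "!fH"                        # float32 pitch in Hz, uint16 text length
HEADER_SIZE = struct.calcsize(HEADER)  # 6 bytes in network byte order


def encode_packet(text: str, pitch_hz: float) -> bytes:
    """Pack the recognized text and the speaker's pitch into one payload."""
    body = text.encode("utf-8")
    return struct.pack(HEADER, pitch_hz, len(body)) + body


def decode_packet(packet: bytes):
    """Recover (text, pitch_hz) from a payload produced by encode_packet."""
    pitch_hz, length = struct.unpack(HEADER, packet[:HEADER_SIZE])
    text = packet[HEADER_SIZE:HEADER_SIZE + length].decode("utf-8")
    return text, pitch_hz
```

The receiving side would pass the decoded text and pitch to a speech synthesizer to restore audible speech; that step is outside this sketch.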
  • the target voice information is decoded, that is, the received text information (or text information and voice features) is restored to voice.
  • white noise is a piece of sound. Specifically, white noise is noise whose power spectral density is uniformly distributed in the entire frequency domain.
  • the voice content can still be largely retained when the network is extremely poor, avoiding intermittent voice, loss of voice content, or even call failure during voice calls.
  • the method further includes:
  • the suggestion message for enhancing the network signal strength that is sent to the first terminal or the second terminal may specifically be a suggestion on how the first terminal or the second terminal can enhance its network signal. This helps achieve a higher transmission rate when the voice information to be transmitted is transmitted between the first terminal and the second terminal, thereby improving the quality of the voice call.
  • the suggestion message includes a recommended connection network or a recommended moving route.
  • the recommended connection network is another connectable network recommended to the first terminal or the second terminal.
  • the recommended moving position refers to the position where the first terminal or the second terminal can be moved to enhance the network signal of the first terminal or the second terminal.
  • the recommended connection network may be obtained in the following manner, and the method further includes:
  • Acquire the connectable networks around the first terminal or the second terminal, acquire the historical connection networks among the connectable networks, and take the historical connection networks whose network signal strength is greater than the network signal strength threshold as recommended networks.
  • the historical connection network refers to a network to which the first terminal or the second terminal has been connected.
  • By taking a historical connection network whose network signal strength is greater than the network signal strength threshold as the recommended network, a secure network can be obtained, and the first terminal or the second terminal can be connected to that secure network, avoiding network security problems.
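The selection of recommended networks described above can be sketched as a simple filter. The data shapes are assumptions: connectable networks are given as a mapping from network name to signal strength in dBm, and the history is a set of names. Ordering the result from strongest to weakest is also an addition for convenience, not something the application states.

```python
def recommend_networks(connectable, history, threshold_dbm):
    """Return previously connected networks whose signal exceeds the threshold.

    connectable: dict mapping network name -> signal strength in dBm (assumed shape)
    history: set of network names the terminal has connected to before
    """
    candidates = [
        (strength, name)
        for name, strength in connectable.items()
        if name in history and strength > threshold_dbm
    ]
    # Strongest signal first.
    return [name for strength, name in sorted(candidates, reverse=True)]
```

A network that is strong but has never been connected before is excluded, which reflects the security argument made in the text.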
  • the recommended moving route may be obtained in the following manner, and the method further includes:
  • the first position is used as the starting position
  • the second position is used as the ending position
  • the moving route between the starting position and the ending position is acquired as the recommended moving route.
  • the closer a terminal is to the router, the better the network signal strength it can obtain.
  • obtaining the recommended moving route can facilitate the movement of the first terminal or the second terminal, so that the first terminal or the second terminal has better network signal strength. This helps achieve a higher transmission rate when the voice information to be transmitted is transmitted between the first terminal and the second terminal, thereby improving the quality of the voice call.
  • the present application provides a voice transmission method, which receives a voice call transmission instruction sent by a first terminal, and obtains, according to the voice call transmission instruction, the voice information to be transmitted and a second terminal that receives the voice information to be transmitted; acquires the transmission rate when transmitting the voice information to be transmitted; determines whether the transmission rate is lower than the preset transmission rate; if the transmission rate is lower than the preset transmission rate, performs voice recognition on the voice information to be transmitted to obtain a voice recognition result, where the voice recognition result includes the text information corresponding to the voice information to be transmitted; performs voice encoding on the text information contained in the voice recognition result to obtain target voice information; and transmits the target voice information to the second terminal.
  • When the transmission rate is lower than the preset transmission rate, the text information contained in the voice recognition result is encoded, so the voice content of the voice information to be transmitted is retained while the amount of information encoded during voice encoding is reduced, which helps keep voice calls smooth.
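The overall flow of the method can be sketched as follows. The speech recognizer and the transport are passed in as callables because the application does not name concrete implementations; `recognize`, `send` and `transmit_voice` are hypothetical names, UTF-8 is an assumed text encoding, and the `else` branch merely stands in for ordinary audio coding.

```python
def transmit_voice(audio: bytes, rate_kbps: float, recognize, send,
                   preset_kbps: float = 8.0) -> str:
    """Send voice as text when the link is slow, otherwise as audio.

    recognize: callable mapping audio bytes -> recognized text (hypothetical)
    send: callable that delivers a (kind, payload) tuple to the second terminal
    Returns the kind of payload that was sent.
    """
    if rate_kbps < preset_kbps:
        text = recognize(audio)                 # speech recognition step
        send(("text", text.encode("utf-8")))    # encode and transmit the text
        return "text"
    send(("audio", audio))                      # placeholder for normal audio coding
    return "audio"
```

On the receiving side, a `"text"` payload would be decoded and restored to speech, as the description explains for the target voice information.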
  • FIG. 2 is a functional module diagram of a voice transmission device provided by an embodiment of the application.
  • the voice transmission device includes a receiving module 210, an acquisition module 220, a judgment module 230, an identification module 240, an encoding module 250, and a first transmission module 260.
  • the module referred to in this application refers to a series of computer-readable instruction segments that can be executed by a processor and can complete fixed functions, which are stored in the memory of the computer device. In this embodiment, the function of each module will be described in detail in subsequent embodiments.
  • the receiving module 210 is configured to receive a voice call transmission instruction sent by a first terminal, obtain voice information to be transmitted and a second terminal that receives the voice information to be transmitted according to the voice call transmission instruction.
  • the first terminal and the second terminal may be the same kind of electronic device or different kinds of electronic devices; for example, the first terminal and the second terminal are both mobile phones, or the first terminal is a mobile phone and the second terminal is a computer.
  • the voice call transmission instruction is an instruction used to send voice information between two terminals.
  • the first terminal is the sender of voice information, that is, the calling party
  • the second terminal is the receiver of voice information, that is, the called party.
  • obtaining the voice information to be transmitted and the second terminal that receives it according to the voice call transmission instruction includes: obtaining the voice information to be transmitted indicated by the voice call transmission instruction, and the second terminal that is to receive that voice information.
  • the voice call transmission instruction includes the voice information to be transmitted and the receiver of the voice information to be transmitted, that is, the second terminal that receives the transmitted voice information.
  • the acquiring module 220 is configured to acquire the transmission rate when transmitting the voice information to be transmitted.
  • the transmission rate is the network transmission rate, which refers to the rate at which the host on the computer network transmits data on the digital channel.
  • for example, a transmission rate of 16 bit/s means that 16 bits of data are transmitted per second.
  • acquiring the transmission rate when transmitting the voice information to be transmitted includes: acquiring the sending rate of the first terminal or the receiving rate of the second terminal when the first terminal transmits the voice information to be transmitted to the second terminal.
  • Acquire the transmission rate of the calling party, which reflects the data transmission rate when the calling party sends voice information to the base station/server.
  • Acquire the transmission rate of the called party, which reflects the data transmission rate when the called party receives voice information.
  • the judging module 230 is configured to judge whether the transmission rate is lower than a preset transmission rate.
  • determining whether the transmission rate is lower than the preset transmission rate is used to determine whether the two parties in communication are in a poor network environment during voice transmission, and whether the call quality will be affected.
  • the specific value of the preset transmission rate can be preset according to needs.
  • the preset transmission rate is 8 kbit/s.
  • the recognition module 240 is configured to, if the transmission rate is lower than the preset transmission rate, perform voice recognition on the voice information to be transmitted to obtain a voice recognition result, where the voice recognition result includes the text information corresponding to the voice information to be transmitted.
  • voice recognition refers to converting a voice signal into corresponding text information.
  • voice recognition is performed on the voice information to be transmitted through voice recognition technology.
  • the recognition module 240 performing voice recognition on the voice information to be transmitted includes:
  • the preset acoustic model and the preset language model can be selected as required.
  • the encoding module 250 is configured to perform voice encoding on the text information contained in the voice recognition result to obtain target voice information.
  • the text information contained in the voice recognition result is voice coded, that is, text information is coded, which is different from the traditional voice sample coding, which can greatly reduce the amount of data during transmission.
  • the traditional encoding method is to sample and encode the frequency and amplitude of the sound.
  • the calculation method of the amount of data transmitted during traditional encoding is as follows:
  • Different characters, such as Chinese characters, have corresponding character encoding sizes, which can be determined according to the preset correspondence between characters and character encoding sizes.
  • the amount of data transmitted per second is greatly reduced.
  • the voice recognition result further includes a voice feature of the voice information to be transmitted, and the voice feature includes a pitch frequency;
  • the encoding module 250 voice encoding the text information contained in the voice recognition result includes:
  • Voice features refer to information that reflects the characteristics of a voice, for example, the intensity, loudness, or pitch of speech.
  • the air flow through the glottis causes the vocal cords to produce relaxation and oscillating vibrations, resulting in a quasi-periodic pulsed air flow.
  • This air flow excites the vocal tract to produce voiced sounds, also known as voiced speech, which carry most of the energy in the speech. The frequency of this vocal cord vibration is called the pitch frequency.
  • the pitch frequency is related to the length, thickness, toughness, stiffness and pronunciation habits of the vocal cords, which can reflect personal characteristics to a large extent. Therefore, in this embodiment, encoding is performed in combination with the pitch frequency, which can ensure the accurate delivery of the content while retaining the sound characteristics to the greatest extent.
  • the pitch frequency of the voice information can be obtained by the cepstrum method.
  • the text information corresponding to the voice information to be transmitted and the voice features of the voice information to be transmitted are encoded together, that is, text information is combined with voice features. This also differs from the traditional encoding of sound samples and greatly reduces the amount of data during transmission.
  • the encoding size can be determined according to the corresponding relationship between the preset character and the character encoding size.
  • when this embodiment performs voice information transmission, the amount of data transmitted per second is greatly reduced.
  • the first transmission module 260 is configured to transmit the target voice information to the second terminal.
  • the target voice information is decoded, that is, the received text information (or text information and voice features) is restored to voice.
  • white noise is a piece of sound. Specifically, white noise is noise whose power spectral density is uniformly distributed in the entire frequency domain.
  • the voice content can still be largely retained when the network is extremely poor, avoiding intermittent voice, loss of voice content, or even call failure during voice calls.
  • the device further includes:
  • a reminder module configured to, if the transmission rate is lower than the preset transmission rate, send a suggestion message for enhancing network signal strength to the first terminal or the second terminal, or send to the second terminal a reminder message of the voice transmission.
  • the suggestion message for enhancing the network signal strength that is sent to the first terminal or the second terminal may specifically be a suggestion on how the first terminal or the second terminal can enhance its network signal. This helps achieve a higher transmission rate when the voice information to be transmitted is transmitted between the first terminal and the second terminal, thereby improving the quality of the voice call.
  • the suggestion message includes a recommended connection network or a recommended moving route.
  • the recommended connection network is another connectable network recommended to the first terminal or the second terminal.
  • the recommended moving position refers to the position where the first terminal or the second terminal can be moved to enhance the network signal of the first terminal or the second terminal.
  • the recommended connection network may be obtained through a recommendation module, and the recommendation module is used for:
  • Acquire the connectable networks around the first terminal or the second terminal, acquire the historical connection networks among the connectable networks, and take the historical connection networks whose network signal strength is greater than the network signal strength threshold as recommended networks.
  • the historical connection network refers to a network to which the first terminal or the second terminal has been connected.
  • By taking a historical connection network whose network signal strength is greater than the network signal strength threshold as the recommended network, a secure network can be obtained, and the first terminal or the second terminal can be connected to that secure network, avoiding network security problems.
  • a recommended moving route may also be obtained through a recommendation module, and the recommendation module is further used for:
  • the first position is used as the starting position
  • the second position is used as the ending position
  • the moving route between the starting position and the ending position is acquired as the recommended moving route.
  • the closer a terminal is to the router, the better the network signal strength it can obtain.
  • obtaining the recommended moving route can facilitate the movement of the first terminal or the second terminal, so that the first terminal or the second terminal has better network signal strength. This helps achieve a higher transmission rate when the voice information to be transmitted is transmitted between the first terminal and the second terminal, thereby improving the quality of the voice call.
  • the device further includes a second transmission module, and the second transmission module is configured to:
  • if the transmission rate is lower than the first transmission rate, encode and transmit the voice information to be transmitted according to the GIA coding standard;
  • if the transmission rate is lower than the second transmission rate, encode and transmit the voice information to be transmitted according to the GSM coding standard;
  • if the transmission rate is lower than the third transmission rate, encode and transmit the voice information to be transmitted according to the G.728 coding standard;
  • if the transmission rate is lower than the fourth transmission rate, encode and transmit the voice information to be transmitted according to the G.721 coding standard;
  • if the transmission rate is lower than the fifth transmission rate, encode and transmit the voice information to be transmitted according to the G.722 coding standard;
  • the voice information to be transmitted is encoded and transmitted using the MPE encoding standard.
  • Encoding is the process of expressing information with codes.
  • the frequency value of the sound at a certain point and the energy value of the frequency are extracted and digitally quantized.
  • any digital audio coding scheme is lossy.
  • the highest fidelity encoding method is PCM encoding.
  • PCM encoding can be close to the original sound infinitely.
  • PCM is bulky and is not conducive to transmission. Therefore, during the audio transmission process, we will perform other forms of encoding on the audio. The audio is compressed to improve the smoothness of transmission.
  • different encoding algorithms are used to encode voice information based on different encoding standards.
  • the G.722 coding standard is implemented with the SB-ADPCM algorithm, G.721 with the ADPCM algorithm, G.728 with the LD-CELP algorithm, the GSM coding standard with the RPE-LTP algorithm, and the GIA coding standard with the VSELPC algorithm.
  • the first transmission rate is 13.2 kbit/s
  • the second transmission rate is 16 kbit/s
  • the third transmission rate is 32 kbit/s
  • the fourth transmission rate is 64 kbit/s
  • the fifth transmission rate is 128 kbit/s.
  • the voice transmission device receives a voice call transmission instruction sent by a first terminal through a receiving module, obtains voice information to be transmitted and a second terminal that receives the voice information to be transmitted according to the voice call transmission instruction;
  • the acquisition module acquires the transmission rate when transmitting the voice information to be transmitted;
  • the judging module determines whether the transmission rate is lower than the preset transmission rate; if the transmission rate is lower than the preset transmission rate, the recognition module performs speech recognition on the voice information to be transmitted to obtain a speech recognition result, where the speech recognition result includes text information corresponding to the voice information to be transmitted;
  • the encoding module performs voice coding on the text information contained in the voice recognition result to obtain target voice information;
  • a transmission module transmits the target voice information to the second terminal.
  • when the transmission rate is lower than the preset transmission rate, the text information contained in the speech recognition result is encoded, so the spoken content of the voice information to be transmitted is retained while the amount of information to encode is reduced, which helps keep voice calls fluent.
  • the aforementioned integrated unit implemented in the form of a software function module may be stored in a non-volatile readable storage medium.
  • the above-mentioned software function module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in each embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a computer device implementing a preferred embodiment of the voice transmission method of the present application.
  • the computer device includes at least one sending device 31, at least one memory 32, at least one processor 33, at least one receiving device 34, and at least one communication bus.
  • the communication bus is used to realize the connection and communication between these components.
  • the computer device is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded equipment, etc.
  • the computer device may also include network equipment and/or user equipment.
  • the network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud composed of a large number of hosts or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
  • the computer device may be, but is not limited to, any electronic product that can interact with a user through a keyboard, a touch panel, or a voice control device, for example, a terminal such as a tablet computer, a smart phone, and a monitoring device.
  • the network where the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), etc.
  • the receiving device 34 and the sending device 31 may be wired sending ports or wireless devices, for example, including an antenna device, which is used for data communication with other devices.
  • the memory 32 is used to store program codes.
  • the memory 32 may be a circuit with a storage function that has no physical form within an integrated circuit, or it may be a memory with physical form, such as a memory stick, a TF card (Trans-flash Card), a smart media card, a secure digital card, a flash card, or another storage device.
  • the processor 33 may include one or more microprocessors and digital processors.
  • the processor 33 can call the program code stored in the memory 32 to perform related functions.
  • the various modules described in FIG. 2 are program codes stored in the memory 32 and executed by the processor 33 to implement a voice transmission method.
  • the processor 33 is also called a central processing unit (CPU); it is a very-large-scale integrated circuit that serves as the computing core and the control unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

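A voice transmission method, comprising: receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, voice information to be transmitted and a second terminal that is to receive the voice information to be transmitted (S11); obtaining the transmission rate at which the voice information to be transmitted is transmitted (S12); determining whether the transmission rate is lower than a preset transmission rate (S13); if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted (S14); performing voice encoding on the text information contained in the speech recognition result to obtain target voice information (S15); and transmitting the target voice information to the second terminal (S16). A voice transmission device, a computer device, and a non-volatile readable storage medium are also disclosed, which can improve the quality of voice calls.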

Description

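Voice transmission method, device, computer device, and storage medium
This application claims priority to Chinese patent application No. 201910459488.7, filed with the Chinese Patent Office on May 29, 2019 and entitled "Voice transmission method, device, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of communication technology, and in particular to a voice transmission method, device, computer device, and storage medium.
Background
With the development of computer technology and the spread of mobile terminals, voice call products have multiplied. These products deliver good call quality when network conditions are good. When network conditions are poor, discontinuous transmission can cause audio stuttering during voice transmission, which lowers call quality and degrades the user experience.
Summary
In view of the above, it is necessary to provide a voice transmission method, device, computer device, and storage medium that can improve the quality of voice calls.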
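This application provides a voice transmission method, the method comprising:
receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted;
obtaining the transmission rate at which the voice information to be transmitted is transmitted;
determining whether the transmission rate is lower than a preset transmission rate;
if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
performing voice encoding on the text information contained in the speech recognition result to obtain target voice information;
transmitting the target voice information to the second terminal.
This application also provides a voice transmission device, the device comprising:
a receiving module, configured to receive a voice call transmission instruction sent by a first terminal, and to obtain, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted;
an acquisition module, configured to obtain the transmission rate at which the voice information to be transmitted is transmitted;
a judging module, configured to determine whether the transmission rate is lower than a preset transmission rate;
a recognition module, configured to, if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
an encoding module, configured to perform voice encoding on the text information contained in the speech recognition result to obtain target voice information;
a first transmission module, configured to transmit the target voice information to the second terminal.
This application also provides a computer device comprising a memory and a processor, the memory being configured to store at least one instruction and the processor being configured to execute the at least one instruction to implement the voice transmission method described in any embodiment.
This application also provides a non-volatile readable storage medium storing at least one instruction which, when executed by a processor, implements the voice transmission method described in any embodiment.
As the above technical solutions show, this application receives a voice call transmission instruction sent by a first terminal and obtains, according to that instruction, the voice information to be transmitted and the second terminal that is to receive it; obtains the transmission rate at which the voice information is transmitted; determines whether the transmission rate is lower than a preset transmission rate; if so, performs speech recognition on the voice information to obtain a speech recognition result that includes the corresponding text information; performs voice encoding on that text information to obtain target voice information; and transmits the target voice information to the second terminal. Because it is the text information contained in the speech recognition result that is encoded when the transmission rate is lower than the preset transmission rate, the spoken content of the voice information is retained while the amount of information to encode is reduced. This helps keep voice calls fluent, improves call quality, and avoids stuttering or dropped calls.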
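Brief Description of the Drawings
FIG. 1 is a flowchart of a voice transmission method provided by an embodiment of this application;
FIG. 2 is a functional module diagram of a voice transmission device provided by an embodiment of this application;
FIG. 3 is a schematic structural diagram of a computer device implementing a preferred embodiment of the voice transmission method of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.
As shown in FIG. 1, FIG. 1 is a flowchart of a voice transmission method provided by an embodiment of this application. Depending on requirements, the order of the steps in the flowchart may be changed and some steps may be omitted.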
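S11: receive a voice call transmission instruction sent by a first terminal, and obtain, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted.
In this embodiment, the first terminal and the second terminal may be the same type of electronic device or different types. For example, both may be mobile phones, or the first terminal may be a mobile phone and the second terminal a computer.
The voice call transmission instruction is an instruction for sending voice information between two terminals.
In this embodiment, the first terminal is the sender of the voice information (the calling party), and the second terminal is the receiver of the voice information (the called party).
In one possible embodiment, obtaining the voice information to be transmitted and the second terminal according to the voice call transmission instruction includes: obtaining the voice information to be transmitted, and the second terminal that is to receive it, as indicated by the voice call transmission instruction.
For example, the voice call transmission instruction contains the voice information to be transmitted and identifies its recipient, that is, the second terminal that is to receive the voice information.
S12: obtain the transmission rate at which the voice information to be transmitted is transmitted.
The transmission rate is the network transmission rate, that is, the rate at which a host on a computer network sends data over a digital channel. For example, a transmission rate of 16 bit/s means that 16 bits of data are transmitted per second.
In this embodiment, obtaining the transmission rate includes: obtaining the sending rate of the first terminal or the receiving rate of the second terminal while the first terminal transmits the voice information to be transmitted to the second terminal.
For example, when voice is transmitted through communication software, the calling party's transmission rate is obtained, which reflects the data rate at which the calling party sends voice information to the base station or server. Alternatively, the called party's transmission rate is obtained, which reflects the data rate at which the called party receives voice information.
S13: determine whether the transmission rate is lower than a preset transmission rate.
In this embodiment, determining whether the transmission rate is lower than the preset transmission rate establishes whether the two parties are in a poor network environment during voice transmission and whether call quality will be affected.
The specific value of the preset transmission rate can be set in advance as needed.
Optionally, the preset transmission rate is 8 kbit/s.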
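Steps S12 and S13 amount to a rate measurement followed by a threshold comparison. The following is a minimal sketch of that check, assuming the rate is estimated from a byte count over a measurement interval; the function names and the measurement approach are illustrative assumptions, and only the 8 kbit/s value comes from the text.

```python
# Optional preset value given in the description (8 kbit/s).
PRESET_RATE_KBITS = 8.0


def transmission_rate_kbits(bytes_transferred: int, seconds: float) -> float:
    """Estimate a transmission rate in kbit/s from bytes moved in an interval.

    How the byte count is measured is an assumption; the description only
    says the sender's or receiver's rate is obtained.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_transferred * 8 / seconds / 1000.0


def is_poor_network(rate_kbits: float, preset: float = PRESET_RATE_KBITS) -> bool:
    """S13: True when the rate is strictly lower than the preset rate."""
    return rate_kbits < preset


slow = transmission_rate_kbits(500, 1.0)    # 500 bytes in one second
fast = transmission_rate_kbits(2000, 1.0)   # 2000 bytes in one second
print(slow, is_poor_network(slow))
print(fast, is_poor_network(fast))
```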
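Optionally, in another embodiment of this application, after determining whether the transmission rate is lower than the preset transmission rate, the method further includes:
if the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encoding the voice information to be transmitted according to the GIA coding standard and transmitting it;
if the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encoding the voice information to be transmitted according to the GSM coding standard and transmitting it;
if the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding the voice information to be transmitted according to the G.728 coding standard and transmitting it;
if the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding the voice information to be transmitted according to the G.721 coding standard and transmitting it;
if the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding the voice information to be transmitted according to the G.722 coding standard and transmitting it;
if the transmission rate is higher than the fifth transmission rate, encoding the voice information to be transmitted according to the MPE coding standard and transmitting it.
Encoding is the process of representing information with codes. In digital encoding, the frequency value of the sound at a given point and the energy at that frequency are sampled and quantized. Relative to the natural signal, every digital audio coding scheme is lossy. The highest-fidelity scheme currently available is PCM, which can approximate the original sound arbitrarily closely, but PCM data is bulky and ill-suited to transmission. During audio transmission, the audio is therefore encoded in other forms that compress it and make transmission smoother.
In this embodiment, different coding algorithms are used to encode the voice information depending on the coding standard.
For example, the G.722 coding standard is implemented with the SB-ADPCM algorithm, G.721 with the ADPCM algorithm, G.728 with the LD-CELP algorithm, the GSM coding standard with the RPE-LTP algorithm, and the GIA coding standard with the VSELPC algorithm.
In this embodiment, a different coding standard is used at each transmission rate, so that as much of the voice information as possible is retained at every rate and sound quality is improved.
Optionally, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.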
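The tiered selection above can be sketched as a lookup over ascending thresholds. This is an illustrative sketch only: the function and table names are hypothetical, the thresholds are the optional values given in the description, and treating a rate exactly equal to a threshold as belonging to the next tier is an assumption, since the text only specifies "lower than" and "higher than".

```python
# (upper bound in kbit/s, coding used when the rate is below that bound)
# "text" stands for the speech-recognition-plus-text-coding path used
# below the preset rate; the other names are the standards in the text.
CODING_LADDER = [
    (8.0, "text"),
    (13.2, "GIA"),
    (16.0, "GSM"),
    (32.0, "G.728"),
    (64.0, "G.721"),
    (128.0, "G.722"),
]
ABOVE_ALL_THRESHOLDS = "MPE"


def select_coding(rate_kbits: float) -> str:
    """Return the coding for the first threshold the rate falls below."""
    for upper_bound, coding in CODING_LADDER:
        if rate_kbits < upper_bound:
            return coding
    # Assumption: a rate equal to or above the fifth rate uses MPE.
    return ABOVE_ALL_THRESHOLDS


for rate in (5, 10, 20, 48, 100, 500):
    print(rate, select_coding(rate))
```

Keeping the thresholds in a table rather than in nested conditionals makes it straightforward to change the preset values, which the description says can be set as needed.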
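S14: if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted.
In this embodiment, speech recognition means converting a speech signal into the corresponding text information.
Specifically, speech recognition technology is applied to the voice information to be transmitted.
Optionally, in another embodiment of this application, performing speech recognition on the voice information to be transmitted includes:
extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain the phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
The preset acoustic model and the preset language model can be selected as needed.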
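The four recognition stages form a simple pipeline in which each stage consumes the previous stage's output. The sketch below shows only that data flow; the stage functions are toy stand-ins passed in as parameters (all hypothetical), not real acoustic or language models, which the description leaves open to choice.

```python
from typing import Callable, List, Sequence


def recognize(
    audio: Sequence[float],
    extract_features: Callable[[Sequence[float]], List[List[float]]],
    acoustic_model: Callable[[List[List[float]]], List[str]],
    language_model: Callable[[List[str]], List[str]],
    decode: Callable[[List[str]], str],
) -> str:
    """Chain the stages: features -> phonemes -> word sequence -> text."""
    features = extract_features(audio)    # feature vectors
    phonemes = acoustic_model(features)   # phoneme information
    words = language_model(phonemes)      # character/word sequence
    return decode(words)                  # dictionary-based decoding


# Toy stand-ins so the pipeline can be exercised end to end.
toy_dictionary = {"ni": "你", "hao": "好"}
text = recognize(
    audio=[0.1, 0.2, 0.3],
    extract_features=lambda a: [[x] for x in a],
    acoustic_model=lambda feats: ["n", "i", "h", "a", "o"],
    language_model=lambda ph: ["ni", "hao"],
    decode=lambda ws: "".join(toy_dictionary[w] for w in ws),
)
print(text)
```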
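S15: perform voice encoding on the text information contained in the speech recognition result to obtain target voice information.
In this embodiment, voice-encoding the text information contained in the speech recognition result means encoding the text itself. Unlike conventional coding, which encodes sound samples, this greatly reduces the amount of data to transmit.
Conventional coding samples and encodes the frequency and amplitude of the sound. The amount of data transmitted with conventional coding is calculated as follows:
Data volume (bytes/s) = sampling rate (Hz) × sample size (bits) × number of channels / 8
Taking 16 kHz mono audio with 16-bit samples as an example, 1 s of audio occupies 16000 × 16 × 1 / 8 = 32,000 bytes (about 32 KB).
In this embodiment, the voice data transmitted per second after the target voice information is encoded is: data volume (bytes/s) = characters spoken per second × encoding size of each character (bytes), where the number of characters spoken per second is the number of characters per second in the recognized voice information. Different characters (for example, Chinese characters) have corresponding encoding sizes, which can be determined from a preset mapping between characters and encoding sizes.
Taking Chinese as an example, a typical speaker utters fewer than 10 Chinese characters per second and each Chinese character is encoded in 2 bytes, so the data volume for 1 s is at most 10 × 2 = 20 bytes. The amount of data transmitted per second in this embodiment is therefore greatly reduced.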
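The comparison between sample-based coding and text coding can be checked with a short calculation. This sketch assumes 16-bit samples for the PCM case and a 2-byte encoding per Chinese character for the text case; the function names are illustrative.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Data volume of uncompressed sampled audio, in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels // 8


def text_bytes_per_second(chars_per_second: int, bytes_per_char: int) -> int:
    """Data volume of recognized text, in bytes per second."""
    return chars_per_second * bytes_per_char


pcm = pcm_bytes_per_second(16000, 16, 1)   # 16 kHz, 16-bit, mono
text = text_bytes_per_second(10, 2)        # 10 characters/s, 2 bytes each
print(pcm, text, pcm / text)
```

Under these assumptions the text stream is three orders of magnitude smaller than the raw samples, which is the effect the embodiment relies on at very low transmission rates.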
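Optionally, in another embodiment of this application, the speech recognition result also includes voice features of the voice information to be transmitted, the voice features including the pitch frequency;
performing voice encoding on the text information contained in the speech recognition result then includes:
performing voice encoding on the text information corresponding to the voice information to be transmitted together with the voice features of the voice information to be transmitted.
Voice features are information that characterizes the voice, for example its sound intensity, loudness, or pitch.
When a person produces a voiced sound, airflow through the glottis sets the vocal cords into relaxation oscillation, producing a quasi-periodic pulsed airflow. This airflow excites the vocal tract to produce voiced speech, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the pitch frequency (fundamental frequency).
The pitch frequency depends on the length, thickness, toughness, and stiffness of the vocal cords and on speaking habits, and it largely reflects the characteristics of the individual speaker. Encoding the pitch frequency along with the text in this embodiment therefore preserves as much of the character of the voice as possible while ensuring that the content is conveyed accurately.
In this embodiment, the pitch frequency of the voice information can be obtained by the cepstrum method.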
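The description names the cepstrum method without giving details. The following is a minimal textbook-style sketch of cepstral pitch estimation on a single frame, using a naive DFT so that it needs only the standard library; the function name, the 60 to 320 Hz search range, and the synthetic pulse-train test signal are all assumptions made for illustration.

```python
import cmath
import math


def cepstral_pitch_hz(frame, sample_rate, f_min=60.0, f_max=320.0):
    """Estimate pitch as the quefrency of the largest real-cepstrum peak."""
    n = len(frame)
    # Log-magnitude spectrum via a naive DFT (fine for a short frame).
    log_mag = []
    for k in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        log_mag.append(math.log(abs(acc) + 1e-9))
    # Search only quefrencies (in samples) matching the allowed pitch range.
    q_lo = int(sample_rate / f_max)
    q_hi = min(int(sample_rate / f_min), n // 2)
    best_q, best_val = q_lo, float("-inf")
    for q in range(q_lo, q_hi + 1):
        # Real cepstrum: inverse DFT of the (symmetric) log-magnitude spectrum.
        c = sum(v * math.cos(2 * math.pi * k * q / n)
                for k, v in enumerate(log_mag)) / n
        if c > best_val:
            best_q, best_val = q, c
    return sample_rate / best_q


# Synthetic voiced frame: one pulse every 80 samples at 8 kHz (100 Hz pitch).
sample_rate = 8000
frame = [1.0 if i % 80 == 0 else 0.0 for i in range(320)]
pitch = cepstral_pitch_hz(frame, sample_rate)
print(round(pitch, 1))
```

A production implementation would normally use an FFT and a window function on real speech frames; the naive transform here is chosen only to keep the sketch self-contained.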
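In this embodiment, voice-encoding the text information together with the voice features means encoding the text combined with those features. This also differs from conventional coding of sound samples and greatly reduces the amount of data to transmit.
In this embodiment, the voice data transmitted per second after the target voice information is encoded is: data volume (bytes/s) = characters spoken per second × encoding size of each character (bytes) + voice features (depending on the features extracted, for example 10 bytes/s), where the number of characters spoken per second is the number of characters per second in the recognized voice information. Different characters (for example, Chinese characters) have corresponding encoding sizes, which can be determined from a preset mapping between characters and encoding sizes.
Taking Chinese as an example, a typical speaker utters fewer than 10 Chinese characters per second and each Chinese character is encoded in 2 bytes, so the data volume for 1 s is at most 10 × 2 + 10 = 30 bytes. The amount of data transmitted per second in this embodiment is therefore greatly reduced.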
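One way to combine text and a pitch value into a single compact payload is a fixed header followed by the encoded text. The wire format below (a 2-byte pitch in Hz, a 2-byte text length, then UTF-8 text) is purely hypothetical; the description does not define a format. Note that UTF-8 uses 3 bytes for most Chinese characters, whereas the 2-byte figure in the text corresponds to encodings such as GBK.

```python
import struct

# Hypothetical header: pitch in Hz and text length in bytes,
# both unsigned 16-bit integers in network byte order.
_HEADER = struct.Struct("!HH")


def encode_target_voice(text: str, pitch_hz: int) -> bytes:
    """Pack recognized text plus one pitch value into a single payload."""
    payload = text.encode("utf-8")
    return _HEADER.pack(pitch_hz, len(payload)) + payload


def decode_target_voice(packet: bytes):
    """Inverse of encode_target_voice: returns (text, pitch_hz)."""
    pitch_hz, length = _HEADER.unpack_from(packet)
    start = _HEADER.size
    return packet[start:start + length].decode("utf-8"), pitch_hz


packet = encode_target_voice("你好", 120)
print(len(packet), decode_target_voice(packet))
```

On the receiving side, the decoded text and pitch would be handed to a speech synthesizer, which is outside the scope of this sketch.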
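S16: transmit the target voice information to the second terminal.
In an optional embodiment, after the second terminal receives the target voice information, it decodes the target voice information, that is, it restores the received text information (or text information and voice features) to speech.
In an optional embodiment, portions of the restored speech that have no content are filled with white noise. White noise is noise whose power spectral density is uniform across the entire frequency domain.
Filling the restored audio with white noise prevents a user listening on the second terminal from mistaking silence for a dropped call and reacting erroneously (for example, by hanging up).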
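White-noise filling can be sketched as replacing silent samples with low-amplitude independent random values, whose spectrum is flat on average. The function name, the amplitude, and the choice to treat exactly-zero samples as "no content" are assumptions for illustration.

```python
import random


def fill_silence_with_white_noise(samples, amplitude=0.01, seed=None):
    """Replace zero-valued samples with low-level uniform white noise.

    Independent, identically distributed samples have a flat power
    spectral density on average, which is what makes the noise "white".
    """
    rng = random.Random(seed)
    return [s if s != 0.0 else rng.uniform(-amplitude, amplitude)
            for s in samples]


restored = [0.5, 0.0, 0.0, -0.25]
filled = fill_silence_with_white_noise(restored, amplitude=0.01, seed=1)
print(filled)
```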
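With this embodiment, although features such as audio characteristics and volume are lost during encoding, the spoken content is still largely preserved even when the network is extremely poor, avoiding choppy speech, lost content, or a complete inability to hold the call.
In another embodiment of this application, the method further includes:
if the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for improving network signal strength, or sending to the second terminal a reminder message that a voice transmission is present.
In this embodiment, when the transmission rate is lower than the preset transmission rate, the suggestion message sent to the first terminal or the second terminal may specifically advise how that terminal can strengthen its network signal, which helps raise the transmission rate between the first terminal and the second terminal and thus improves call quality.
Optionally, the suggestion message includes a recommended network to connect to or a recommended moving route.
In this embodiment, the recommended network is another connectable network recommended to the first terminal or the second terminal. The recommended moving route indicates where the first terminal or the second terminal should be moved so that its network signal is strengthened.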
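Further, in another embodiment of this application, the recommended network can be obtained in any of the following ways, and the method further includes:
obtaining the connectable networks around the first terminal or the second terminal, and taking as the recommended network a connectable network whose signal strength is greater than a network signal strength threshold; or
obtaining the connectable networks around the first terminal or the second terminal, and taking as the recommended network the connectable network with the strongest signal; or
obtaining the connectable networks around the first terminal or the second terminal, identifying the secure networks among them, and taking as the recommended network a secure network whose signal strength is greater than the network signal strength threshold; or
obtaining the connectable networks around the first terminal or the second terminal, identifying the previously connected networks among them, and taking as the recommended network a previously connected network whose signal strength is greater than the network signal strength threshold.
Here, a previously connected network is a network that the first terminal or the second terminal has connected to before.
Taking as the recommended network a secure network whose signal strength is greater than the network signal strength threshold yields a secure network for the first terminal or the second terminal to connect to, avoiding network security problems.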
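The four selection rules differ only in which candidate set is filtered and how. A minimal sketch follows, assuming each connectable network is described by a name, a signal strength in dBm, a security flag, and a connection-history flag; the `Network` type, the field names, and the `strategy` strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Network:
    name: str
    signal_dbm: float               # higher (less negative) is stronger
    secure: bool = False
    previously_connected: bool = False


def recommend(networks: List[Network], threshold_dbm: float,
              strategy: str = "threshold") -> List[Network]:
    """Apply one of the four selection rules to the connectable networks."""
    if strategy == "strongest":
        return [max(networks, key=lambda n: n.signal_dbm)] if networks else []
    if strategy == "threshold":
        candidates = networks
    elif strategy == "secure":
        candidates = [n for n in networks if n.secure]
    elif strategy == "history":
        candidates = [n for n in networks if n.previously_connected]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [n for n in candidates if n.signal_dbm > threshold_dbm]


nearby = [
    Network("cafe", -50.0),
    Network("home", -60.0, secure=True, previously_connected=True),
    Network("office", -85.0, secure=True),
]
print([n.name for n in recommend(nearby, -70.0, "secure")])
```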
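Further, in another embodiment of this application, the recommended moving route can be obtained as follows, and the method further includes:
obtaining the first position, where the first terminal or the second terminal is located, and the available networks around the first terminal or the second terminal;
obtaining the second position, where the available network is located;
taking the first position as the starting position and the second position as the ending position, and obtaining the route between the starting position and the ending position as the recommended moving route.
The closer the first terminal or the second terminal is to a connectable network, the better the network signal strength it obtains. For example, the closer a terminal is to the router, the better its signal.
In this embodiment, obtaining the recommended moving route makes it easier to move the first terminal or the second terminal so that it has better network signal strength, which raises the transmission rate at which the voice information to be transmitted passes between the first terminal and the second terminal and thus improves call quality.
In the voice transmission method provided by this application, a voice call transmission instruction sent by a first terminal is received, and the voice information to be transmitted and the second terminal that is to receive it are obtained according to that instruction; the transmission rate at which the voice information is transmitted is obtained; whether the transmission rate is lower than a preset transmission rate is determined; if so, speech recognition is performed on the voice information to obtain a speech recognition result that includes the corresponding text information; the text information is voice-encoded to obtain target voice information; and the target voice information is transmitted to the second terminal. Because it is the text information contained in the speech recognition result that is encoded when the transmission rate is lower than the preset transmission rate, the spoken content of the voice information is retained while the amount of information to encode is reduced. This helps keep voice calls fluent, improves call quality, and avoids stuttering or dropped calls.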
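The overall decision in steps S13 to S16 can be sketched as a single function that switches between the text path and ordinary audio transmission. The `recognize` and `send` callables are hypothetical stand-ins for the recognition and transport components, and passing the audio through unchanged above the preset rate is an assumption made to keep the sketch short.

```python
from typing import Callable


def transmit_voice(audio: bytes, rate_kbits: float,
                   recognize: Callable[[bytes], str],
                   send: Callable[[bytes], None],
                   preset_kbits: float = 8.0) -> str:
    """Send recognized text on a slow link, otherwise send the audio itself."""
    if rate_kbits < preset_kbits:
        text = recognize(audio)           # S14: speech recognition
        send(text.encode("utf-8"))        # S15/S16: encode text, transmit
        return "text"
    send(audio)                           # assumption: normal audio path
    return "audio"


sent = []
mode_slow = transmit_voice(b"\x00" * 3200, 4.0, lambda a: "hello", sent.append)
mode_fast = transmit_voice(b"\x01\x02", 64.0, lambda a: "hello", sent.append)
print(mode_slow, mode_fast, [len(p) for p in sent])
```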
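As shown in FIG. 2, FIG. 2 is a functional module diagram of a voice transmission device provided by an embodiment of this application. The voice transmission device includes a receiving module 210, an acquisition module 220, a judging module 230, a recognition module 240, an encoding module 250, and a first transmission module 260. A module, as used in this application, is a series of computer-readable instruction segments that can be executed by a processor, performs a fixed function, and is stored in the memory of the computer device. The functions of each module are detailed below.
The receiving module 210 is configured to receive a voice call transmission instruction sent by a first terminal, and to obtain, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted.
In this embodiment, the first terminal and the second terminal may be the same type of electronic device or different types. For example, both may be mobile phones, or the first terminal may be a mobile phone and the second terminal a computer.
The voice call transmission instruction is an instruction for sending voice information between two terminals.
In this embodiment, the first terminal is the sender of the voice information (the calling party), and the second terminal is the receiver of the voice information (the called party).
In one possible embodiment, obtaining the voice information to be transmitted and the second terminal according to the voice call transmission instruction includes: obtaining the voice information to be transmitted, and the second terminal that is to receive it, as indicated by the voice call transmission instruction.
For example, the voice call transmission instruction contains the voice information to be transmitted and identifies its recipient, that is, the second terminal that is to receive the voice information.
The acquisition module 220 is configured to obtain the transmission rate at which the voice information to be transmitted is transmitted.
The transmission rate is the network transmission rate, that is, the rate at which a host on a computer network sends data over a digital channel. For example, a transmission rate of 16 bit/s means that 16 bits of data are transmitted per second.
In this embodiment, obtaining the transmission rate includes: obtaining the sending rate of the first terminal or the receiving rate of the second terminal while the first terminal transmits the voice information to be transmitted to the second terminal.
For example, when voice is transmitted through communication software, the calling party's transmission rate is obtained, which reflects the data rate at which the calling party sends voice information to the base station or server. Alternatively, the called party's transmission rate is obtained, which reflects the data rate at which the called party receives voice information.
The judging module 230 is configured to determine whether the transmission rate is lower than a preset transmission rate.
In this embodiment, determining whether the transmission rate is lower than the preset transmission rate establishes whether the two parties are in a poor network environment during voice transmission and whether call quality will be affected.
The specific value of the preset transmission rate can be set in advance as needed.
Optionally, the preset transmission rate is 8 kbit/s.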
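The recognition module 240 is configured to, if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted.
In this embodiment, speech recognition means converting a speech signal into the corresponding text information.
Specifically, speech recognition technology is applied to the voice information to be transmitted.
Optionally, in another embodiment of this application, the recognition module 240 performing speech recognition on the voice information to be transmitted includes:
extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain the phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
The preset acoustic model and the preset language model can be selected as needed.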
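The encoding module 250 is configured to perform voice encoding on the text information contained in the speech recognition result to obtain target voice information.
In this embodiment, voice-encoding the text information contained in the speech recognition result means encoding the text itself. Unlike conventional coding, which encodes sound samples, this greatly reduces the amount of data to transmit.
Conventional coding samples and encodes the frequency and amplitude of the sound. The amount of data transmitted with conventional coding is calculated as follows:
Data volume (bytes/s) = sampling rate (Hz) × sample size (bits) × number of channels / 8
Taking 16 kHz mono audio with 16-bit samples as an example, 1 s of audio occupies 16000 × 16 × 1 / 8 = 32,000 bytes (about 32 KB).
In this embodiment, the voice data transmitted per second after the target voice information is encoded is: data volume (bytes/s) = characters spoken per second × encoding size of each character (bytes), where the number of characters spoken per second is the number of characters per second in the recognized voice information. Different characters (for example, Chinese characters) have corresponding encoding sizes, which can be determined from a preset mapping between characters and encoding sizes.
Taking Chinese as an example, a typical speaker utters fewer than 10 Chinese characters per second and each Chinese character is encoded in 2 bytes, so the data volume for 1 s is at most 10 × 2 = 20 bytes. The amount of data transmitted per second in this embodiment is therefore greatly reduced.
Optionally, in another embodiment of this application, the speech recognition result also includes voice features of the voice information to be transmitted, the voice features including the pitch frequency;
the encoding module 250 performing voice encoding on the text information contained in the speech recognition result then includes:
performing voice encoding on the text information corresponding to the voice information to be transmitted together with the voice features of the voice information to be transmitted.
Voice features are information that characterizes the voice, for example its sound intensity, loudness, or pitch.
When a person produces a voiced sound, airflow through the glottis sets the vocal cords into relaxation oscillation, producing a quasi-periodic pulsed airflow. This airflow excites the vocal tract to produce voiced speech, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the pitch frequency (fundamental frequency).
The pitch frequency depends on the length, thickness, toughness, and stiffness of the vocal cords and on speaking habits, and it largely reflects the characteristics of the individual speaker. Encoding the pitch frequency along with the text in this embodiment therefore preserves as much of the character of the voice as possible while ensuring that the content is conveyed accurately.
In this embodiment, the pitch frequency of the voice information can be obtained by the cepstrum method.
In this embodiment, voice-encoding the text information together with the voice features means encoding the text combined with those features. This also differs from conventional coding of sound samples and greatly reduces the amount of data to transmit.
In this embodiment, the voice data transmitted per second after the target voice information is encoded is: data volume (bytes/s) = characters spoken per second × encoding size of each character (bytes) + voice features (depending on the features extracted, for example 10 bytes/s), where the number of characters spoken per second is the number of characters per second in the recognized voice information. Different characters (for example, Chinese characters) have corresponding encoding sizes, which can be determined from a preset mapping between characters and encoding sizes.
Taking Chinese as an example, a typical speaker utters fewer than 10 Chinese characters per second and each Chinese character is encoded in 2 bytes, so the data volume for 1 s is at most 10 × 2 + 10 = 30 bytes. The amount of data transmitted per second in this embodiment is therefore greatly reduced.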
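The first transmission module 260 is configured to transmit the target voice information to the second terminal.
In an optional embodiment, after the second terminal receives the target voice information, it decodes the target voice information, that is, it restores the received text information (or text information and voice features) to speech.
In an optional embodiment, portions of the restored speech that have no content are filled with white noise. White noise is noise whose power spectral density is uniform across the entire frequency domain.
Filling the restored audio with white noise prevents a user listening on the second terminal from mistaking silence for a dropped call and reacting erroneously (for example, by hanging up).
With this embodiment, although features such as audio characteristics and volume are lost during encoding, the spoken content is still largely preserved even when the network is extremely poor, avoiding choppy speech, lost content, or a complete inability to hold the call.
In another embodiment of this application, the device further includes:
a reminder module, configured to, if the transmission rate is lower than the preset transmission rate, send to the first terminal or the second terminal a suggestion message for improving network signal strength, or send to the second terminal a reminder message that a voice transmission is present.
In this embodiment, when the transmission rate is lower than the preset transmission rate, the suggestion message sent to the first terminal or the second terminal may specifically advise how that terminal can strengthen its network signal, which helps raise the transmission rate between the first terminal and the second terminal and thus improves call quality.
Optionally, the suggestion message includes a recommended network to connect to or a recommended moving route.
In this embodiment, the recommended network is another connectable network recommended to the first terminal or the second terminal. The recommended moving route indicates where the first terminal or the second terminal should be moved so that its network signal is strengthened.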
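Further, in another embodiment of this application, the recommended network can be obtained through a recommendation module, the recommendation module being configured to:
obtain the connectable networks around the first terminal or the second terminal, and take as the recommended network a connectable network whose signal strength is greater than a network signal strength threshold; or
obtain the connectable networks around the first terminal or the second terminal, and take as the recommended network the connectable network with the strongest signal; or
obtain the connectable networks around the first terminal or the second terminal, identify the secure networks among them, and take as the recommended network a secure network whose signal strength is greater than the network signal strength threshold; or
obtain the connectable networks around the first terminal or the second terminal, identify the previously connected networks among them, and take as the recommended network a previously connected network whose signal strength is greater than the network signal strength threshold.
Here, a previously connected network is a network that the first terminal or the second terminal has connected to before.
Taking as the recommended network a secure network whose signal strength is greater than the network signal strength threshold yields a secure network for the first terminal or the second terminal to connect to, avoiding network security problems.
Further, in another embodiment of this application, the recommended moving route can also be obtained through the recommendation module, the recommendation module being further configured to:
obtain the first position, where the first terminal or the second terminal is located, and the available networks around the first terminal or the second terminal;
obtain the second position, where the available network is located;
take the first position as the starting position and the second position as the ending position, and obtain the route between the starting position and the ending position as the recommended moving route.
The closer the first terminal or the second terminal is to a connectable network, the better the network signal strength it obtains. For example, the closer a terminal is to the router, the better its signal.
In this embodiment, obtaining the recommended moving route makes it easier to move the first terminal or the second terminal so that it has better network signal strength, which raises the transmission rate at which the voice information to be transmitted passes between the first terminal and the second terminal and thus improves call quality.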
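Optionally, in another embodiment of this application, the device further includes a second transmission module, the second transmission module being configured to:
after it is determined whether the transmission rate is lower than the preset transmission rate, if the transmission rate is higher than the preset transmission rate, determine whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encode the voice information to be transmitted according to the GIA coding standard and transmit it;
if the transmission rate is higher than the first transmission rate, determine whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encode the voice information to be transmitted according to the GSM coding standard and transmit it;
if the transmission rate is higher than the second transmission rate, determine whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encode the voice information to be transmitted according to the G.728 coding standard and transmit it;
if the transmission rate is higher than the third transmission rate, determine whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encode the voice information to be transmitted according to the G.721 coding standard and transmit it;
if the transmission rate is higher than the fourth transmission rate, determine whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encode the voice information to be transmitted according to the G.722 coding standard and transmit it;
if the transmission rate is higher than the fifth transmission rate, encode the voice information to be transmitted according to the MPE coding standard and transmit it.
Encoding is the process of representing information with codes. In digital encoding, the frequency value of the sound at a given point and the energy at that frequency are sampled and quantized. Relative to the natural signal, every digital audio coding scheme is lossy. The highest-fidelity scheme currently available is PCM, which can approximate the original sound arbitrarily closely, but PCM data is bulky and ill-suited to transmission. During audio transmission, the audio is therefore encoded in other forms that compress it and make transmission smoother.
In this embodiment, different coding algorithms are used to encode the voice information depending on the coding standard.
For example, the G.722 coding standard is implemented with the SB-ADPCM algorithm, G.721 with the ADPCM algorithm, G.728 with the LD-CELP algorithm, the GSM coding standard with the RPE-LTP algorithm, and the GIA coding standard with the VSELPC algorithm.
In this embodiment, a different coding standard is used at each transmission rate, so that as much of the voice information as possible is retained at every rate and sound quality is improved.
Optionally, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.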
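In the voice transmission device provided by this application, the receiving module receives a voice call transmission instruction sent by a first terminal and obtains, according to that instruction, the voice information to be transmitted and the second terminal that is to receive it; the acquisition module obtains the transmission rate at which the voice information is transmitted; the judging module determines whether the transmission rate is lower than a preset transmission rate; if so, the recognition module performs speech recognition on the voice information to obtain a speech recognition result that includes the corresponding text information; the encoding module voice-encodes the text information to obtain target voice information; and the first transmission module transmits the target voice information to the second terminal. Because it is the text information contained in the speech recognition result that is encoded when the transmission rate is lower than the preset transmission rate, the spoken content of the voice information is retained while the amount of information to encode is reduced. This helps keep voice calls fluent, improves call quality, and avoids stuttering or dropped calls.
The integrated unit described above, implemented in the form of software function modules, may be stored in a non-volatile readable storage medium. The software function modules are stored in a storage medium and include several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of this application.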
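As shown in FIG. 3, FIG. 3 is a schematic structural diagram of a computer device implementing a preferred embodiment of the voice transmission method of this application. The computer device includes at least one sending device 31, at least one memory 32, at least one processor 33, at least one receiving device 34, and at least one communication bus. The communication bus provides the connections and communication between these components.
The computer device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded equipment, and the like. The computer device may also include network equipment and/or user equipment. The network equipment includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud composed of a large number of hosts or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
The computer device may be, but is not limited to, any electronic product that can interact with a user through a keyboard, a touch panel, a voice control device, or the like, for example a terminal such as a tablet computer, a smartphone, or a monitoring device.
The network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a Virtual Private Network (VPN), and the like.
The receiving device 34 and the sending device 31 may be wired ports or wireless devices, for example including an antenna device, and are used for data communication with other devices.
The memory 32 is used to store program code. The memory 32 may be a circuit with a storage function that has no physical form within an integrated circuit, or it may be a memory with physical form, such as a memory stick, a TF card (Trans-flash Card), a smart media card, a secure digital card, a flash card, or another storage device.
The processor 33 may include one or more microprocessors or digital processors. The processor 33 can call the program code stored in the memory 32 to perform the relevant functions. For example, the modules described in FIG. 2 are program code stored in the memory 32 and executed by the processor 33 to implement a voice transmission method. The processor 33, also called a central processing unit (CPU), is a very-large-scale integrated circuit that serves as the computing core and the control unit.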
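In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are only illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the preferred embodiments, a person of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from their spirit and scope.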

Claims (20)

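  1. A voice transmission method, characterized in that the method comprises:
    receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted;
    obtaining the transmission rate at which the voice information to be transmitted is transmitted;
    determining whether the transmission rate is lower than a preset transmission rate;
    if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
    performing voice encoding on the text information contained in the speech recognition result to obtain target voice information;
    transmitting the target voice information to the second terminal.
  2. The method according to claim 1, characterized in that the speech recognition result further includes voice features of the voice information to be transmitted, the voice features including the pitch frequency;
    performing voice encoding on the text information contained in the speech recognition result comprises:
    performing voice encoding on the text information corresponding to the voice information to be transmitted together with the voice features of the voice information to be transmitted.
  3. The method according to claim 2, characterized in that, after determining whether the transmission rate is lower than the preset transmission rate, the method further comprises:
    if the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
    if the transmission rate is lower than the first transmission rate, encoding the voice information to be transmitted according to the GIA coding standard and transmitting it;
    if the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
    if the transmission rate is lower than the second transmission rate, encoding the voice information to be transmitted according to the GSM coding standard and transmitting it;
    if the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
    if the transmission rate is lower than the third transmission rate, encoding the voice information to be transmitted according to the G.728 coding standard and transmitting it;
    if the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
    if the transmission rate is lower than the fourth transmission rate, encoding the voice information to be transmitted according to the G.721 coding standard and transmitting it;
    if the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
    if the transmission rate is lower than the fifth transmission rate, encoding the voice information to be transmitted according to the G.722 coding standard and transmitting it;
    if the transmission rate is higher than the fifth transmission rate, encoding the voice information to be transmitted according to the MPE coding standard and transmitting it.
  4. The method according to claim 3, characterized in that the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
  5. The method according to any one of claims 1 to 4, characterized in that performing speech recognition on the voice information to be transmitted comprises:
    extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
    inputting the feature vector into a preset acoustic model to obtain the phoneme information corresponding to the feature vector;
    inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
    decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
  6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
    if the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for improving network signal strength, or sending to the second terminal a reminder message that a voice transmission is present.
  7. The method according to claim 6, characterized in that the suggestion message includes a recommended network to connect to or a recommended moving route.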
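  8. A voice transmission device, characterized in that the device comprises:
    a receiving module, configured to receive a voice call transmission instruction sent by a first terminal, and to obtain, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted;
    an acquisition module, configured to obtain the transmission rate at which the voice information to be transmitted is transmitted;
    a judging module, configured to determine whether the transmission rate is lower than a preset transmission rate;
    a recognition module, configured to, if the transmission rate is lower than the preset transmission rate, perform speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
    an encoding module, configured to perform voice encoding on the text information contained in the speech recognition result to obtain target voice information;
    a first transmission module, configured to transmit the target voice information to the second terminal.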
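  9. A computer device, characterized in that the computer device comprises a memory and a processor, the memory being configured to store at least one computer-readable instruction and the processor being configured to execute the at least one computer-readable instruction to implement the following steps:
    receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted;
    obtaining the transmission rate at which the voice information to be transmitted is transmitted;
    determining whether the transmission rate is lower than a preset transmission rate;
    if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
    performing voice encoding on the text information contained in the speech recognition result to obtain target voice information;
    transmitting the target voice information to the second terminal.
  10. The computer device according to claim 9, characterized in that the speech recognition result further includes voice features of the voice information to be transmitted, the voice features including the pitch frequency;
    when the processor executes the at least one computer-readable instruction to perform voice encoding on the text information contained in the speech recognition result, this comprises:
    performing voice encoding on the text information corresponding to the voice information to be transmitted together with the voice features of the voice information to be transmitted.
  11. The computer device according to claim 10, characterized in that, after determining whether the transmission rate is lower than the preset transmission rate, the processor executes the at least one computer-readable instruction to further implement the following steps:
    if the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
    if the transmission rate is lower than the first transmission rate, encoding the voice information to be transmitted according to the GIA coding standard and transmitting it;
    if the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
    if the transmission rate is lower than the second transmission rate, encoding the voice information to be transmitted according to the GSM coding standard and transmitting it;
    if the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
    if the transmission rate is lower than the third transmission rate, encoding the voice information to be transmitted according to the G.728 coding standard and transmitting it;
    if the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
    if the transmission rate is lower than the fourth transmission rate, encoding the voice information to be transmitted according to the G.721 coding standard and transmitting it;
    if the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
    if the transmission rate is lower than the fifth transmission rate, encoding the voice information to be transmitted according to the G.722 coding standard and transmitting it;
    if the transmission rate is higher than the fifth transmission rate, encoding the voice information to be transmitted according to the MPE coding standard and transmitting it.
  12. The computer device according to claim 11, characterized in that the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
  13. The computer device according to any one of claims 9 to 12, characterized in that, when the processor executes the at least one computer-readable instruction to perform speech recognition on the voice information to be transmitted, this specifically comprises:
    extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
    inputting the feature vector into a preset acoustic model to obtain the phoneme information corresponding to the feature vector;
    inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
    decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
  14. The computer device according to any one of claims 9 to 12, characterized in that the processor executes the at least one computer-readable instruction to further implement the following steps:
    if the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for improving network signal strength, or sending to the second terminal a reminder message that a voice transmission is present.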
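  15. A non-volatile readable storage medium having computer instructions stored thereon, characterized in that the non-volatile readable storage medium stores at least one computer-readable instruction which, when executed by a processor, implements the following steps:
    receiving a voice call transmission instruction sent by a first terminal, and obtaining, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted;
    obtaining the transmission rate at which the voice information to be transmitted is transmitted;
    determining whether the transmission rate is lower than a preset transmission rate;
    if the transmission rate is lower than the preset transmission rate, performing speech recognition on the voice information to be transmitted to obtain a speech recognition result, the speech recognition result including text information corresponding to the voice information to be transmitted;
    performing voice encoding on the text information contained in the speech recognition result to obtain target voice information;
    transmitting the target voice information to the second terminal.
  16. The non-volatile readable storage medium according to claim 15, characterized in that the speech recognition result further includes voice features of the voice information to be transmitted, the voice features including the pitch frequency;
    when the at least one computer-readable instruction is executed by the processor to perform voice encoding on the text information contained in the speech recognition result, this comprises:
    performing voice encoding on the text information corresponding to the voice information to be transmitted together with the voice features of the voice information to be transmitted.
  17. The non-volatile readable storage medium according to claim 16, characterized in that, after determining whether the transmission rate is lower than the preset transmission rate, the at least one computer-readable instruction is executed by the processor to further implement the following steps:
    if the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
    if the transmission rate is lower than the first transmission rate, encoding the voice information to be transmitted according to the GIA coding standard and transmitting it;
    if the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
    if the transmission rate is lower than the second transmission rate, encoding the voice information to be transmitted according to the GSM coding standard and transmitting it;
    if the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
    if the transmission rate is lower than the third transmission rate, encoding the voice information to be transmitted according to the G.728 coding standard and transmitting it;
    if the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
    if the transmission rate is lower than the fourth transmission rate, encoding the voice information to be transmitted according to the G.721 coding standard and transmitting it;
    if the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
    if the transmission rate is lower than the fifth transmission rate, encoding the voice information to be transmitted according to the G.722 coding standard and transmitting it;
    if the transmission rate is higher than the fifth transmission rate, encoding the voice information to be transmitted according to the MPE coding standard and transmitting it.
  18. The non-volatile readable storage medium according to claim 17, characterized in that the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
  19. The non-volatile readable storage medium according to any one of claims 15 to 18, characterized in that, when the at least one computer-readable instruction is executed by the processor to perform speech recognition on the voice information to be transmitted, this specifically comprises:
    extracting features of the voice information to be transmitted to obtain a feature vector representing the voice information to be transmitted;
    inputting the feature vector into a preset acoustic model to obtain the phoneme information corresponding to the feature vector;
    inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, the elements including a word sequence composed of characters or words;
    decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
  20. The non-volatile readable storage medium according to any one of claims 15 to 18, characterized in that the at least one computer-readable instruction is executed by the processor to further implement the following steps:
    if the transmission rate is lower than the preset transmission rate, sending to the first terminal or the second terminal a suggestion message for improving network signal strength, or sending to the second terminal a reminder message that a voice transmission is present.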
PCT/CN2019/118022 2019-05-29 2019-11-13 Voice transmission method, device, computer device, and storage medium WO2020238058A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910459488.7 2019-05-29
CN201910459488.7A CN110364170B (zh) 2019-05-29 2019-05-29 Voice transmission method, device, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020238058A1 true WO2020238058A1 (zh) 2020-12-03

Family

ID=68215394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118022 WO2020238058A1 (zh) 2019-05-29 2019-11-13 Voice transmission method, device, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN110364170B (zh)
WO (1) WO2020238058A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364170B (zh) * 2019-05-29 2024-01-30 平安科技(深圳)有限公司 语音传输方法、装置、计算机装置及存储介质
CN111199747A (zh) * 2020-03-05 2020-05-26 北京花兰德科技咨询服务有限公司 人工智能通信系统及通信方法
CN111245868B (zh) * 2020-03-10 2021-04-13 诺领科技(南京)有限公司 窄带物联网语音消息通信方法和系统
CN111785293B (zh) * 2020-06-04 2023-04-25 杭州海康威视系统技术有限公司 语音传输方法、装置及设备、存储介质
CN112202803A (zh) * 2020-10-10 2021-01-08 北京字节跳动网络技术有限公司 音频处理的方法、装置、终端及存储介质
CN112822297A (zh) * 2021-04-01 2021-05-18 深圳市顺易通信息科技有限公司 一种停车场服务数据传输方法及相关设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162150A1 (en) * 2006-12-28 2008-07-03 Vianix Delaware, Llc System and Method for a High Performance Audio Codec
CN102790997A (zh) * 2011-05-19 2012-11-21 中兴通讯股份有限公司 一种自适应多速率amr语音数据的传输方法及装置
CN103714823A (zh) * 2013-12-19 2014-04-09 同济大学 一种基于综合语音编码的自适应水下通信方法
CN106850615A (zh) * 2017-01-24 2017-06-13 华为技术有限公司 一种编码速率控制的方法、相关装置及系统
CN109712631A (zh) * 2019-03-28 2019-05-03 南昌黑鲨科技有限公司 音频数据传输控制方法、装置、系统及可读存储介质
CN110364170A (zh) * 2019-05-29 2019-10-22 平安科技(深圳)有限公司 语音传输方法、装置、计算机装置及存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08116385A (ja) * 1994-10-14 1996-05-07 Hitachi Ltd 個人情報端末装置および音声応答システム
CN105989844B (zh) * 2015-01-29 2019-12-13 中国移动通信集团公司 一种音频传输的自适应方法及装置
CN107066477A (zh) * 2016-12-13 2017-08-18 合网络技术(北京)有限公司 一种智能推荐视频的方法及装置
CN107770387A (zh) * 2017-10-31 2018-03-06 珠海市魅族科技有限公司 通信控制方法、装置、计算机装置及计算机可读存储介质

Also Published As

Publication number Publication date
CN110364170B (zh) 2024-01-30
CN110364170A (zh) 2019-10-22

Similar Documents

Publication Publication Date Title
WO2020238058A1 (zh) 语音传输方法、装置、计算机装置及存储介质
US11308978B2 (en) Systems and methods for energy efficient and low power distributed automatic speech recognition on wearable devices
CN111968679B (zh) 情感识别方法、装置、电子设备及存储介质
CN108141498B (zh) 一种翻译方法及终端
CN104781879B (zh) 用于对音频信号进行编码的方法和装置
WO2021103778A1 (zh) 语音处理方法、装置、计算机可读存储介质和计算机设备
CN106504742B (zh) 合成语音的传输方法、云端服务器和终端设备
WO2014048127A1 (zh) 语音质量监控的方法和装置
CN104766608A (zh) 一种语音控制方法及装置
KR101279857B1 (ko) 적응적 멀티 레이트 코덱 모드 디코딩 방법 및 장치
CN111816190A (zh) 用于上位机与下位机的语音交互方法和装置
CN110931000B (zh) 语音识别的方法和装置
CN113689864A (zh) 一种音频数据处理方法、装置及存储介质
JP6549009B2 (ja) 通信端末及び音声認識システム
CN110351419B (zh) 一种智能语音系统及其语音处理方法
WO2013102403A1 (zh) 一种音频信号处理方法、装置及终端
JP4437011B2 (ja) 音声符号化装置
CN111191451B (zh) 中文语句简化方法和装置
CN113257242A (zh) 自助语音服务中的语音播报中止方法、装置、设备及介质
WO2014010175A1 (ja) 符号化装置及び符号化方法
KR100462042B1 (ko) 이동 통신망을 이용한 메시지를 전송하는 방법 및 시스템
CN106531175A (zh) 一种网络话机柔和噪声产生的方法
WO2022135237A1 (zh) 语音处理方法、终端设备及存储介质
CN117037793A (zh) 一种基于全双工的对话方法、装置及存储介质
Ma et al. Design and research of MELP vocoder based on Beidou voice communication

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19931156

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19931156

Country of ref document: EP

Kind code of ref document: A1