CN110364170B - Voice transmission method, voice transmission device, computer device and storage medium - Google Patents

Info

Publication number
CN110364170B
Authority
CN
China
Prior art keywords
transmission rate
voice
transmitted
terminal
voice information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910459488.7A
Other languages
Chinese (zh)
Other versions
CN110364170A (en
Inventor
邹昆伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910459488.7A priority Critical patent/CN110364170B/en
Publication of CN110364170A publication Critical patent/CN110364170A/en
Priority to PCT/CN2019/118022 priority patent/WO2020238058A1/en
Application granted granted Critical
Publication of CN110364170B publication Critical patent/CN110364170B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0002 Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission rate

Abstract

The invention provides a voice transmission method, which comprises the following steps: receiving a voice call transmission instruction sent by a first terminal, and acquiring voice information to be transmitted and a second terminal for receiving the voice information to be transmitted according to the voice call transmission instruction; acquiring the transmission rate when the voice information to be transmitted is transmitted; judging whether the transmission rate is lower than a preset transmission rate or not; if the transmission rate is lower than the preset transmission rate, performing voice recognition on the voice information to be transmitted to obtain a voice recognition result, wherein the voice recognition result comprises text information corresponding to the voice information to be transmitted; performing voice coding on the text information contained in the voice recognition result to obtain target voice information; and transmitting the target voice information to the second terminal. The invention also discloses a voice transmission device, a computer device and a computer readable storage medium. The invention can improve the quality of voice communication.

Description

Voice transmission method, voice transmission device, computer device and storage medium
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a voice transmission method, a voice transmission device, a computer device, and a storage medium.
Background
With the development of computer technology and the popularization of mobile terminals, voice call products are increasingly common. When the network condition is good, call quality is good; when the network condition is poor, discontinuous transmission can cause problems such as stuttering voice, which reduces the quality of the voice call and affects the user experience.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a voice transmission method, apparatus, computer apparatus, and storage medium capable of improving the quality of voice calls.
The invention provides a voice transmission method, which comprises the following steps:
receiving a voice call transmission instruction sent by a first terminal, and acquiring voice information to be transmitted and a second terminal for receiving the voice information to be transmitted according to the voice call transmission instruction;
acquiring the transmission rate when the voice information to be transmitted is transmitted;
judging whether the transmission rate is lower than a preset transmission rate or not;
if the transmission rate is lower than the preset transmission rate, performing voice recognition on the voice information to be transmitted to obtain a voice recognition result, wherein the voice recognition result comprises text information corresponding to the voice information to be transmitted;
performing voice coding on the text information contained in the voice recognition result to obtain target voice information;
and transmitting the target voice information to the second terminal.
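The steps above can be sketched as a single routing function. This is a minimal illustration under stated assumptions, not the patented implementation: the `VoiceCallInstruction` container, the `recognize` and `send` callables, and the use of UTF-8 bytes as a stand-in for the "voice coding" of text are all hypothetical. The 8 kbit/s threshold is the optional preset value given in the text.

```python
from dataclasses import dataclass
from typing import Callable

# Optional preset transmission rate from the text, in kbit/s.
PRESET_RATE_KBITS = 8.0


@dataclass
class VoiceCallInstruction:
    """Hypothetical container for a voice call transmission instruction."""
    voice: bytes      # voice information to be transmitted
    receiver: str     # identifier of the second terminal


def handle_instruction(
    instruction: VoiceCallInstruction,
    rate_kbits: float,
    recognize: Callable[[bytes], str],
    send: Callable[[str, bytes], None],
) -> str:
    """Route one instruction and report which path was taken."""
    if rate_kbits < PRESET_RATE_KBITS:
        # Poor network: recognize the speech and transmit the coded text.
        text = recognize(instruction.voice)
        target = text.encode("utf-8")  # stand-in for voice coding of the text
        send(instruction.receiver, target)
        return "text"
    # Otherwise the voice information is transmitted with a regular codec
    # (here simply passed through unchanged).
    send(instruction.receiver, instruction.voice)
    return "audio"


# Usage with toy stand-ins for the recognizer and the transport.
sent = []
low_mode = handle_instruction(
    VoiceCallInstruction(b"\x00\x01", "terminal-2"),
    5.0,
    lambda voice: "hello",
    lambda receiver, payload: sent.append((receiver, payload)),
)
```

Passing the recognizer and transport as parameters keeps the sketch independent of any particular speech recognition engine or network stack.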
In an alternative implementation of the present invention, the speech recognition result further includes a speech feature of the speech information to be transmitted, where the speech feature includes a pitch frequency;
the speech coding of the text information contained in the speech recognition result comprises the following steps:
and carrying out voice coding on the text information corresponding to the voice information to be transmitted and the voice characteristics of the voice information to be transmitted.
In an optional implementation of the present invention, after the determining whether the transmission rate is lower than a preset transmission rate, the method further includes:
if the transmission rate is higher than the preset transmission rate, judging whether the transmission rate is lower than a first transmission rate or not;
if the transmission rate is lower than the first transmission rate, encoding and transmitting the voice information to be transmitted through a GIA encoding standard;
if the transmission rate is higher than the first transmission rate, judging whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, coding and transmitting the voice information to be transmitted through a GSM coding standard;
if the transmission rate is higher than the second transmission rate, judging whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding and transmitting the voice information to be transmitted by using a G.728 encoding standard;
if the transmission rate is higher than the third transmission rate, judging whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding and transmitting the voice information to be transmitted through a G.721 encoding standard;
if the transmission rate is higher than the fourth transmission rate, judging whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding and transmitting the voice information to be transmitted through a G.722 encoding standard;
and if the transmission rate is higher than the fifth transmission rate, encoding and transmitting the voice information to be transmitted by using an MPE encoding standard.
In an alternative implementation of the present invention, the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
In an optional embodiment of the present invention, the performing speech recognition on the to-be-transmitted speech information includes:
extracting the characteristics of the voice information to be transmitted to obtain a characteristic vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain elements contained in the phoneme information, wherein the elements comprise word sequences consisting of characters or words;
and decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
In an alternative embodiment of the present invention, the method further comprises:
and if the transmission rate is lower than the preset transmission rate, sending a suggestion message for enhancing the network signal strength to the first terminal or the second terminal, or sending a reminding message for voice transmission to the second terminal.
In an alternative embodiment of the invention, the suggestion message comprises a recommended network to connect to or a recommended route of movement.
The invention also provides a voice transmission device, which comprises:
the receiving module is used for receiving a voice call transmission instruction sent by the first terminal, and acquiring, according to the voice call transmission instruction, voice information to be transmitted and a second terminal for receiving the voice information to be transmitted;
the acquisition module is used for acquiring the transmission rate when the voice information to be transmitted is transmitted;
the judging module is used for judging whether the transmission rate is lower than a preset transmission rate or not;
the recognition module is used for carrying out voice recognition on the voice information to be transmitted if the transmission rate is lower than the preset transmission rate, and obtaining a voice recognition result, wherein the voice recognition result comprises text information corresponding to the voice information to be transmitted;
the coding module is used for carrying out voice coding on the text information contained in the voice recognition result to obtain target voice information;
and the first transmission module is used for transmitting the target voice information to the second terminal.
In an alternative embodiment of the present invention, the speech recognition result further includes a speech feature of the speech information to be transmitted, where the speech feature includes a pitch frequency;
the encoding module performing voice encoding on the text information contained in the voice recognition result comprises the following steps:
and carrying out voice coding on the text information corresponding to the voice information to be transmitted and the voice characteristics of the voice information to be transmitted.
In an alternative embodiment of the present invention, the apparatus further comprises a second transmission module, where the second transmission module is configured to:
after judging whether the transmission rate is lower than the preset transmission rate, if the transmission rate is higher than the preset transmission rate, judging whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encoding and transmitting the voice information to be transmitted through a GIA encoding standard;
if the transmission rate is higher than the first transmission rate, judging whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, coding and transmitting the voice information to be transmitted through a GSM coding standard;
if the transmission rate is higher than the second transmission rate, judging whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding and transmitting the voice information to be transmitted by using a G.728 encoding standard;
if the transmission rate is higher than the third transmission rate, judging whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding and transmitting the voice information to be transmitted through a G.721 encoding standard;
if the transmission rate is higher than the fourth transmission rate, judging whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding and transmitting the voice information to be transmitted through a G.722 encoding standard;
and if the transmission rate is higher than the fifth transmission rate, encoding and transmitting the voice information to be transmitted by using an MPE encoding standard.
In an alternative embodiment of the present invention, the preset transmission rate is 8 kbit/s, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
In an alternative embodiment of the present invention, the recognition module performing speech recognition on the voice information to be transmitted includes:
extracting the characteristics of the voice information to be transmitted to obtain a characteristic vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain elements contained in the phoneme information, wherein the elements comprise word sequences consisting of characters or words;
and decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
In an alternative embodiment of the present invention, the apparatus further comprises:
and the reminding module is used for sending a suggestion message for enhancing the network signal strength to the first terminal or the second terminal or sending a reminding message for voice transmission to the second terminal if the transmission rate is lower than the preset transmission rate.
In an alternative embodiment of the invention, the suggestion message comprises a recommended network to connect to or a recommended route of movement.
The present invention also provides a computer apparatus comprising a memory for storing at least one instruction and a processor for executing the at least one instruction to implement the voice transmission method described in any of the embodiments.
The present invention also provides a computer readable storage medium storing at least one instruction that when executed by a processor implements the voice transmission method described in any of the embodiments.
According to the above technical scheme, a voice call transmission instruction sent by the first terminal is received, and the voice information to be transmitted and the second terminal for receiving it are acquired according to the instruction; the transmission rate when the voice information to be transmitted is transmitted is acquired; whether the transmission rate is lower than a preset transmission rate is judged; if the transmission rate is lower than the preset transmission rate, voice recognition is performed on the voice information to be transmitted to obtain a voice recognition result, which comprises text information corresponding to the voice information to be transmitted; voice coding is performed on the text information contained in the voice recognition result to obtain target voice information; and the target voice information is transmitted to the second terminal. Because the text information contained in the voice recognition result is what is encoded when the transmission rate is lower than the preset transmission rate, the content of the voice information to be transmitted is retained while the amount of information to be encoded is reduced. This helps the call proceed smoothly, improves the quality of the voice call, and avoids stuttering or call interruption during the call.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a voice transmission method according to an embodiment of the present invention;
Fig. 2 is a functional block diagram of a voice transmission device according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a computer device for implementing a voice transmission method according to a preferred embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Fig. 1 is a flowchart of a voice transmission method according to an embodiment of the present invention. The order of the steps in the flowchart may be changed, and some steps may be omitted, according to different needs.
S11, receiving a voice call transmission instruction sent by a first terminal, and acquiring voice information to be transmitted and a second terminal receiving the voice information to be transmitted according to the voice call transmission instruction.
In this embodiment, the first terminal and the second terminal may be the same type of electronic device or different types of electronic device; for example, the first terminal and the second terminal are both mobile phones, or the first terminal is a mobile phone and the second terminal is a computer.
The voice call transmission instruction is an instruction for transmitting voice information between two terminals.
In this embodiment, the first terminal is a sender of voice information, i.e. a calling party, and the second terminal is a receiver of voice information, i.e. a called party.
In a possible embodiment, acquiring the voice information to be transmitted and the second terminal for receiving the voice information to be transmitted according to the voice call transmission instruction includes: acquiring the voice information to be transmitted indicated by the voice call transmission instruction and the second terminal for receiving the voice information to be transmitted.
For example, the voice call transmission instruction includes the voice information to be transmitted and the receiver of the voice information to be transmitted, that is, the second terminal that receives the voice information to be transmitted.
S12, acquiring the transmission rate when the voice information to be transmitted is transmitted.
The transmission rate, i.e., the network transmission rate, refers to the rate at which hosts on a computer network transmit data over a digital channel. For example, a transmission rate of 16 bits/s indicates that 16 bits of data are transmitted per second.
In this embodiment, the obtaining the transmission rate when the voice information to be transmitted is transmitted includes: and acquiring the sending rate of the first terminal or the receiving rate of the second terminal when the first terminal transmits the voice information to be transmitted to the second terminal.
For example, when voice transmission is performed through communication software, the transmission rate of the calling party is acquired, which reflects the data rate at which the calling party sends voice information to a base station or server; alternatively, the transmission rate of the called party is acquired, which reflects the data rate at which the called party receives voice information.
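As a minimal sketch of how such a rate might be obtained, one can divide the number of bits transferred by the elapsed time. The function name and the choice of measuring over a fixed interval are assumptions made for illustration; the text does not specify how the rate is measured.

```python
def transmission_rate_kbits(bytes_transferred: int, elapsed_seconds: float) -> float:
    """Return the average rate in kbit/s over one measurement interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    bits = bytes_transferred * 8           # 8 bits per byte
    return bits / elapsed_seconds / 1000   # 1 kbit = 1000 bits


# 2000 bytes sent in one second corresponds to 16 kbit/s.
example_rate = transmission_rate_kbits(2000, 1.0)
```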
S13, judging whether the transmission rate is lower than a preset transmission rate.
In this embodiment, it is determined whether the transmission rate is lower than the preset transmission rate, so as to determine whether both communication parties are in a poor network environment when performing voice transmission, and whether the call quality is affected.
The specific value of the preset transmission rate can be preset according to the requirement.
Optionally, the preset transmission rate is 8kbit/s.
Optionally, in another embodiment of the present invention, after the determining whether the transmission rate is lower than a preset transmission rate, the method further includes:
if the transmission rate is higher than the preset transmission rate, judging whether the transmission rate is lower than a first transmission rate or not;
if the transmission rate is lower than the first transmission rate, encoding and transmitting the voice information to be transmitted through a GIA encoding standard;
if the transmission rate is higher than the first transmission rate, judging whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, coding and transmitting the voice information to be transmitted through a GSM coding standard;
if the transmission rate is higher than the second transmission rate, judging whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding and transmitting the voice information to be transmitted by using a G.728 encoding standard;
if the transmission rate is higher than the third transmission rate, judging whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding and transmitting the voice information to be transmitted through a G.721 encoding standard;
if the transmission rate is higher than the fourth transmission rate, judging whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding and transmitting the voice information to be transmitted through a G.722 encoding standard;
and if the transmission rate is higher than the fifth transmission rate, encoding and transmitting the voice information to be transmitted by using an MPE encoding standard.
Coding is the process of representing information by codes. In digital audio coding, the frequency value of the sound at a given point and the energy value of that frequency are extracted and digitally quantized. Any digital audio coding scheme is lossy relative to the natural signal. The highest-fidelity coding mode currently in use is PCM coding, which can approximate the original sound very closely; however, PCM data is large and unfavorable for transmission, so during audio transmission the audio can be coded in other forms to compress it and improve transmission smoothness.
In this embodiment, different coding algorithms are used to code the speech information based on different coding standards.
For example, coding based on the G.722 coding standard is realized by SB-ADPCM algorithm, coding based on the G.721 coding standard is realized by ADPCM algorithm, coding based on the G.728 coding standard is realized by LD-CELP algorithm, coding based on the GSM coding standard is realized by RPE-LTP algorithm, and coding based on the GIA coding standard is realized by VSELPC algorithm.
In this embodiment, different coding standards are adopted under different transmission rate conditions, so that as much of the voice information as possible is retained at each transmission rate and the sound quality is improved.
Optionally, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
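The tiered selection described above can be sketched as a lookup over ascending thresholds. The standard names and threshold values are taken from the text as written; the function name is hypothetical, and the handling of a rate exactly equal to a threshold (treated here as belonging to the next higher tier) is an assumption, since the text only describes the "lower than" and "higher than" cases.

```python
from typing import List, Tuple

# Optional threshold values from the text, in kbit/s.
PRESET_RATE = 8.0
LADDER: List[Tuple[float, str]] = [
    (13.2, "GIA"),     # below the first transmission rate
    (16.0, "GSM"),     # below the second transmission rate
    (32.0, "G.728"),   # below the third transmission rate
    (64.0, "G.721"),   # below the fourth transmission rate
    (128.0, "G.722"),  # below the fifth transmission rate
]
FALLBACK = "MPE"       # above the fifth transmission rate


def select_coding(rate_kbits: float) -> str:
    """Pick the coding path for a measured transmission rate."""
    if rate_kbits < PRESET_RATE:
        # Below the preset rate the method switches to speech recognition
        # and transmits coded text instead of coded audio samples.
        return "speech-recognition"
    for upper_bound, standard in LADDER:
        if rate_kbits < upper_bound:
            return standard
    return FALLBACK
```

Keeping the thresholds in a list makes the ordering explicit and lets the values be changed without touching the selection logic.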
S14, if the transmission rate is lower than the preset transmission rate, performing voice recognition on the voice information to be transmitted to obtain a voice recognition result, wherein the voice recognition result comprises text information corresponding to the voice information to be transmitted.
In this embodiment, the voice recognition refers to converting a voice signal into corresponding text information.
Specifically, voice recognition is performed on voice information to be transmitted through a voice recognition technology.
Optionally, in another embodiment of the present invention, the performing speech recognition on the to-be-transmitted speech information includes:
extracting the characteristics of the voice information to be transmitted to obtain a characteristic vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain elements contained in the phoneme information, wherein the elements comprise word sequences consisting of characters or words;
and decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
The preset acoustic model and the preset language model can be selected according to requirements.
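The four recognition stages can be sketched as a chain of pluggable components. This is only a structural illustration: the `recognize` function, its parameter names, and the toy stand-ins in the usage example are hypothetical, and a real system would substitute trained acoustic and language models.

```python
from typing import Callable, Dict, List, Sequence

FeatureVector = List[float]


def recognize(
    samples: Sequence[float],
    extract_features: Callable[[Sequence[float]], FeatureVector],
    acoustic_model: Callable[[FeatureVector], List[str]],
    language_model: Callable[[List[str]], List[str]],
    dictionary: Dict[str, str],
) -> str:
    """Run the four stages in order and return the recognized text."""
    features = extract_features(samples)        # 1. feature extraction
    phonemes = acoustic_model(features)         # 2. feature vector -> phonemes
    word_sequence = language_model(phonemes)    # 3. phonemes -> word sequence
    # 4. decode each element of the word sequence with the preset dictionary
    return "".join(dictionary.get(word, word) for word in word_sequence)


# Toy stand-ins, for illustration only.
recognized = recognize(
    [0.0, 0.1, -0.1],
    extract_features=lambda s: [sum(s) / len(s)],
    acoustic_model=lambda f: ["n", "i", "h", "ao"],
    language_model=lambda p: ["ni", "hao"],
    dictionary={"ni": "你", "hao": "好"},
)
```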
S15, carrying out voice coding on the text information contained in the voice recognition result to obtain target voice information.
In this embodiment, the text information included in the speech recognition result is encoded by speech, which is different from the conventional encoding of the voice sample, so that the data size during transmission can be greatly reduced.
The traditional coding mode is to sample and code the frequency and amplitude of sound, and the data volume transmitted during the traditional coding is calculated as follows:
data volume (byte/second) = sampling rate (Hz) × sample size (bit) × number of channels / 8
Taking mono audio sampled at 16 kHz with 16-bit samples as an example, 1 s of sound data has a size of 16000 × 16 × 1 / 8 = 32000 bytes, i.e. about 32 KB.
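The formula above can be evaluated directly. In this small sketch the function name is hypothetical, and the 16-bit sample size is the value used in the worked example.

```python
def pcm_bytes_per_second(sample_rate_hz: int, sample_size_bits: int, channels: int) -> float:
    """Data volume of uncompressed sampled audio, in bytes per second."""
    return sample_rate_hz * sample_size_bits * channels / 8


# 16 kHz, 16-bit, mono, as in the worked example.
pcm_volume = pcm_bytes_per_second(16000, 16, 1)
```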
In this embodiment, the amount of voice data transmitted per second after encoding into the target voice information is: data amount (byte/second) = number of characters per second × character code size (bit), where the number of characters per second is the number of characters per second in the recognized voice information. Different characters (for example, Chinese characters) have corresponding code sizes, which can be determined from a preset correspondence between characters and code sizes.
Taking Chinese speech as an example, a typical person speaks fewer than 10 Chinese characters per second; with a code size of 2 per Chinese character, the data size for 1 s is 10 × 2 = 20 bit, so the amount of data transmitted per second is greatly reduced in this embodiment.
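To make the scale of the reduction concrete, the sketch below compares the two data volumes in a single unit (bits per second). It assumes a 16-bit (2-byte) code per Chinese character, which is an assumption of this sketch and not a figure confirmed by the text; with a smaller code size the reduction would be larger still. The function names are hypothetical.

```python
def pcm_bits_per_second(sample_rate_hz: int, sample_size_bits: int, channels: int) -> int:
    """Bits per second for uncompressed sampled audio."""
    return sample_rate_hz * sample_size_bits * channels


def text_bits_per_second(chars_per_second: int, code_size_bits: int) -> int:
    """Bits per second when only the recognized characters are transmitted."""
    return chars_per_second * code_size_bits


pcm_bits = pcm_bits_per_second(16000, 16, 1)   # 16 kHz, 16-bit, mono
text_bits = text_bits_per_second(10, 16)       # 10 characters/s, assumed 16-bit code
reduction_factor = pcm_bits / text_bits
```

Under these assumptions the text-based stream is smaller than the sampled stream by a factor of 1600.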
Optionally, in another embodiment of the present invention, the speech recognition result further includes a speech feature of the speech information to be transmitted, where the speech feature includes a pitch frequency;
the speech coding of the text information contained in the speech recognition result comprises the following steps:
and carrying out voice coding on the text information corresponding to the voice information to be transmitted and the voice characteristics of the voice information to be transmitted.
Speech features are information that reflects the characteristics of the speech, such as its intensity, loudness, or pitch.
When a person produces a voiced sound, the airflow through the glottis sets the vocal cords into relaxation oscillation, generating a quasi-periodic pulsed airflow; this airflow excites the vocal tract to produce the voiced sound, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the pitch frequency.
The pitch frequency is related to the length, thickness, toughness, and stiffness of the vocal cords, to pronunciation habits, and so on, and reflects the characteristics of the individual speaker to a great extent. Therefore, in this embodiment, the pitch frequency is included in the encoding, so that the characteristics of the sound are retained as far as possible while the content is still transmitted accurately.
In this embodiment, the pitch frequency of the speech information may be obtained by a cepstrum method.
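A minimal sketch of cepstrum-based pitch estimation follows, using only the Python standard library. It computes the real cepstrum (the inverse transform of the log magnitude spectrum) of one windowed frame and takes the quefrency of the largest peak inside a plausible pitch range as the pitch period. The function names, the Hamming window, and the 50 to 400 Hz search range are assumptions made for illustration, and the direct O(n²) transform is chosen for clarity rather than speed.

```python
import cmath
import math
from typing import List, Sequence


def hamming(n: int) -> List[float]:
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def log_magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Log magnitude of the discrete Fourier transform of the frame."""
    n = len(frame)
    result = []
    for k in range(n):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        result.append(math.log(abs(acc) + 1e-12))  # small offset avoids log(0)
    return result


def cepstrum_pitch(
    frame: Sequence[float],
    sample_rate: float,
    f_min: float = 50.0,
    f_max: float = 400.0,
) -> float:
    """Estimate the pitch frequency (Hz) of one frame via the real cepstrum."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    log_mag = log_magnitude_spectrum(windowed)
    q_low = int(sample_rate / f_max)                 # shortest period searched
    q_high = min(int(sample_rate / f_min), n // 2)   # longest period searched
    best_q, best_value = q_low, float("-inf")
    for q in range(q_low, q_high + 1):
        # Real part of the inverse transform of the log spectrum at quefrency q.
        value = sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        if value > best_value:
            best_q, best_value = q, value
    return sample_rate / best_q


# Synthetic test frame: a pulse train with a 40-sample period at 8 kHz,
# i.e. a fundamental of 200 Hz.
pulse_train = [1.0 if t % 40 == 0 else 0.0 for t in range(400)]
estimated_pitch = cepstrum_pitch(pulse_train, 8000.0)
```

For the synthetic pulse train the estimate should land at the 40-sample period, i.e. about 200 Hz; real speech frames are noisier and usually need voiced/unvoiced detection before the estimate is trusted.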
In this embodiment, performing voice encoding on the text information corresponding to the voice information to be transmitted and on the voice feature of the voice information to be transmitted means encoding the text information in combination with the voice feature. Unlike conventional encoding of sound samples, this can greatly reduce the amount of data transmitted.
In this embodiment, the amount of voice data transmitted per second after encoding into the target voice information is: data amount (bit/second) = number of words per second × word code size (bit) + voice feature size (depending on the extracted voice feature, for example 10 bit/s). Here the number of words per second is the number of words per second in the voice information as recognized by voice recognition. Different words (for example, Chinese characters) have corresponding word code sizes, which can be determined from a preset correspondence between words and word code sizes.
Taking voice information in Chinese as an example, a person generally speaks fewer than 10 Chinese characters per second. With a word code size of 2 per Chinese character, the data size for 1 s is at most 10 × 2 + 10 = 30 bit, so the amount of data transmitted per second is greatly reduced when voice information is transmitted as in this embodiment.
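The arithmetic above can be reproduced with a short sketch. The function and parameter names are hypothetical and only restate the formula from the description; the values are the ones used in the example (10 words per second, a word code size of 2, and a 10 bit/s voice feature).

```python
def coded_bits_per_second(words_per_second, word_code_size, feature_bits_per_second=0):
    """Data amount per second when text (plus an optional voice feature) is encoded."""
    return words_per_second * word_code_size + feature_bits_per_second


if __name__ == "__main__":
    # Text plus voice feature, as in the example above.
    print(coded_bits_per_second(10, 2, 10))
    # Text only, with no voice feature.
    print(coded_bits_per_second(10, 2))
```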
S16, transmitting the target voice information to the second terminal.
In an alternative embodiment, after the second terminal receives the target voice information, the target voice information is decoded, that is, the received text information (or text information and voice features) is restored to voice.
In an alternative embodiment, if there is no content in the restored speech, white noise is used for filling. White noise here is a segment of sound whose power spectral density is uniformly distributed over the entire frequency domain.
Filling the restored speech with white noise prevents a user who hears nothing through the second terminal from assuming that the call has been interrupted and performing an erroneous operation (such as exiting).
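A minimal sketch of this filling step is shown below. It assumes, purely for illustration, that the restored speech is a list of sample segments in which an empty segment means "no content", and it uses independent Gaussian samples as the white noise; the function name, the amplitude, and the segment representation are not from the patent.

```python
import random


def fill_silence(segments, samples_per_segment, amplitude=0.01, seed=None):
    """Replace empty segments of restored speech with low-level white noise."""
    rng = random.Random(seed)
    filled = []
    for segment in segments:
        if segment:
            # Segment has content: keep it unchanged.
            filled.append(list(segment))
        else:
            # Independent Gaussian samples have a flat expected power spectrum.
            filled.append([rng.gauss(0.0, amplitude)
                           for _ in range(samples_per_segment)])
    return filled


if __name__ == "__main__":
    restored = [[0.2, -0.1, 0.05], [], [0.3, 0.1, -0.2]]
    print(fill_silence(restored, samples_per_segment=3, seed=1))
```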
With this embodiment, although characteristics such as audio frequency and volume are lost in the encoding process, the voice content is largely preserved even when the network is extremely poor, which avoids intermittent speech, lost voice content, or a call that cannot be carried on at all.
In another embodiment of the present invention, the method further comprises:
and if the transmission rate is lower than the preset transmission rate, sending a suggestion message for enhancing the network signal strength to the first terminal or the second terminal, or sending a reminding message for voice transmission to the second terminal.
In this embodiment, when the transmission rate is lower than the preset transmission rate, the suggestion message for enhancing the network signal strength may specifically be a suggestion on how the first terminal or the second terminal can strengthen its network signal, so that the voice information to be transmitted is transmitted between the first terminal and the second terminal at a better transmission rate, further improving the quality of the voice call.
Optionally, the advice message includes a recommended connection network or a recommended moving route.
In this embodiment, the recommended connection network is another connectable network recommended to the first terminal or the second terminal. The recommended moving route is a route along which the first terminal or the second terminal can be moved to a location where its network signal is stronger.
Further, in another embodiment of the present invention, the recommended connection network may be obtained by any one of the following methods:
acquiring connectable networks around the first terminal or the second terminal, and acquiring, as the recommended connection network, a network among the connectable networks whose network signal strength is greater than a network signal strength threshold; or
acquiring connectable networks around the first terminal or the second terminal, and acquiring, as the recommended connection network, the network with the strongest network signal strength among the connectable networks; or
acquiring connectable networks around the first terminal or the second terminal, acquiring the secure networks among the connectable networks, and acquiring, as the recommended network, a network among the secure networks whose network signal strength is greater than the network signal strength threshold; or
acquiring connectable networks around the first terminal or the second terminal, acquiring the history connection networks among the connectable networks, and acquiring, as the recommended network, a network among the history connection networks whose network signal strength is greater than the network signal strength threshold.
The history connection network refers to a network to which the first terminal or the second terminal has previously been connected.
Obtaining, as the recommended network, a network among the secure networks whose network signal strength is greater than the network signal strength threshold ensures that the first terminal or the second terminal is connected to a secure network, which avoids network security problems.
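The four selection strategies listed above can be sketched as simple filters over a list of candidate networks. The `Network` record, its field names, and the signal strength unit are hypothetical; how connectable networks are actually scanned, and how a network is judged secure, is left open by the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Network:
    name: str
    signal_strength: float            # assumed unit, e.g. dBm; larger means stronger
    secure: bool = False              # assumed flag for a "secure network"
    previously_connected: bool = False  # assumed flag for a "history connection network"


def above_threshold(networks: List[Network], threshold: float) -> List[Network]:
    """Strategy 1: networks whose signal strength exceeds the threshold."""
    return [n for n in networks if n.signal_strength > threshold]


def strongest(networks: List[Network]) -> Optional[Network]:
    """Strategy 2: the single network with the strongest signal, if any."""
    return max(networks, key=lambda n: n.signal_strength, default=None)


def secure_above_threshold(networks: List[Network], threshold: float) -> List[Network]:
    """Strategy 3: secure networks whose signal strength exceeds the threshold."""
    return above_threshold([n for n in networks if n.secure], threshold)


def history_above_threshold(networks: List[Network], threshold: float) -> List[Network]:
    """Strategy 4: previously connected networks above the threshold."""
    return above_threshold([n for n in networks if n.previously_connected], threshold)
```

Strategies 1, 3 and 4 may return several candidates; which one is finally recommended is not specified in the description, so the sketch returns all of them.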
Further, in another embodiment of the present invention, the recommended moving route may be obtained by:
acquiring a first position where a first terminal or a second terminal is located and available connection networks around the first terminal or the second terminal;
acquiring a second position of the available connection network;
and taking the first position as a starting position, taking the second position as a termination position, and acquiring a moving route between the starting position and the termination position as the recommended moving route.
The closer the first terminal or the second terminal is to the connectable network, the better the network signal strength it can obtain. For example, the closer the terminal is to the router, the stronger the network signal.
In this embodiment, obtaining the recommended moving route helps the user move the first terminal or the second terminal to where the network signal is stronger, which in turn allows a higher transmission rate when the voice information to be transmitted is transmitted between the first terminal and the second terminal and further improves the quality of the voice call.
The voice transmission method provided by the invention receives a voice call transmission instruction sent by a first terminal and, according to the voice call transmission instruction, obtains the voice information to be transmitted and the second terminal that is to receive it; obtains the transmission rate at which the voice information to be transmitted is transmitted; determines whether the transmission rate is lower than a preset transmission rate; if the transmission rate is lower than the preset transmission rate, performs voice recognition on the voice information to be transmitted to obtain a voice recognition result, the voice recognition result including text information corresponding to the voice information to be transmitted; performs voice encoding on the text information contained in the voice recognition result to obtain target voice information; and transmits the target voice information to the second terminal. Because the text information contained in the voice recognition result is what is encoded when the transmission rate is lower than the preset transmission rate, the voice content of the voice information to be transmitted is preserved while the amount of information encoded is reduced. This keeps the voice call smooth, improves call quality, and avoids stuttering or interruption of the call.
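The decision at the core of the method can be sketched as follows. This is only an illustration under stated assumptions: the recognizer is passed in as a hypothetical callable, the one-byte `T`/`V` tag and the UTF-8 text encoding are invented for the example, and 8 kbit/s is the optional preset value mentioned in this description.

```python
from typing import Callable

PRESET_RATE_BIT_S = 8_000  # optional preset transmission rate (8 kbit/s)


def prepare_payload(voice: bytes,
                    rate_bit_s: float,
                    recognize: Callable[[bytes], str],
                    preset_bit_s: float = PRESET_RATE_BIT_S) -> bytes:
    """Choose what to send depending on the measured transmission rate."""
    if rate_bit_s < preset_bit_s:
        # Poor network: recognize the speech and send the encoded text instead.
        text = recognize(voice)
        return b"T" + text.encode("utf-8")   # hypothetical framing
    # Otherwise the voice data is sent as audio (encoding omitted in this sketch).
    return b"V" + voice


if __name__ == "__main__":
    fake_recognizer = lambda audio: "hello"  # stand-in for a real recognizer
    print(prepare_payload(b"\x01\x02", 4_000, fake_recognizer))
    print(prepare_payload(b"\x01\x02", 16_000, fake_recognizer))
```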
Fig. 2 is a functional block diagram of a voice transmission device according to an embodiment of the present invention. As shown in Fig. 2, the voice transmission device includes a receiving module 210, an obtaining module 220, a judging module 230, a recognition module 240, an encoding module 250 and a first transmission module 260. A module, as referred to in the present invention, is a series of computer program segments that are stored in the memory of a computer device, can be executed by a processor, and perform a fixed function. The functions of the respective modules are described in detail below.
The receiving module 210 is configured to receive a voice call transmission instruction sent by the first terminal and, according to the voice call transmission instruction, obtain the voice information to be transmitted and the second terminal that is to receive the voice information to be transmitted.
In this embodiment, the first terminal and the second terminal may be electronic devices of the same type or of different types; for example, both may be mobile phones, or the first terminal may be a mobile phone and the second terminal a computer.
The voice call transmission instruction is an instruction for transmitting voice information between two terminals.
In this embodiment, the first terminal is a sender of voice information, i.e. a calling party, and the second terminal is a receiver of voice information, i.e. a called party.
In a possible embodiment, obtaining, according to the voice call transmission instruction, the voice information to be transmitted and the second terminal that is to receive it includes: obtaining the voice information to be transmitted that is indicated by the voice call transmission instruction, and the second terminal that is to receive the voice information to be transmitted.
For example, the voice call transmission instruction includes the voice information to be transmitted and the receiver of the voice information to be transmitted, that is, the second terminal that receives the voice information to be transmitted.
The obtaining module 220 is configured to obtain a transmission rate when the voice information to be transmitted is transmitted.
The transmission rate, i.e., the network transmission rate, refers to the rate at which hosts on a computer network transmit data over a digital channel. For example, a transmission rate of 16 bits/s indicates that 16 bits of data are transmitted per second.
In this embodiment, obtaining the transmission rate at which the voice information to be transmitted is transmitted includes: obtaining the sending rate of the first terminal or the receiving rate of the second terminal when the first terminal transmits the voice information to be transmitted to the second terminal.
For example, when voice is transmitted through communication software, the transmission rate of the calling party is obtained, which reflects the data transmission rate at which the calling party sends voice information to a base station or server; alternatively, the transmission rate of the called party is obtained, which reflects the data transmission rate at which the called party receives voice information.
The judging module 230 is configured to determine whether the transmission rate is lower than a preset transmission rate.
In this embodiment, it is determined whether the transmission rate is lower than the preset transmission rate, so as to determine whether both communication parties are in a poor network environment when performing voice transmission, and whether the call quality is affected.
The specific value of the preset transmission rate can be preset according to the requirement.
Optionally, the preset transmission rate is 8kbit/s.
The recognition module 240 is configured to perform voice recognition on the voice information to be transmitted if the transmission rate is lower than the preset transmission rate, so as to obtain a voice recognition result, where the voice recognition result includes text information corresponding to the voice information to be transmitted.
In this embodiment, the voice recognition refers to converting a voice signal into corresponding text information.
Specifically, voice recognition is performed on voice information to be transmitted through a voice recognition technology.
Optionally, in another embodiment of the present invention, the recognition module 240 performing voice recognition on the voice information to be transmitted includes:
extracting the characteristics of the voice information to be transmitted to obtain a characteristic vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain the elements contained in the phoneme information, wherein the elements comprise a word sequence consisting of characters or words;
and decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
The preset acoustic model and the preset language model can be selected according to requirements.
The encoding module 250 is configured to perform speech encoding on text information included in the speech recognition result to obtain target speech information.
In this embodiment, voice encoding is performed on the text information contained in the voice recognition result. Unlike the conventional approach of encoding sound samples, this can greatly reduce the amount of data transmitted.
The conventional coding approach samples and encodes the frequency and amplitude of the sound, and the amount of data transmitted with conventional coding is calculated as follows:
data volume (byte/second) = sampling rate (Hz) × sample size (bit) × number of channels / 8
Taking 16 kHz, 16-bit mono audio as an example, 1 s of sound data has a size of 16000 × 16 × 1 / 8 = 32000 bytes, that is, about 32 KB.
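The conventional data volume formula can be checked with a one-line helper. The function name is hypothetical; the parameters follow the formula above.

```python
def pcm_bytes_per_second(sample_rate_hz, sample_size_bits, channels):
    """Data volume of conventionally sampled audio, in bytes per second."""
    return sample_rate_hz * sample_size_bits * channels / 8


if __name__ == "__main__":
    # 16 kHz sampling rate, 16-bit samples, one channel.
    print(pcm_bytes_per_second(16000, 16, 1))
```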
In this embodiment, the amount of voice data transmitted per second after encoding into the target voice information is: data amount (bit/second) = number of words per second × word code size (bit). Here the number of words per second is the number of words per second in the voice information as recognized by voice recognition. Different words (for example, Chinese characters) have corresponding word code sizes, which can be determined from a preset correspondence between words and word code sizes.
Taking voice information in Chinese as an example, a person generally speaks fewer than 10 Chinese characters per second. With a word code size of 2 per Chinese character, the data size for 1 s is at most 10 × 2 = 20 bit, so the amount of data transmitted per second is greatly reduced when voice information is transmitted as in this embodiment.
Optionally, in another embodiment of the present invention, the speech recognition result further includes a speech feature of the speech information to be transmitted, where the speech feature includes a pitch frequency;
The encoding module 250 performs speech encoding on text information included in the speech recognition result, including:
and carrying out voice coding on the text information corresponding to the voice information to be transmitted and the voice characteristics of the voice information to be transmitted.
A voice feature is information that reflects a characteristic of the speech, such as its intensity, loudness, or pitch.
When a person produces a voiced sound, airflow passing through the glottis causes the vocal cords to vibrate in a relaxation oscillation, producing a quasi-periodic pulsed airflow. This airflow excites the vocal tract to produce the voiced sound, which carries most of the energy in speech. The frequency of this vocal cord vibration is called the pitch frequency.
The pitch frequency is related to the length, thickness, toughness, and stiffness of the vocal cords, as well as to pronunciation habits, and reflects the characteristics of the individual speaker to a large extent. Therefore, in this embodiment, the pitch frequency is included in the encoding, so that the characteristics of the voice are preserved as far as possible while the content is still transmitted accurately.
In this embodiment, the pitch frequency of the speech information may be obtained by a cepstrum method.
In this embodiment, performing voice encoding on the text information corresponding to the voice information to be transmitted and on the voice feature of the voice information to be transmitted means encoding the text information in combination with the voice feature. Unlike conventional encoding of sound samples, this can greatly reduce the amount of data transmitted.
In this embodiment, the amount of voice data transmitted per second after encoding into the target voice information is: data amount (bit/second) = number of words per second × word code size (bit) + voice feature size (depending on the extracted voice feature, for example 10 bit/s). Here the number of words per second is the number of words per second in the voice information as recognized by voice recognition. Different words (for example, Chinese characters) have corresponding word code sizes, which can be determined from a preset correspondence between words and word code sizes.
Taking voice information in Chinese as an example, a person generally speaks fewer than 10 Chinese characters per second. With a word code size of 2 per Chinese character, the data size for 1 s is at most 10 × 2 + 10 = 30 bit, so the amount of data transmitted per second is greatly reduced when voice information is transmitted as in this embodiment.
The first transmission module 260 is configured to transmit the target voice information to the second terminal.
In an alternative embodiment, after the second terminal receives the target voice information, the target voice information is decoded, that is, the received text information (or text information and voice features) is restored to voice.
In an alternative embodiment, if there is no content in the restored speech, white noise is used for filling. White noise here is a segment of sound whose power spectral density is uniformly distributed over the entire frequency domain.
Filling the restored speech with white noise prevents a user who hears nothing through the second terminal from assuming that the call has been interrupted and performing an erroneous operation (such as exiting).
With this embodiment, although characteristics such as audio frequency and volume are lost in the encoding process, the voice content is largely preserved even when the network is extremely poor, which avoids intermittent speech, lost voice content, or a call that cannot be carried on at all.
In another embodiment of the present invention, the apparatus further comprises:
and the reminding module is used for sending a suggestion message for enhancing the network signal strength to the first terminal or the second terminal or sending a reminding message for voice transmission to the second terminal if the transmission rate is lower than the preset transmission rate.
In this embodiment, when the transmission rate is lower than the preset transmission rate, the suggestion message for enhancing the network signal strength may specifically be a suggestion on how the first terminal or the second terminal can strengthen its network signal, so that the voice information to be transmitted is transmitted between the first terminal and the second terminal at a better transmission rate, further improving the quality of the voice call.
Optionally, the advice message includes a recommended connection network or a recommended moving route.
In this embodiment, the recommended connection network is another connectable network recommended to the first terminal or the second terminal. The recommended moving route is a route along which the first terminal or the second terminal can be moved to a location where its network signal is stronger.
Further, in another embodiment of the present invention, the recommended connection network may be obtained through a recommendation module, where the recommendation module is configured to perform any one of the following:
acquiring connectable networks around the first terminal or the second terminal, and acquiring, as the recommended connection network, a network among the connectable networks whose network signal strength is greater than a network signal strength threshold; or
acquiring connectable networks around the first terminal or the second terminal, and acquiring, as the recommended connection network, the network with the strongest network signal strength among the connectable networks; or
acquiring connectable networks around the first terminal or the second terminal, acquiring the secure networks among the connectable networks, and acquiring, as the recommended network, a network among the secure networks whose network signal strength is greater than the network signal strength threshold; or
acquiring connectable networks around the first terminal or the second terminal, acquiring the history connection networks among the connectable networks, and acquiring, as the recommended network, a network among the history connection networks whose network signal strength is greater than the network signal strength threshold.
The history connection network refers to a network to which the first terminal or the second terminal has previously been connected.
Obtaining, as the recommended network, a network among the secure networks whose network signal strength is greater than the network signal strength threshold ensures that the first terminal or the second terminal is connected to a secure network, which avoids network security problems.
Further, in another embodiment of the present invention, the recommended moving route may be further obtained through a recommendation module, where the recommendation module is further configured to:
acquiring a first position where a first terminal or a second terminal is located and available connection networks around the first terminal or the second terminal;
acquiring a second position of the available connection network;
and taking the first position as a starting position, taking the second position as a termination position, and acquiring a moving route between the starting position and the termination position as the recommended moving route.
The closer the first terminal or the second terminal is to the connectable network, the better the network signal strength it can obtain. For example, the closer the terminal is to the router, the stronger the network signal.
In this embodiment, obtaining the recommended moving route helps the user move the first terminal or the second terminal to where the network signal is stronger, which in turn allows a higher transmission rate when the voice information to be transmitted is transmitted between the first terminal and the second terminal and further improves the quality of the voice call.
Optionally, in another embodiment of the present invention, the apparatus further includes a second transmission module, where the second transmission module is configured to:
determining whether the transmission rate is lower than the preset transmission rate, and if the transmission rate is higher than the preset transmission rate, determining whether the transmission rate is lower than a first transmission rate;
if the transmission rate is lower than the first transmission rate, encoding and transmitting the voice information to be transmitted using the GIA coding standard;
if the transmission rate is higher than the first transmission rate, determining whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encoding and transmitting the voice information to be transmitted using the GSM coding standard;
if the transmission rate is higher than the second transmission rate, determining whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding and transmitting the voice information to be transmitted using the G.728 coding standard;
if the transmission rate is higher than the third transmission rate, determining whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding and transmitting the voice information to be transmitted using the G.721 coding standard;
if the transmission rate is higher than the fourth transmission rate, determining whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding and transmitting the voice information to be transmitted using the G.722 coding standard;
and if the transmission rate is higher than the fifth transmission rate, encoding and transmitting the voice information to be transmitted using the MPE coding standard.
Encoding is the process of representing information with codes. In digital audio encoding, the frequency value of the sound at a given point and the energy value at that frequency are extracted and quantized digitally. Any digital audio coding scheme is lossy relative to the natural signal. The highest-fidelity coding currently available is PCM coding, which can approximate the original sound arbitrarily closely, but PCM data is very large and ill-suited to transmission. Therefore, during audio transmission, the audio may be encoded in other forms to compress it and make transmission smoother.
In this embodiment, different coding algorithms are used to code the speech information based on different coding standards.
For example, coding based on the G.722 coding standard is realized by SB-ADPCM algorithm, coding based on the G.721 coding standard is realized by ADPCM algorithm, coding based on the G.728 coding standard is realized by LD-CELP algorithm, coding based on the GSM coding standard is realized by RPE-LTP algorithm, and coding based on the GIA coding standard is realized by VSELPC algorithm.
In this embodiment, a different coding standard is used for each range of transmission rates, so that at each transmission rate as much of the voice information as possible is preserved and the sound quality is improved.
Optionally, the first transmission rate is 13.2 kbit/s, the second transmission rate is 16 kbit/s, the third transmission rate is 32 kbit/s, the fourth transmission rate is 64 kbit/s, and the fifth transmission rate is 128 kbit/s.
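The threshold cascade described above amounts to a lookup in a sorted list of rate boundaries, which can be sketched with the standard `bisect` module. The boundaries are the optional values given in this description (the 8 kbit/s preset and the five transmission rates); the `"text"` label for the text-encoding path and the handling of a rate exactly equal to a boundary (treated here as belonging to the higher tier) are assumptions, since the description only says "lower than" and "higher than".

```python
import bisect

# Boundaries in kbit/s: preset rate, then the first to fifth transmission rates.
THRESHOLDS_KBIT_S = [8.0, 13.2, 16.0, 32.0, 64.0, 128.0]
# One scheme per interval; "text" stands for encoding the recognized text.
SCHEMES = ["text", "GIA", "GSM", "G.728", "G.721", "G.722", "MPE"]


def select_scheme(rate_kbit_s):
    """Return the coding scheme for the measured transmission rate."""
    # bisect_right counts how many boundaries are <= rate, which is
    # exactly the index of the interval the rate falls into.
    return SCHEMES[bisect.bisect_right(THRESHOLDS_KBIT_S, rate_kbit_s)]


if __name__ == "__main__":
    for rate in (4, 10, 14, 20, 40, 100, 200):
        print(rate, select_scheme(rate))
```

Keeping the boundaries and scheme names in two parallel lists makes it easy to change a threshold without touching the selection logic.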
In the voice transmission device provided by the invention, the receiving module receives a voice call transmission instruction sent by a first terminal and, according to the voice call transmission instruction, obtains the voice information to be transmitted and the second terminal that is to receive it; the obtaining module obtains the transmission rate at which the voice information to be transmitted is transmitted; the judging module determines whether the transmission rate is lower than a preset transmission rate; if the transmission rate is lower than the preset transmission rate, the recognition module performs voice recognition on the voice information to be transmitted to obtain a voice recognition result, the voice recognition result including text information corresponding to the voice information to be transmitted; the encoding module performs voice encoding on the text information contained in the voice recognition result to obtain target voice information; and the first transmission module transmits the target voice information to the second terminal. Because the text information contained in the voice recognition result is what is encoded when the transmission rate is lower than the preset transmission rate, the voice content of the voice information to be transmitted is preserved while the amount of information encoded is reduced. This keeps the voice call smooth, improves call quality, and avoids stuttering or interruption of the call.
The integrated units implemented in the form of software functional modules described above may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the various embodiments of the invention.
Fig. 3 is a schematic structural diagram of a computer device that implements the voice transmission method according to a preferred embodiment of the present invention. The computer device comprises at least one transmitting device 31, at least one memory 32, at least one processor 33, at least one receiving device 34 and at least one communication bus. The communication bus is used to enable connection and communication between these components.
The computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The computer device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud consisting of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
The computer device may be, but is not limited to, any electronic product that can perform man-machine interaction with a user through a keyboard, a touch pad, or a voice control device, for example, a terminal such as a tablet computer, a smart phone, or a monitoring device.
The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.
The receiving device 34 and the transmitting device 31 may be wired transmission ports, or may be wireless devices, for example including an antenna device, for performing data communication with other devices.
The memory 32 is used for storing program code. The memory 32 may be a circuit with a storage function that has no separate physical form within an integrated circuit, such as a RAM (random-access memory) or a FIFO (first-in first-out) memory. Alternatively, the memory 32 may be a memory with a physical form, such as a memory module, a TF card (TransFlash card), a SmartMedia card, a Secure Digital card, or a flash memory card.
The processor 33 may comprise one or more microprocessors or digital processors. The processor 33 may call program code stored in the memory 32 to perform the relevant functions. For example, the modules depicted in Fig. 2 are program code stored in the memory 32 and executed by the processor 33 to implement the voice transmission method. The processor 33, also called a central processing unit (CPU), is a very-large-scale integrated circuit that serves as the operation core and the control unit.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the modules is merely a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the system claims may also be implemented by a single unit or means through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (9)

1. A method of voice transmission, the method comprising:
receiving a voice call transmission instruction sent by a first terminal, and acquiring voice information to be transmitted and a second terminal for receiving the voice information to be transmitted according to the voice call transmission instruction;
acquiring the transmission rate when the voice information to be transmitted is transmitted;
judging whether the transmission rate is lower than a preset transmission rate or not;
if the transmission rate is lower than the preset transmission rate, performing voice recognition on the voice information to be transmitted to obtain a voice recognition result, wherein the voice recognition result comprises text information corresponding to the voice information to be transmitted and voice characteristics of the voice information to be transmitted, and the voice characteristics comprise pitch frequency;
performing voice coding on the text information contained in the voice recognition result to obtain target voice information, comprising: performing voice coding on the text information corresponding to the voice information to be transmitted and on the pitch frequency to obtain the target voice information, wherein the data volume transmitted during voice coding is determined based on the number of words per second, the size of the characters and the voice characteristics;
transmitting the target voice information to the second terminal;
the method further comprises the steps of: determining a recommended moving route, including acquiring a first position where the first terminal or the second terminal is located, and available connection networks around the first terminal or the second terminal; acquiring a second position of the available connection network; taking the first position as a starting position, taking the second position as a termination position, and acquiring a moving route between the starting position and the termination position as the recommended moving route;
and if the transmission rate is lower than the preset transmission rate, sending the recommended moving route to the first terminal or the second terminal.
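As a rough illustration of how the data volume of the text path in claim 1 might be estimated, the following Python sketch multiplies the number of characters per second by the size of each character and adds a budget for the pitch frequency. The function name `estimate_text_bitrate`, the additive formula, the reading of "words per second" as characters per second, and the sample values (4 characters per second, 2 bytes per character, 400 bit/s for pitch) are all assumptions made for illustration; the claim only states which quantities the data volume depends on.

```python
def estimate_text_bitrate(characters_per_second: float,
                          bytes_per_character: int,
                          pitch_bits_per_second: float = 0.0) -> float:
    """Estimate the payload rate, in bit/s, of the text-plus-pitch path.

    Hypothetical formula: text bits per second plus a separate allowance
    for the pitch-frequency side information.
    """
    if characters_per_second < 0 or pitch_bits_per_second < 0:
        raise ValueError("rates must be non-negative")
    if bytes_per_character <= 0:
        raise ValueError("bytes_per_character must be positive")
    text_bits_per_second = characters_per_second * bytes_per_character * 8
    return text_bits_per_second + pitch_bits_per_second


# Assumed sample values, not taken from the patent.
estimate = estimate_text_bitrate(4, 2, 400)
print(f"{estimate} bit/s")
```

Under these assumed values the estimate is 464 bit/s, which suggests why sending recognized text plus pitch can fit a link that is too slow for a conventional speech codec.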
2. The method of claim 1, wherein after said determining whether said transmission rate is lower than a preset transmission rate, said method further comprises:
if the transmission rate is higher than the preset transmission rate, judging whether the transmission rate is lower than a first transmission rate or not;
if the transmission rate is lower than the first transmission rate, encoding and transmitting the voice information to be transmitted through a GIA encoding standard;
if the transmission rate is higher than the first transmission rate, judging whether the transmission rate is lower than a second transmission rate;
if the transmission rate is lower than the second transmission rate, encoding and transmitting the voice information to be transmitted through a GSM encoding standard;
if the transmission rate is higher than the second transmission rate, judging whether the transmission rate is lower than a third transmission rate;
if the transmission rate is lower than the third transmission rate, encoding and transmitting the voice information to be transmitted through a G.728 encoding standard;
if the transmission rate is higher than the third transmission rate, judging whether the transmission rate is lower than a fourth transmission rate;
if the transmission rate is lower than the fourth transmission rate, encoding and transmitting the voice information to be transmitted through a G.721 encoding standard;
if the transmission rate is higher than the fourth transmission rate, judging whether the transmission rate is lower than a fifth transmission rate;
if the transmission rate is lower than the fifth transmission rate, encoding and transmitting the voice information to be transmitted through a G.722 encoding standard;
and if the transmission rate is higher than the fifth transmission rate, encoding and transmitting the voice information to be transmitted by using an MPE encoding standard.
3. The method of claim 2, wherein the preset transmission rate is 8kbit/s, the first transmission rate is 13.2kbit/s, the second transmission rate is 16kbit/s, the third transmission rate is 32kbit/s, the fourth transmission rate is 64kbit/s, and the fifth transmission rate is 128kbit/s.
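The threshold ladder of claims 2 and 3 can be sketched as a sorted lookup. The Python below is a minimal sketch, not the patented implementation: the names `THRESHOLDS_KBPS`, `SCHEMES`, and `select_scheme` are hypothetical, the scheme labels are copied as they appear in the claims, and the handling of a rate exactly equal to a threshold (treated here as "higher") is an assumption, since the claims only cover "lower than" and "higher than".

```python
from bisect import bisect_right

# Thresholds from claim 3, in kbit/s: preset, first, ..., fifth.
THRESHOLDS_KBPS = [8.0, 13.2, 16.0, 32.0, 64.0, 128.0]

# Scheme chosen below each threshold, plus the one used above the last.
# "text+pitch" stands for the speech-recognition path of claim 1.
SCHEMES = ["text+pitch", "GIA", "GSM", "G.728", "G.721", "G.722", "MPE"]


def select_scheme(rate_kbps: float) -> str:
    """Return the label of the encoding scheme for a measured rate."""
    if rate_kbps < 0:
        raise ValueError("rate must be non-negative")
    # bisect_right counts the thresholds that are <= rate_kbps, so a
    # rate equal to a threshold falls into the higher band (assumption).
    return SCHEMES[bisect_right(THRESHOLDS_KBPS, rate_kbps)]


for rate in (5, 10, 14, 20, 50, 100, 500):
    print(rate, select_scheme(rate))
```

Keeping the thresholds in one sorted list avoids the nested if/else chain that a literal reading of claim 2 would produce, and makes it easy to change a boundary in one place.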
4. A method according to any one of claims 1 to 3, wherein said speech recognition of said speech information to be transmitted comprises:
extracting the characteristics of the voice information to be transmitted to obtain a characteristic vector representing the voice information to be transmitted;
inputting the feature vector into a preset acoustic model to obtain phoneme information corresponding to the feature vector;
inputting the phoneme information into a preset language model to obtain elements contained in the phoneme information, wherein the elements comprise word sequences consisting of characters or words;
and decoding the word sequence based on a preset dictionary to obtain the text information corresponding to the voice information to be transmitted.
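The four stages of claim 4 (feature extraction, acoustic model, language model, dictionary decoding) can be sketched as a chain of functions. The Python below only illustrates the data flow: every stage is a toy stand-in invented for this example (`toy_features`, `toy_acoustic_model`, `toy_language_model`, `TOY_DICTIONARY`), and none of them reflects the models actually used in the patent.

```python
from typing import Dict, List, Sequence


def toy_features(samples: Sequence[float]) -> List[float]:
    """Toy feature vector: energy of each non-overlapping 2-sample frame."""
    return [sum(x * x for x in samples[i:i + 2])
            for i in range(0, len(samples), 2)]


def toy_acoustic_model(features: Sequence[float]) -> List[str]:
    """Toy acoustic model: map low-energy frames to 'HH', others to 'AY'."""
    return ["HH" if energy < 1.0 else "AY" for energy in features]


def toy_language_model(phonemes: Sequence[str]) -> List[str]:
    """Toy language model: group all phonemes into a single token."""
    return [" ".join(phonemes)]


# Toy pronunciation dictionary mapping phoneme strings to text.
TOY_DICTIONARY: Dict[str, str] = {"HH AY": "hi"}


def recognize(samples: Sequence[float]) -> str:
    """Run the four stages in order and return the recognized text."""
    feature_vector = toy_features(samples)
    phonemes = toy_acoustic_model(feature_vector)
    word_sequence = toy_language_model(phonemes)
    return " ".join(TOY_DICTIONARY.get(token, "<unk>")
                    for token in word_sequence)


print(recognize([0.1, 0.2, 1.0, 1.0]))
```

In a real system each stand-in would be replaced by a trained model, but the order of the calls would stay the same as in the claim.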
5. A method according to any one of claims 1 to 3, wherein the method further comprises:
and if the transmission rate is lower than the preset transmission rate, sending a suggestion message for enhancing the network signal strength to the first terminal or the second terminal, or sending a reminding message for voice transmission to the second terminal.
6. The method of claim 5, wherein the suggestion message comprises a recommended connection network.
7. A voice transmission device, the device comprising:
the receiving module is used for receiving a voice call transmission instruction sent by the first terminal, and acquiring, according to the voice call transmission instruction, voice information to be transmitted and a second terminal for receiving the voice information to be transmitted;
the acquisition module is used for acquiring the transmission rate when the voice information to be transmitted is transmitted;
the judging module is used for judging whether the transmission rate is lower than a preset transmission rate or not;
the recognition module is used for carrying out voice recognition on the voice information to be transmitted if the transmission rate is lower than the preset transmission rate, so as to obtain a voice recognition result, wherein the voice recognition result comprises text information corresponding to the voice information to be transmitted and voice characteristics of the voice information to be transmitted, and the voice characteristics comprise pitch frequency;
the encoding module is used for performing voice coding on the text information contained in the voice recognition result to obtain target voice information, comprising: performing voice coding on the text information corresponding to the voice information to be transmitted and on the pitch frequency to obtain the target voice information, wherein the data volume transmitted during voice coding is determined based on the number of words per second, the size of the characters and the voice characteristics;
the first transmission module is used for transmitting the target voice information to the second terminal;
the recommendation module is used for determining a recommended moving route and comprises the steps of acquiring a first position of the first terminal or the second terminal and available connection networks around the first terminal or the second terminal; acquiring a second position of the available connection network; taking the first position as a starting position, taking the second position as a termination position, and acquiring a moving route between the starting position and the termination position as the recommended moving route;
and the recommendation module is further used for sending the recommended moving route to the first terminal or the second terminal if the transmission rate is lower than the preset transmission rate.
8. A computer device comprising a memory for storing at least one instruction and a processor for executing the at least one instruction to implement the voice transmission method of any one of claims 1 to 6.
9. A computer-readable storage medium having stored thereon computer instructions, characterized by: the computer instructions, when executed by a processor, implement the voice transmission method of any one of claims 1 to 6.
CN201910459488.7A 2019-05-29 2019-05-29 Voice transmission method, voice transmission device, computer device and storage medium Active CN110364170B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910459488.7A CN110364170B (en) 2019-05-29 2019-05-29 Voice transmission method, voice transmission device, computer device and storage medium
PCT/CN2019/118022 WO2020238058A1 (en) 2019-05-29 2019-11-13 Voice transmission method and apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910459488.7A CN110364170B (en) 2019-05-29 2019-05-29 Voice transmission method, voice transmission device, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN110364170A CN110364170A (en) 2019-10-22
CN110364170B true CN110364170B (en) 2024-01-30

Family

ID=68215394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910459488.7A Active CN110364170B (en) 2019-05-29 2019-05-29 Voice transmission method, voice transmission device, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN110364170B (en)
WO (1) WO2020238058A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364170B (en) * 2019-05-29 2024-01-30 平安科技(深圳)有限公司 Voice transmission method, voice transmission device, computer device and storage medium
CN111199747A (en) * 2020-03-05 2020-05-26 北京花兰德科技咨询服务有限公司 Artificial intelligence communication system and communication method
CN111245868B (en) * 2020-03-10 2021-04-13 诺领科技(南京)有限公司 Narrowband Internet of things voice message communication method and system
CN111785293B (en) * 2020-06-04 2023-04-25 杭州海康威视系统技术有限公司 Voice transmission method, device and equipment and storage medium
CN112202803A (en) * 2020-10-10 2021-01-08 北京字节跳动网络技术有限公司 Audio processing method, device, terminal and storage medium
CN112822297A (en) * 2021-04-01 2021-05-18 深圳市顺易通信息科技有限公司 Parking lot service data transmission method and related equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08116385A (en) * 1994-10-14 1996-05-07 Hitachi Ltd Individual information terminal equipment and voice response system
CN103714823A (en) * 2013-12-19 2014-04-09 同济大学 Integrated speech coding-based adaptive underwater communication method
WO2016119560A1 (en) * 2015-01-29 2016-08-04 中国移动通信集团公司 Self-adaptive audio transmission method and device
CN106850615A (en) * 2017-01-24 2017-06-13 华为技术有限公司 A kind of method of code rate control, relevant apparatus and system
CN107066477A (en) * 2016-12-13 2017-08-18 合网络技术(北京)有限公司 A kind of method and device of intelligent recommendation video
CN107770387A (en) * 2017-10-31 2018-03-06 珠海市魅族科技有限公司 Communication control method, device, computer installation and computer-readable recording medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162150A1 (en) * 2006-12-28 2008-07-03 Vianix Delaware, Llc System and Method for a High Performance Audio Codec
CN102790997B (en) * 2011-05-19 2017-05-10 中兴通讯股份有限公司 Method and device for transmission of adaptive multi-rate (AMR) voice data
CN109712631B (en) * 2019-03-28 2019-06-28 南昌黑鲨科技有限公司 Audio data transfer control method, device, system and readable storage medium storing program for executing
CN110364170B (en) * 2019-05-29 2024-01-30 平安科技(深圳)有限公司 Voice transmission method, voice transmission device, computer device and storage medium

Also Published As

Publication number Publication date
WO2020238058A1 (en) 2020-12-03
CN110364170A (en) 2019-10-22

Similar Documents

Publication Publication Date Title
CN110364170B (en) Voice transmission method, voice transmission device, computer device and storage medium
US6208959B1 (en) Mapping of digital data symbols onto one or more formant frequencies for transmission over a coded voice channel
CN108197572B (en) Lip language identification method and mobile terminal
CN111667814A (en) Multi-language voice synthesis method and device
CN110288972B (en) Speech synthesis model training method, speech synthesis method and device
US6681208B2 (en) Text-to-speech native coding in a communication system
CN101304391A (en) Voice call method and system based on instant communication system
EP4002731A1 (en) Voice processing method and apparatus, computer-readable storage medium and computer device
CN111435592B (en) Voice recognition method and device and terminal equipment
KR101279857B1 (en) Adaptive multi rate codec mode decoding method and apparatus thereof
CN106504742A (en) The transmission method of synthesis voice, cloud server and terminal device
CN112071300B (en) Voice conversation method, device, computer equipment and storage medium
CN111104506A (en) Method and device for determining reply result of human-computer interaction and electronic equipment
CN110740212A (en) Call answering method and device based on intelligent voice technology and electronic equipment
CN112712793A (en) ASR (error correction) method based on pre-training model under voice interaction and related equipment
CN114999442A (en) Self-adaptive character-to-speech method based on meta learning and related equipment thereof
JP4437011B2 (en) Speech encoding device
CN111611352A (en) Dialog generation method and device, electronic equipment and readable storage medium
CN113259063B (en) Data processing method, data processing device, computer equipment and computer readable storage medium
KR100462042B1 (en) Method and Device for providing menu in mobile terminal
CN113114417B (en) Audio transmission method and device, electronic equipment and storage medium
CN109215670B (en) Audio data transmission method and device, computer equipment and storage medium
CN117037793A (en) Full duplex-based dialogue method, device and storage medium
CN117292697A (en) Voice data compression method and device, electronic equipment and readable storage medium
CN115223566A (en) Voice transmission method, system and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant