CN102710539A - Method and device for transferring voice messages - Google Patents
- Publication number
- CN102710539A; CN2012101335145A; CN201210133514A
- Authority
- CN
- China
- Prior art keywords
- voice
- information
- text information
- text
- voice recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a method and a device for transferring voice messages. The method includes: starting a voice recognition module when the quality of a voice call degrades; the terminal performing, through the voice recognition module, voice recognition on voice signals collected by a local voice input device, generating the corresponding text messages and sending them to an opposite terminal; or the terminal sending the voice signals to a voice recognition cloud through the voice recognition module, obtaining the corresponding text messages from the voice recognition cloud and sending them to the opposite terminal. The method and the device can improve the effectiveness and timeliness of voice message transfer and enhance the quality of user experience.
Description
Technical Field
The present invention relates to the field of communications, and in particular, to a method and apparatus for transmitting voice information.
Background
In the prior art, instant messaging is a basic technology of the Internet. Current instant messaging software generally integrates several real-time communication modes, such as text, voice and video, to meet users' diverse communication needs.
For two-way real-time communication, a high-quality voice call places higher demands on the network and the terminal devices than text-based communication does. On the one hand, packet loss, delay and jitter in the network can seriously affect call quality; on the other hand, the microphone, earphone, loudspeaker and noise environment of the terminal can also affect call quality. How to improve voice call quality in an instant messaging system under complex network and terminal conditions is therefore a problem to be solved.
Disclosure of Invention
The invention provides a method and a device for transmitting voice information, which aim to solve the problem of low voice call quality of an instant messaging system in the prior art.
The invention provides a voice information transmission method, which comprises the following steps:
starting a voice recognition module under the condition that the voice call quality is determined to be reduced;
the terminal carries out voice recognition on voice signals collected by local voice input equipment through a voice recognition module, generates corresponding text information and sends the text information to an opposite terminal; or the terminal sends the voice signal to the voice recognition cloud end through the voice recognition module, acquires corresponding text information from the voice recognition cloud end and sends the text information to the opposite end.
The invention also provides a voice information transmission device, comprising:
the starting module is used for starting the voice recognition module under the condition that the voice call quality is determined to be reduced;
the voice recognition module is used for carrying out voice recognition on voice signals collected by the local voice input equipment, generating corresponding text information and sending the text information to the opposite terminal; or sending the voice signal to a voice recognition cloud end, and acquiring corresponding text information from the voice recognition cloud end and sending the text information to the opposite end.
The invention has the following beneficial effects:
When the network or terminal environment cannot ensure good voice call quality, voice recognition technology is used to convert the voice into corresponding text information for transmission. This solves the problem of low voice call quality of instant messaging systems in the prior art, improves the effectiveness and timeliness of voice information transmission, and improves the quality of user experience.
Drawings
FIG. 1 is a flow chart of a method of transmitting voice information according to an embodiment of the present invention;
FIG. 2 is a detailed processing flow chart of the method of transmitting voice information according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a transmitting end and a receiving end according to an embodiment of the present invention;
FIG. 4 is a flow chart of example 1 of an embodiment of the present invention;
FIG. 5 is a flow chart of example 2 of an embodiment of the present invention;
FIG. 6 is a schematic diagram of a scenario for example 3 of an embodiment of the present invention;
FIG. 7 is a flow chart of example 3 of an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a voice information transmitting apparatus according to an embodiment of the present invention.
Detailed Description
In order to solve the problem of low voice call quality of instant messaging systems in the prior art, the invention provides a voice information transmission method and device. For voice call applications in an instant messaging system, the method and device can automatically satisfy basic communication needs whether the network quality degrades or a fault or other problem unfavorable to real-time voice communication occurs in the terminal environment, and can greatly improve the quality of user experience. The present invention will be described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
Method embodiment
According to an embodiment of the present invention, a method for transmitting voice information is provided. Fig. 1 is a flowchart of the method for transmitting voice information according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following processing:
Step 101: starting a voice recognition module when it is determined that the voice call quality is degraded.
Step 101 specifically includes the following processing: when the terminal determines that the current network condition and/or the terminal environment of the opposite terminal degrades the voice call quality, the voice recognition module is started automatically; or the voice recognition module is started manually according to an operation of the user.
In step 101, the terminal determining that the current network condition degrades the voice call quality specifically includes the following processing:
1. Acquiring a network quality index carried in feedback information sent by the opposite terminal, where the network quality index carries information about whether the packet loss rate, the network jitter and/or the delay value exceeds a preset first threshold. In practical applications, the first threshold may include several thresholds corresponding respectively to the packet loss rate, the network jitter and the delay value.
2. If the network quality index carries information that the packet loss rate, the network jitter and/or the delay value exceeds the preset first threshold, determining that the current network condition degrades the voice call quality.
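The threshold test in items 1 and 2 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the implementation of the invention: the names NetworkQuality, FirstThreshold and network_degrades_call and all numeric threshold values are hypothetical, and the sketch assumes the raw metric values are available to whichever side performs the comparison.

```python
from dataclasses import dataclass


@dataclass
class NetworkQuality:
    """Network quality index (hypothetical field names)."""
    packet_loss_rate: float  # fraction of packets lost, 0.0 to 1.0
    jitter_ms: float         # network jitter in milliseconds
    delay_ms: float          # delay value in milliseconds


@dataclass
class FirstThreshold:
    """One threshold per metric; the defaults are illustrative assumptions."""
    packet_loss_rate: float = 0.05
    jitter_ms: float = 50.0
    delay_ms: float = 400.0


def network_degrades_call(q: NetworkQuality, t: FirstThreshold) -> bool:
    """Return True if any metric exceeds its corresponding threshold."""
    return (
        q.packet_loss_rate > t.packet_loss_rate
        or q.jitter_ms > t.jitter_ms
        or q.delay_ms > t.delay_ms
    )
```

Using one threshold per metric mirrors the statement that the first threshold may include several thresholds; a single exceeded metric is enough to treat the network condition as degrading the call.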
In step 101, the terminal determining that the terminal environment of the opposite terminal degrades the voice call quality specifically includes the following processing:
1. Acquiring feedback information sent by the opposite terminal, determining from the feedback information that the voice output device of the opposite terminal cannot work normally, and thereby determining that the terminal environment of the opposite terminal degrades the voice call quality; or
2. Acquiring feedback information sent by the opposite terminal, determining from the feedback information that the environmental noise value of the opposite terminal exceeds a preset second threshold, and thereby determining that the terminal environment of the opposite terminal degrades the voice call quality. Specifically, the opposite terminal may obtain its environmental noise value by detecting the signal-to-noise ratio of the input voice signal, and report it in the feedback information.
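The two terminal-environment conditions above can be sketched as a single check. This is an illustrative Python sketch only: TerminalFeedback, its field names, the decibel unit and the default second threshold are assumptions made for the example, since the description does not fix a feedback format or a unit for the noise value.

```python
from dataclasses import dataclass


@dataclass
class TerminalFeedback:
    """Feedback from the opposite terminal (hypothetical format)."""
    output_device_ok: bool  # False if the voice output device cannot work normally
    noise_level_db: float   # environmental noise value (unit assumed to be dB)


def terminal_env_degrades_call(fb: TerminalFeedback,
                               second_threshold_db: float = 70.0) -> bool:
    """Return True if the output device has failed or the noise exceeds the threshold."""
    return (not fb.output_device_ok) or fb.noise_level_db > second_threshold_db
```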
Preferably, before the voice recognition module is started, prompt information may be output to ask the user whether to start the voice recognition module. If the user selects no, starting the voice recognition module is prohibited so as to save resources; if the user selects yes, the voice recognition module is started.
Step 102: the terminal performs voice recognition, through the voice recognition module, on a voice signal collected by a local voice input device, generates corresponding text information and sends the text information to the opposite terminal; or the terminal sends the voice signal to a voice recognition cloud through the voice recognition module, acquires the corresponding text information from the voice recognition cloud and sends it to the opposite terminal.
Specifically, the speech recognition module can perform segmented speech recognition on the speech signal collected by the local speech input device.
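One simple way to picture segmented recognition is to cut the sampled signal into consecutive pieces, each with its own start time and duration, before handing each piece to the recognizer. The Python sketch below assumes fixed-length segments, which is an illustrative choice and not something the description specifies (a real system might instead cut at pauses); split_into_segments is a hypothetical helper.

```python
from typing import List, Tuple

Segment = Tuple[float, float, List[int]]  # (start time in s, duration in s, samples)


def split_into_segments(samples: List[int], sample_rate: int,
                        segment_seconds: float) -> List[Segment]:
    """Split samples into consecutive fixed-length segments with timing data."""
    step = int(sample_rate * segment_seconds)
    if step <= 0:
        raise ValueError("sample_rate * segment_seconds must be at least 1 sample")
    segments: List[Segment] = []
    for offset in range(0, len(samples), step):
        chunk = samples[offset:offset + step]
        # Start time and duration are derived from sample positions.
        segments.append((offset / sample_rate, len(chunk) / sample_rate, chunk))
    return segments
```

The last segment may be shorter than the others; its duration reflects the samples it actually holds.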
In step 102, after the corresponding text information is generated, time information corresponding to the text information may also be recorded, where the time information includes a start time and a duration.
In step 102, sending the text information to the opposite terminal specifically includes: sending the text information carrying the time information to the opposite terminal through an independent text channel or through the voice stream channel, where the text information carries a "speech recognition generated" attribute.
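A text message carrying the time information and the attribute could be packed and parsed along the following lines. The description does not define a wire format, so the JSON encoding, the TextMessage class and its field names are assumptions made purely for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TextMessage:
    """Recognized text plus its timing and origin attribute (hypothetical layout)."""
    text: str
    start_time: float  # start of the corresponding speech, in seconds
    duration: float    # length of the corresponding speech, in seconds
    speech_recognition_generated: bool = True


def pack(msg: TextMessage) -> bytes:
    """Serialize the message for the text channel (JSON is an assumed encoding)."""
    return json.dumps(asdict(msg), ensure_ascii=False).encode("utf-8")


def unpack(payload: bytes) -> TextMessage:
    """Parse a payload produced by pack() back into a TextMessage."""
    return TextMessage(**json.loads(payload.decode("utf-8")))
```

Carrying the attribute explicitly lets the receiver tell recognized speech apart from text the user typed, which matters for the playback decision on the receiving side.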
After step 102 is performed, the peer needs to receive and present the text information.
Specifically, if the opposite terminal determines that the attribute of the text information is "speech recognition generated", it may convert the text information into voice information through a text-to-voice conversion module and play the converted voice information according to the time information. Playing the converted voice information according to the time information specifically includes: 1. determining, according to the time information, whether the voice packets in the time period corresponding to the text information are still waiting to be played; 2. if the voice packets are still waiting to be played, determining whether the packet loss rate of those voice packets is greater than a preset third threshold; if so, replacing the voice packets with the converted voice information and playing it; if not, ending the operation.
The opposite end can also directly display the text information in a text mode.
It should be noted that, in the case that the opposite end is a forwarding device, the text information or the converted voice information is forwarded.
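The receiving-side decision of whether to substitute converted voice for the received voice packets can be sketched as a predicate. This Python sketch is illustrative only: VoiceSegmentState, its fields and the way the loss rate is computed from expected and received packet counts are assumptions, not details given in the description.

```python
from dataclasses import dataclass


@dataclass
class VoiceSegmentState:
    """State of the received voice for one text message's time period (hypothetical)."""
    already_played: bool
    packets_expected: int
    packets_received: int


def should_replace_with_tts(generated_by_recognition: bool,
                            seg: VoiceSegmentState,
                            third_threshold: float) -> bool:
    """Replace only recognized text whose voice is unplayed and too lossy."""
    if not generated_by_recognition:
        return False  # ordinary typed text is only displayed
    if seg.already_played:
        return False  # too late to substitute anything
    if seg.packets_expected <= 0:
        return False  # nothing to compare against
    loss_rate = 1.0 - seg.packets_received / seg.packets_expected
    return loss_rate > third_threshold
```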
The above technical solutions of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 2 is a detailed processing flowchart of the method for transmitting voice information according to the embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step 201: judging whether the network quality can ensure the call quality; if not, executing step 204; otherwise, executing step 202.
Step 202: judging whether the terminal environment of the opposite terminal can ensure the call quality; if not, executing step 204; otherwise, executing step 203.
Step 203: judging whether the user has chosen to start the voice recognition module manually; if so, executing step 204; otherwise, ending the operation.
Step 204: starting the voice recognition module.
Step 205: performing voice recognition on a voice signal collected by the local voice input device to generate corresponding text information.
Step 206: sending the text information to the opposite terminal.
Step 207: the opposite terminal receives and displays the text information.
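The decision chain of steps 201 to 206 can be sketched as follows. This is a minimal Python sketch: the function names are hypothetical, and the recognizer and the sender are passed in as plain callables because the description does not tie the method to any particular recognition engine or transport.

```python
from typing import Callable, Optional


def should_start_recognition(network_ok: bool, peer_env_ok: bool,
                             user_requested: bool) -> bool:
    """Mirror steps 201 to 203: any failed check, or a manual request, starts it."""
    if not network_ok:      # step 201
        return True
    if not peer_env_ok:     # step 202
        return True
    return user_requested   # step 203


def handle_audio(audio: bytes, network_ok: bool, peer_env_ok: bool,
                 user_requested: bool,
                 recognize: Callable[[bytes], str],
                 send_text: Callable[[str], None]) -> Optional[str]:
    """Run steps 204 to 206 when recognition is needed; return the text sent."""
    if not should_start_recognition(network_ok, peer_env_ok, user_requested):
        return None
    text = recognize(audio)  # steps 204 and 205
    send_text(text)          # step 206
    return text
```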
Fig. 3 is a schematic diagram of a transmitting end and a receiving end according to an embodiment of the present invention. As shown in Fig. 3, the transmitting end decides, through network quality detection, terminal environment detection and user setting detection, whether to perform voice recognition on the collected voice information. If voice recognition is required, the voice information is converted into text information and transmitted to the opposite end; if not, the encoded voice information is transmitted directly. If the receiving end receives text data, it can display the text directly or convert it into voice data for playback.
The above technical solutions of the embodiments of the present invention are described in detail below with reference to examples.
Example 1
When the current network condition is poor, client A obtains the text information corresponding to a voice data segment through the voice recognition module and sends it to client B. After receiving the text information, client B can show it to the user and attempt to convert it into voice for output. Fig. 4 is a flowchart of example 1 of the embodiment of the present invention. As shown in Fig. 4, the example includes the following processing:
Step 401: client A and client B conduct a voice call, and client B counts the packet loss rate. If the packet loss rate is higher than a set threshold, go to step 402; otherwise, end the operation.
Step 402: client B sends feedback information to client A.
Step 403: client A receives and parses the feedback information and starts the voice recognition module.
Step 404: client A passes the collected voice signal to the voice recognition module, which analyzes it to obtain the corresponding text information.
Step 405: client A packs the generated text information and transmits it to client B through a text transmission channel. The packed text information includes: the text information itself, the corresponding start time, the duration, and the "speech recognition generated" attribute.
Step 406: client B receives the text packet and parses out the text information, the start time, the duration and the attribute value.
Step 407: client B displays the text information in the text dialog window.
Step 408: if the attribute value of the text information is "speech recognition generated", go to step 409; otherwise, end the operation.
Step 409: according to the start time and duration of the text information, check whether the received voice data packets in the corresponding time period have already been played. If not, go to step 410; otherwise, end the operation.
Step 410: determine whether the packet loss rate of those voice data packets is greater than a set threshold. If so, go to step 411; otherwise, end the operation.
Step 411: discard all voice data packets in the time period of the text information, and replace them with the voice obtained by text-to-voice conversion of the text information.
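The packet loss count in step 401 could, for instance, be derived from packet sequence numbers. The example does not say how client B counts losses, so the following Python sketch is an assumption throughout: it presumes each voice packet carries an increasing sequence number without wraparound, and packet_loss_rate and needs_feedback are hypothetical helpers.

```python
from typing import Iterable


def packet_loss_rate(received_seq: Iterable[int]) -> float:
    """Fraction of sequence numbers missing between the lowest and highest received."""
    seen = set(received_seq)  # duplicates are counted once
    if not seen:
        return 0.0
    expected = max(seen) - min(seen) + 1
    return 1.0 - len(seen) / expected


def needs_feedback(received_seq: Iterable[int], threshold: float) -> bool:
    """Step 401 decision: send feedback when the loss rate exceeds the threshold."""
    return packet_loss_rate(received_seq) > threshold
```

This estimate cannot see packets lost before the first or after the last received packet, which is acceptable for a sketch but would need attention in a real client.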
Example 2
When the user of client A hears the other party say that the voice cannot be heard clearly, the user actively starts the voice recognition module. The text information corresponding to a voice data segment is obtained through the voice recognition module and sent to client B, and client B can show the text information to its user after receiving it. Fig. 5 is a flowchart of example 2 of an embodiment of the present invention. As shown in Fig. 5, the example includes the following processing:
Step 501: client A and client B start a voice call. If the user of client B cannot hear the other party clearly, that user says by voice that the speech is "inaudible".
Step 502: if the user of client A hears the user of client B say "inaudible", go to step 503; otherwise, end the operation.
Step 503: the user of client A chooses to start the voice recognition function.
Step 504: client A passes the collected voice signal to the voice recognition module, which analyzes it to obtain the corresponding text information; client A then transmits the text information to client B through a text transmission channel.
Step 505: client B receives and parses the text packet.
Step 506: client B displays the text information in a text dialog window.
Example 3
Fig. 6 is a schematic view of the scenario of example 3 according to an embodiment of the present invention. As shown in Fig. 6, an instant messaging (IM) client A calls a fixed telephone C through a voice gateway server B and conducts a voice call with the fixed telephone C. When the current network condition is poor, client A obtains the text information corresponding to a voice data segment through the voice recognition module and sends it to voice gateway server B. After receiving the text information, voice gateway server B attempts to convert it into voice information and forwards the voice information to the fixed telephone C. Fig. 7 is a flowchart of example 3 of the embodiment of the present invention. As shown in Fig. 7, the example includes the following processing:
Step 701: client A and the fixed telephone C conduct a voice call through voice gateway server B, and voice gateway server B counts the packet loss rate of the packets received from client A. If the packet loss rate is higher than a set threshold, go to step 702; otherwise, end the operation.
Step 702: voice gateway server B sends feedback information to client A.
Step 703: client A receives and parses the feedback information and starts the voice recognition module.
Step 704: client A passes the collected voice signal to the voice recognition module, which analyzes it to obtain the corresponding text information.
Step 705: client A packs the generated text information and transmits it to voice gateway server B through the text transmission channel. The packed text information includes: the text information itself, the corresponding start time, and the duration.
Step 706: voice gateway server B receives the text packet and parses out the text information, the start time and the duration.
Step 707: according to the start time and duration of the text information, voice gateway server B looks up the received voice data packets in the corresponding time period and checks whether they have already been output. If not, go to step 708; otherwise, end the operation.
Step 708: determine whether the packet loss rate of those voice data packets is greater than a set threshold. If so, go to step 709; otherwise, end the operation.
Step 709: discard all voice data packets in the time period corresponding to the text information, replace them with the voice obtained by text-to-voice conversion of the text information, and forward the result to the fixed telephone C.
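The replacement in step 709 can be pictured as an operation on a time-ordered packet buffer. In this Python sketch the buffer layout (timestamp and payload pairs), the replace_segment helper and the injected tts callable are all assumptions for illustration; no real text-to-voice engine is implied.

```python
from typing import Callable, List, Tuple

Packet = Tuple[float, bytes]  # (timestamp in seconds, audio payload)


def replace_segment(packets: List[Packet], start: float, duration: float,
                    text: str, tts: Callable[[str], bytes]) -> List[Packet]:
    """Drop packets in [start, start + duration) and insert synthesized audio."""
    end = start + duration
    kept = [p for p in packets if not (start <= p[0] < end)]
    kept.append((start, tts(text)))  # synthesized voice takes the segment's place
    kept.sort(key=lambda p: p[0])    # keep the buffer in playback order
    return kept
```

The returned list is what the gateway would then forward toward the fixed telephone in place of the lossy original segment.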
In summary, with the technical solution of the embodiments of the present invention, when the network or terminal environment cannot ensure good voice call quality, the voice recognition technology is used to convert the voice into the corresponding text information for transmission, so as to solve the problem of low voice call quality of the instant messaging system in the prior art, improve the effectiveness and timeliness of voice information transmission, and improve the quality of user experience.
Device embodiment
According to an embodiment of the present invention, a voice information transmitting apparatus is provided. Fig. 8 is a schematic structural diagram of the voice information transmitting apparatus according to the embodiment of the present invention. As shown in Fig. 8, the apparatus includes a starting module 80 and a voice recognition module 82. The modules of the embodiment of the present invention are described in detail below.
The starting module 80 is configured to start the voice recognition module 82 when it is determined that the voice call quality is degraded.
The starting module 80 is specifically configured to: automatically start the voice recognition module 82 when the terminal determines that the current network condition and/or the terminal environment of the opposite terminal degrades the voice call quality; or start the voice recognition module 82 manually according to an operation of the user.
The starting module 80 specifically includes a network condition determining submodule and a terminal environment determining submodule, where:
the network condition determining submodule is configured to acquire a network quality index carried in feedback information sent by the opposite terminal, where the network quality index carries information about whether the packet loss rate, the network jitter and/or the delay value exceeds a preset first threshold; in practical applications, the first threshold may include several thresholds corresponding respectively to the packet loss rate, the network jitter and the delay value; and, if the network quality index carries information that the packet loss rate, the network jitter and/or the delay value exceeds the preset first threshold, to determine that the current network condition degrades the voice call quality;
the terminal environment determining submodule is configured to acquire feedback information sent by the opposite terminal, determine from the feedback information that the voice output device of the opposite terminal cannot work normally, and thereby determine that the terminal environment of the opposite terminal degrades the voice call quality; or to acquire feedback information sent by the opposite terminal, determine from the feedback information that the environmental noise value of the opposite terminal exceeds a preset second threshold, and thereby determine that the terminal environment of the opposite terminal degrades the voice call quality.
The voice recognition module 82 is configured to perform voice recognition on voice signals collected by a local voice input device, generate corresponding text information and send the text information to the opposite terminal; or to send the voice signals to a voice recognition cloud, acquire the corresponding text information from the voice recognition cloud and send it to the opposite terminal.
The voice recognition module 82 is specifically configured to perform segmented voice recognition on the voice signals collected by the local voice input device.
The voice recognition module 82 is further configured to: record time information corresponding to the text information, where the time information includes a start time and a duration; and send the text information carrying the time information to the opposite terminal through an independent text channel or through the voice stream channel, where the text information carries a "speech recognition generated" attribute.
Preferably, the apparatus further includes a prompt module, a display module and a forwarding module, where:
the prompt module is configured to output prompt information asking the user whether to start the voice recognition module 82 before the starting module 80 starts the voice recognition module 82, and to prohibit starting the voice recognition module 82 if the user selects no;
the display module is configured to receive and display the text information sent by the voice recognition module 82;
the display module specifically includes:
a voice display sub-module, configured to determine that the attribute of the text information is "speech recognition generated", convert the text information into voice information through the text-to-voice conversion module, and play the converted voice information according to the time information;
a text display sub-module, configured to display the text information directly as text;
the voice display sub-module is specifically configured to: determine, according to the time information, whether the voice packets in the time period corresponding to the text information are still waiting to be played; if they are, determine whether the packet loss rate of those voice packets is greater than a preset third threshold; and if so, replace the voice packets with the converted voice information and play it;
the forwarding module is configured to forward the text information or the converted voice information when the opposite end is a forwarding device.
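How the starting module, the prompt module and the voice recognition module might fit together can be sketched with two small classes. This is a hypothetical Python composition for illustration only: the class and method names are assumptions, the prompt is modeled as a callable returning the user's choice, and the recognizer is an injected function rather than a real engine.

```python
from typing import Callable, Optional


class SpeechRecognitionModule:
    """Converts audio to text once it has been started (recognizer is injected)."""

    def __init__(self, recognize: Callable[[bytes], str]) -> None:
        self._recognize = recognize
        self.active = False

    def process(self, audio: bytes) -> Optional[str]:
        # While inactive, audio is left to the normal voice path.
        return self._recognize(audio) if self.active else None


class StartModule:
    """Starts recognition on degraded quality, subject to the user's consent."""

    def __init__(self, recognizer: SpeechRecognitionModule,
                 prompt_user: Callable[[], bool]) -> None:
        self._recognizer = recognizer
        self._prompt_user = prompt_user  # stands in for the prompt module

    def on_quality_degraded(self) -> bool:
        """Ask the user first; activate the recognizer only on a yes."""
        if self._prompt_user():
            self._recognizer.active = True
        return self._recognizer.active
```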
The above technical solutions of the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 2 is a detailed processing flowchart of the method for transmitting voice information according to the embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step 201: the starting module 80 judges whether the network quality can ensure the call quality; if not, step 204 is executed; otherwise, step 202 is executed.
Step 202: the starting module 80 judges whether the terminal environment of the opposite terminal can ensure the call quality; if not, step 204 is executed; otherwise, step 203 is executed.
Step 203: the starting module 80 judges whether the user has chosen to start the voice recognition module manually; if so, step 204 is executed; otherwise, the operation is ended.
Step 204: the starting module 80 starts the voice recognition module 82.
Step 205: the voice recognition module 82 performs voice recognition on the voice signal collected by the local voice input device to generate corresponding text information.
Step 206: the voice recognition module 82 sends the text information to the opposite terminal.
Step 207: the display module 86 of the opposite terminal receives and displays the text information.
Fig. 3 is a schematic diagram of a transmitting end and a receiving end according to an embodiment of the present invention. As shown in Fig. 3, the transmitting end decides, through network quality detection, terminal environment detection and user setting detection, whether to perform voice recognition on the collected voice information. If voice recognition is required, the voice information is converted into text information and transmitted to the opposite end; if not, the encoded voice information is transmitted directly. If the receiving end receives text data, it can display the text directly or convert it into voice data for playback.
The above technical solutions of the embodiments of the present invention are described in detail below with reference to examples.
Example 1
When the current network condition is poor, client A obtains the text information corresponding to a voice data segment through the voice recognition module and sends it to client B. After receiving the text information, client B can show it to the user and attempt to convert it into voice for output. Fig. 4 is a flowchart of example 1 of the embodiment of the present invention. As shown in Fig. 4, the example includes the following processing:
Step 401: client A and client B conduct a voice call, and client B counts the packet loss rate. If the packet loss rate is higher than a set threshold, go to step 402; otherwise, end the operation.
Step 402: client B sends feedback information to client A.
Step 403: client A receives and parses the feedback information and starts the voice recognition module.
Step 404: client A passes the collected voice signal to the voice recognition module, which analyzes it to obtain the corresponding text information.
Step 405: client A packs the generated text information and transmits it to client B through a text transmission channel. The packed text information includes: the text information itself, the corresponding start time, the duration, and the "speech recognition generated" attribute.
Step 406: client B receives the text packet and parses out the text information, the start time, the duration and the attribute value.
Step 407: client B displays the text information in the text dialog window.
Step 408: if the attribute value of the text information is "speech recognition generated", go to step 409; otherwise, end the operation.
Step 409: according to the start time and duration of the text information, check whether the received voice data packets in the corresponding time period have already been played. If not, go to step 410; otherwise, end the operation.
Step 410: determine whether the packet loss rate of those voice data packets is greater than a set threshold. If so, go to step 411; otherwise, end the operation.
Step 411: discard all voice data packets in the time period of the text information, and replace them with the voice obtained by text-to-voice conversion of the text information.
Example 2
When the user of client A hears the other party say that the voice cannot be heard clearly, the user actively starts the voice recognition module. The text information corresponding to a voice data segment is obtained through the voice recognition module and sent to client B, and client B can show the text information to its user after receiving it. Fig. 5 is a flowchart of example 2 of an embodiment of the present invention. As shown in Fig. 5, the example includes the following processing:
Step 501: client A and client B start a voice call. If the user of client B cannot hear the other party clearly, that user says by voice that the speech is "inaudible".
Step 502: if the user of client A hears the user of client B say "inaudible", go to step 503; otherwise, end the operation.
Step 503: the user of client A chooses to start the voice recognition function.
Step 504: client A passes the collected voice signal to the voice recognition module, which analyzes it to obtain the corresponding text information; client A then transmits the text information to client B through a text transmission channel.
Step 505: client B receives and parses the text packet.
Step 506: client B displays the text information in a text dialog window.
Example 3
Fig. 6 is a schematic diagram of the scenario of example 3 of an embodiment of the present invention. As shown in Fig. 6, an Instant Messaging (IM) client A calls a fixed telephone C through a voice gateway server B and conducts a voice call with it. When the current network condition is poor, client A obtains the text information corresponding to the voice data segment through the voice recognition module and sends it to voice gateway server B; after receiving the text information, voice gateway server B tries to convert it into voice information and forwards the voice information to fixed telephone C. Fig. 7 is a flowchart of example 3 of the embodiment of the present invention. As shown in Fig. 7, the process includes the following steps:
Step 701, client A and fixed telephone C conduct a voice call through voice gateway server B. Voice gateway server B measures the packet loss rate of the packets received from client A; if the packet loss rate is higher than a set threshold, go to step 702; otherwise, end the operation.
Step 702, voice gateway server B sends feedback information to client A.
Step 703, client A receives and parses the feedback information and starts the voice recognition module.
Step 704, client A passes the collected voice signal to the voice recognition module, which analyzes it to obtain the corresponding text information.
Step 705, client A packs the generated text information and sends it to voice gateway server B through the text transmission channel. The packed text information includes the text itself, the corresponding start time, and the duration.
Step 706, voice gateway server B receives the text packet and parses out the text information, the start time, and the duration.
Step 707, using the start time and duration of the text information, voice gateway server B checks whether the received voice data packets for the corresponding time period have already been played; if they have not, go to step 708; otherwise, end the operation.
Step 708, determine whether the packet loss rate of those voice data packets is greater than a set threshold; if so, go to step 709; otherwise, end the operation.
Step 709, discard all voice data packets in the time period corresponding to the text information, convert the text information to speech, substitute the converted speech for the discarded packets, and forward it to fixed telephone C.
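The measurement in step 701 and the feedback in step 702 might look like the sketch below. It assumes that voice packets carry consecutive sequence numbers (as RTP packets do); the description does not say how the gateway counts losses. The function names, the feedback dictionary, and the 0.2 threshold are hypothetical, and sequence-number wraparound is ignored for brevity.

```python
def packet_loss_rate(received_seqs, first_seq, last_seq):
    """Fraction of expected packets in [first_seq, last_seq] that never arrived."""
    expected = last_seq - first_seq + 1
    if expected <= 0:
        raise ValueError("last_seq must not be smaller than first_seq")
    # A set removes duplicate arrivals; out-of-window numbers are ignored.
    arrived = len({s for s in received_seqs if first_seq <= s <= last_seq})
    return (expected - arrived) / expected


def feedback_for(received_seqs, first_seq, last_seq, threshold=0.2):
    """Steps 701-702: build a feedback message if loss exceeds the threshold."""
    rate = packet_loss_rate(received_seqs, first_seq, last_seq)
    if rate > threshold:
        return {"type": "quality_feedback", "packet_loss_rate": rate}
    return None  # call quality is acceptable; no feedback is sent
```

For example, receiving only packets 1, 2, 4, 5, and 8 out of an expected window of 1 through 10 gives a loss rate of 0.5, which exceeds the assumed threshold and produces a feedback message for client A.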
In summary, with the technical solution of the embodiments of the present invention, when the network or terminal environment cannot guarantee good voice call quality, voice recognition technology is used to convert the speech into corresponding text information for transmission. This addresses the problem of low voice call quality in prior-art instant messaging systems, improves the effectiveness and timeliness of voice information transmission, and improves the user experience.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, and the scope of the invention should not be limited to the embodiments described above.
Claims (10)
1. A method for transmitting voice information, comprising:
starting a voice recognition module when it is determined that voice call quality has degraded; and
performing, by a terminal through the voice recognition module, voice recognition on a voice signal collected by a local voice input device, generating corresponding text information, and sending the text information to an opposite terminal; or sending, by the terminal through the voice recognition module, the voice signal to a voice recognition cloud, obtaining corresponding text information from the voice recognition cloud, and sending the text information to the opposite terminal.
2. The method of claim 1, wherein starting the voice recognition module when it is determined that voice call quality has degraded specifically comprises:
automatically starting the voice recognition module when the terminal determines that the current network condition and/or the terminal environment of the opposite terminal has degraded the voice call quality; or
manually starting the voice recognition module according to an operation of the user.
3. The method of claim 2, wherein
the terminal determining that the current network condition has degraded the voice call quality specifically comprises:
obtaining a network quality index carried in feedback information sent by the opposite terminal, wherein the network quality index indicates whether a packet loss rate, network jitter, and/or a delay value exceeds a preset first threshold; and
if the network quality index indicates that the packet loss rate, the network jitter, and/or the delay value exceeds the preset first threshold, determining that the current network condition has degraded the voice call quality;
the terminal determining that the terminal environment of the opposite terminal has degraded the voice call quality specifically comprises:
obtaining feedback information sent by the opposite terminal and, if it is determined from the feedback information that the voice output device of the opposite terminal cannot work normally, determining that the terminal environment of the opposite terminal has degraded the voice call quality; or
obtaining feedback information sent by the opposite terminal and, if it is determined from the feedback information that the ambient noise value of the opposite terminal exceeds a preset second threshold, determining that the terminal environment of the opposite terminal has degraded the voice call quality.
4. The method of claim 2, wherein
before automatically starting the voice recognition module, the method further comprises:
outputting prompt information to prompt the user to choose whether to start the voice recognition module; and
if the user chooses not to, prohibiting the voice recognition module from starting;
after generating the corresponding text information, the method further comprises:
recording time information corresponding to the text information, wherein the time information comprises a start time and a duration;
sending the text information to the opposite terminal specifically comprises:
sending the text information carrying the time information to the opposite terminal through an independent text channel or a voice stream channel, wherein the text information carries an attribute indicating that it was generated by voice recognition.
5. The method of claim 4, wherein the method further comprises: receiving and displaying, by the opposite terminal, the text information;
the receiving and displaying of the text information by the opposite terminal specifically comprises:
if the opposite terminal determines that the attribute of the text information indicates that it was generated by voice recognition, converting the text information into voice information through a text-to-speech conversion module, and playing the converted voice information according to the time information; or
directly displaying, by the opposite terminal, the text information as text.
6. The method of claim 5, wherein playing the converted voice information according to the time information specifically comprises:
determining, according to the time information, whether the voice packets in the time period corresponding to the text information are still waiting to be played; and
if the voice packets are still waiting to be played, determining whether the packet loss rate of the voice packets is greater than a preset third threshold and, if so, replacing the voice packets with the converted voice information and playing the converted voice information.
7. The method of claim 5, wherein the method further comprises:
forwarding the text information or the converted voice information when the opposite terminal is a forwarding device.
8. A voice information transmitting apparatus, comprising:
a starting module, configured to start a voice recognition module when it is determined that voice call quality has degraded; and
the voice recognition module, configured to perform voice recognition on a voice signal collected by a local voice input device, generate corresponding text information, and send the text information to an opposite terminal; or to send the voice signal to a voice recognition cloud, obtain corresponding text information from the voice recognition cloud, and send the text information to the opposite terminal.
9. The apparatus of claim 8, wherein
the starting module is specifically configured to: automatically start the voice recognition module when the terminal determines that the current network condition and/or the terminal environment of the opposite terminal has degraded the voice call quality; or manually start the voice recognition module according to an operation of the user;
the starting module specifically comprises:
a network condition determining sub-module, configured to obtain a network quality index carried in feedback information sent by the opposite terminal, wherein the network quality index indicates whether a packet loss rate, network jitter, and/or a delay value exceeds a preset first threshold; and, if the network quality index indicates that the packet loss rate, the network jitter, and/or the delay value exceeds the preset first threshold, to determine that the current network condition has degraded the voice call quality; and
a terminal environment determining sub-module, configured to obtain feedback information sent by the opposite terminal and, if it is determined from the feedback information that the voice output device of the opposite terminal cannot work normally, to determine that the terminal environment of the opposite terminal has degraded the voice call quality; or to obtain feedback information sent by the opposite terminal and, if it is determined from the feedback information that the ambient noise value of the opposite terminal exceeds a preset second threshold, to determine that the terminal environment of the opposite terminal has degraded the voice call quality.
10. The apparatus of claim 9, wherein
the voice recognition module is specifically configured to perform segmented voice recognition on the voice signal collected by the local voice input device;
the voice recognition module is further configured to: record time information corresponding to the text information, wherein the time information comprises a start time and a duration; and send the text information carrying the time information to the opposite terminal through an independent text channel or a voice stream channel, wherein the text information carries an attribute indicating that it was generated by voice recognition;
the apparatus further comprises: a prompting module, configured to output prompt information before the starting module starts the voice recognition module, prompting the user to choose whether to start the voice recognition module, and to prohibit the voice recognition module from starting if the user chooses not to; and
a display module, configured to receive and display the text information sent by the voice recognition module;
the display module specifically comprises:
a voice presentation sub-module, configured to determine that the attribute of the text information indicates that it was generated by voice recognition, convert the text information into voice information through a text-to-speech conversion module, and play the converted voice information according to the time information; and
a text display sub-module, configured to directly display the text information as text;
the voice presentation sub-module is specifically configured to:
determine, according to the time information, whether the voice packets in the time period corresponding to the text information are still waiting to be played; and, if they are, determine whether the packet loss rate of the voice packets is greater than a preset third threshold and, if so, replace the voice packets with the converted voice information and play the converted voice information;
the apparatus further comprises: a forwarding module, configured to forward the text information or the converted voice information when the opposite terminal is a forwarding device.
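The start conditions recited in claims 2 to 4 can be illustrated with a short sketch. The `Feedback` structure, its field names, and every numeric limit below are assumptions made for illustration; the claims only speak of a preset "first threshold" and "second threshold". The sketch also simplifies claim 3 by comparing raw values locally, whereas the claimed network quality index already indicates whether the threshold was exceeded.

```python
from dataclasses import dataclass

# Assumed limits; the claims only call them the "first" and "second" thresholds.
FIRST_THRESHOLDS = {"packet_loss_rate": 0.2, "jitter_ms": 60.0, "delay_ms": 400.0}
SECOND_THRESHOLD_NOISE_DB = 75.0


@dataclass
class Feedback:
    """Hypothetical feedback message reported by the opposite terminal."""
    packet_loss_rate: float = 0.0
    jitter_ms: float = 0.0
    delay_ms: float = 0.0
    output_device_ok: bool = True
    ambient_noise_db: float = 0.0


def network_degraded(fb: Feedback) -> bool:
    """True if any network metric exceeds its assumed first threshold."""
    return any(getattr(fb, name) > limit for name, limit in FIRST_THRESHOLDS.items())


def peer_environment_degraded(fb: Feedback) -> bool:
    """True if the peer's output device has failed or its surroundings are too noisy."""
    return (not fb.output_device_ok) or fb.ambient_noise_db > SECOND_THRESHOLD_NOISE_DB


def should_start_recognition(fb: Feedback, user_accepts: bool = True) -> bool:
    """Automatic start, gated by the user's answer to the prompt of claim 4."""
    degraded = network_degraded(fb) or peer_environment_degraded(fb)
    return degraded and user_accepts
```

With these assumed limits, a report of 35% packet loss starts recognition, while the same report is ignored if the user declines the prompt.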
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012101335145A CN102710539A (en) | 2012-05-02 | 2012-05-02 | Method and device for transferring voice messages |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102710539A (en) | 2012-10-03 |
Family
ID=46903105
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012101335145A Pending CN102710539A (en) | 2012-05-02 | 2012-05-02 | Method and device for transferring voice messages |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102710539A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101079836A (en) * | 2006-12-21 | 2007-11-28 | 腾讯科技(深圳)有限公司 | An instant communication method and system based on asymmetric media |
CN101452705A (en) * | 2007-12-07 | 2009-06-10 | 希姆通信息技术(上海)有限公司 | Voice character conversion nd cued speech character conversion method and device |
US20110195758A1 (en) * | 2010-02-10 | 2011-08-11 | Palm, Inc. | Mobile device having plurality of input modes |
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9419866B2 (en) | 2012-11-01 | 2016-08-16 | Huawei Technologies Co., Ltd. | Method, node, and monitoring center detecting network fault |
WO2014067283A1 (en) * | 2012-11-01 | 2014-05-08 | 华为技术有限公司 | Network failure detecting method, node, and monitoring center |
WO2013182129A3 (en) * | 2013-03-22 | 2014-02-20 | 中兴通讯股份有限公司 | Cloud note implementation method and device |
WO2013182129A2 (en) * | 2013-03-22 | 2013-12-12 | 中兴通讯股份有限公司 | Cloud note implementation method and device |
CN105493425B (en) * | 2013-08-29 | 2019-04-30 | 统一有限责任两合公司 | Voice communication is maintained in crowded communication channel |
US10069965B2 (en) | 2013-08-29 | 2018-09-04 | Unify Gmbh & Co. Kg | Maintaining audio communication in a congested communication channel |
CN105493425A (en) * | 2013-08-29 | 2016-04-13 | 统一有限责任两合公司 | Maintaining audio communication in a congested communication channel |
CN104468479A (en) * | 2013-09-17 | 2015-03-25 | 北京三星通信技术研究有限公司 | Terminal communication method, device and system, and terminal |
CN104639728A (en) * | 2013-11-14 | 2015-05-20 | 阿尔卡特朗讯公司 | Method and equipment used for improving speed or video communication quality |
CN105933531A (en) * | 2013-11-15 | 2016-09-07 | 珠海市魅族科技有限公司 | Terminal communication method, terminal and server |
US10140989B2 (en) | 2013-12-10 | 2018-11-27 | Alibaba Group Holding Limited | Method and system for speech recognition processing |
CN104700836A (en) * | 2013-12-10 | 2015-06-10 | 阿里巴巴集团控股有限公司 | Voice recognition method and voice recognition system |
CN104700836B (en) * | 2013-12-10 | 2019-01-29 | 阿里巴巴集团控股有限公司 | A kind of audio recognition method and system |
US10249301B2 (en) | 2013-12-10 | 2019-04-02 | Alibaba Group Holding Limited | Method and system for speech recognition processing |
CN105723448A (en) * | 2014-01-21 | 2016-06-29 | 三星电子株式会社 | Electronic device and voice recognition method thereof |
CN112700774A (en) * | 2014-01-21 | 2021-04-23 | 三星电子株式会社 | Electronic equipment and voice recognition method thereof |
US11011172B2 (en) | 2014-01-21 | 2021-05-18 | Samsung Electronics Co., Ltd. | Electronic device and voice recognition method thereof |
US11984119B2 (en) | 2014-01-21 | 2024-05-14 | Samsung Electronics Co., Ltd. | Electronic device and voice recognition method thereof |
CN110933238A (en) * | 2014-05-23 | 2020-03-27 | 三星电子株式会社 | System and method for providing voice-message call service |
US10917511B2 (en) | 2014-05-23 | 2021-02-09 | Samsung Electronics Co., Ltd. | System and method of providing voice-message call service |
CN104113471A (en) * | 2014-07-18 | 2014-10-22 | 广州三星通信技术研究有限公司 | Information processing method and device during abnormal communicating junctions |
CN104113471B (en) * | 2014-07-18 | 2018-06-05 | 广州三星通信技术研究有限公司 | Information processing method and device when communication connection is abnormal |
CN105162944A (en) * | 2015-06-23 | 2015-12-16 | 上海斐讯数据通信技术有限公司 | Conversation system and conversation method |
CN105162944B (en) * | 2015-06-23 | 2018-04-06 | 上海斐讯数据通信技术有限公司 | A kind of phone system and method |
CN106686190A (en) * | 2015-11-06 | 2017-05-17 | 北京奇虎科技有限公司 | Call content record method, client and server |
CN105827504A (en) * | 2015-11-30 | 2016-08-03 | 维沃移动通信有限公司 | Voice information transmission method, mobile terminal and system |
CN105376513B (en) * | 2015-12-02 | 2019-09-10 | 小米科技有限责任公司 | The method and device of information transmission |
CN105376513A (en) * | 2015-12-02 | 2016-03-02 | 小米科技有限责任公司 | Information transmission method and device |
WO2017128991A1 (en) * | 2016-01-26 | 2017-08-03 | 阿里巴巴集团控股有限公司 | Instant communication method and instant communication system based on voice recognition |
CN106340297A (en) * | 2016-09-21 | 2017-01-18 | 广东工业大学 | Speech recognition method and system based on cloud computing and confidence calculation |
CN106448665A (en) * | 2016-10-28 | 2017-02-22 | 努比亚技术有限公司 | Voice processing device and method |
CN106653028A (en) * | 2016-11-17 | 2017-05-10 | 福建天泉教育科技有限公司 | Voice communication method and system |
CN106604243A (en) * | 2016-11-29 | 2017-04-26 | 珠海市魅族科技有限公司 | Interaction control method and device |
CN107332871A (en) * | 2017-05-18 | 2017-11-07 | 百度在线网络技术(北京)有限公司 | Advertisement sending method and device |
CN107786686A (en) * | 2017-10-26 | 2018-03-09 | 王梅 | A kind of system and method for being used to export multi-medium data |
CN107786686B (en) * | 2017-10-26 | 2021-06-25 | 安徽尚融信息科技股份有限公司 | System and method for outputting multimedia data |
CN109756797A (en) * | 2017-11-07 | 2019-05-14 | 阿里巴巴集团控股有限公司 | Double upper linked methods, optical network management equipment and optical transmission system |
CN108173740A (en) * | 2017-11-30 | 2018-06-15 | 维沃移动通信有限公司 | A kind of method and apparatus of voice communication |
CN108831475A (en) * | 2018-05-24 | 2018-11-16 | 广州市千钧网络科技有限公司 | A kind of text message extracting method and system |
CN109151148B (en) * | 2018-10-15 | 2021-04-16 | Oppo广东移动通信有限公司 | Call content recording method, device, terminal and computer readable storage medium |
CN109151148A (en) * | 2018-10-15 | 2019-01-04 | Oppo广东移动通信有限公司 | Recording method, device, terminal and the computer readable storage medium of dialog context |
CN109308893A (en) * | 2018-10-25 | 2019-02-05 | 珠海格力电器股份有限公司 | Information transmission method and device, storage medium, and electronic device |
WO2020237886A1 (en) * | 2019-05-30 | 2020-12-03 | 平安科技(深圳)有限公司 | Voice and text conversion transmission method and system, and computer device and storage medium |
CN110349581A (en) * | 2019-05-30 | 2019-10-18 | 平安科技(深圳)有限公司 | Voice and text conversion transmission method, system, computer equipment and storage medium |
CN110995921A (en) * | 2019-11-19 | 2020-04-10 | 维沃移动通信有限公司 | Call processing method, electronic device and computer readable storage medium |
CN110913070B (en) * | 2019-11-22 | 2021-11-23 | 维沃移动通信有限公司 | Call method and terminal equipment |
CN110913070A (en) * | 2019-11-22 | 2020-03-24 | 维沃移动通信有限公司 | Call method and terminal equipment |
CN112201232A (en) * | 2020-08-28 | 2021-01-08 | 星络智能科技有限公司 | Voice output control method, electronic device and computer readable storage medium |
CN112202803A (en) * | 2020-10-10 | 2021-01-08 | 北京字节跳动网络技术有限公司 | Audio processing method, device, terminal and storage medium |
CN112422747A (en) * | 2020-11-20 | 2021-02-26 | 维沃移动通信有限公司 | Call method and device |
CN112910747A (en) * | 2021-01-13 | 2021-06-04 | 贵州拓视实业有限公司 | Indoor voice forwarding processing method and device |
CN113923198A (en) * | 2021-09-17 | 2022-01-11 | 上海华信长安网络科技有限公司 | Method and device for improving low VoIP voice call quality |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C12 | Rejection of a patent application after its publication | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20121003 |