CN104616652A

CN104616652A - Voice transmission method and device

Info

Publication number: CN104616652A
Application number: CN201510016680.0A
Authority: CN
Inventors: 陈志军; 侯文迪; 王百超
Original assignee: Xiaomi Inc
Current assignee: Beijing Xiaomi Technology Co Ltd; Xiaomi Inc
Priority date: 2015-01-13
Filing date: 2015-01-13
Publication date: 2015-05-13

Abstract

The invention relates to a voice transmission method and device. The voice transmission method includes: starting to receive a voice signal to be transmitted to an opposite end; and, each time a preset voice division duration elapses, sending the currently received audio fragment to a server, which sends the audio fragment to the opposite end in real time. The voice transmission method and device improve voice transmission efficiency.

Description

Voice transmission method and device

Technical field

The disclosure relates to Internet technology, and in particular to a voice transmission method and device.

Background

In the related art, instant messaging tools can be used to send voice messages for chatting. For example, suppose user A wants to voice-chat with user B through an instant messaging tool. Usually, after user A finishes speaking (say, a one-minute voice message), A's client sends the whole message to user B's client in one transmission, and user B then listens to it through that client, which takes another minute. Conveying the voice message therefore takes twice its length, that is, two minutes. Voice transmission efficiency is very low, and A and B chat through the instant messaging tool too slowly.

Summary of the invention

To overcome the problems in the related art, the disclosure provides a voice transmission method and device to improve voice transmission efficiency.

According to a first aspect of the embodiments of the disclosure, a voice transmission method is provided, comprising:

starting to receive a voice signal to be transmitted to an opposite end;

each time a preset voice division duration elapses, sending the currently received audio fragment to a server, the server being used to send the audio fragment to the opposite end in real time.

According to a second aspect of the embodiments of the disclosure, a voice transmission method is provided, comprising:

receiving, in real time, an audio fragment sent by a first client, the audio fragment being obtained by the first client each time a preset voice division duration elapses while it receives a voice signal to be transmitted to a second client;

transmitting the audio fragment to the second client in real time.

According to a third aspect of the embodiments of the disclosure, a voice transmission device is provided, comprising:

a signal receiving module, for starting to receive a voice signal to be transmitted to an opposite end;

a transmission processing module, for sending, each time a preset voice division duration elapses, the currently received audio fragment to a server, the server being used to send the audio fragment to the opposite end in real time.

According to a fourth aspect of the embodiments of the disclosure, a voice transmission device is provided, comprising:

a signal receiving module, for receiving, in real time, an audio fragment sent by a first client, the audio fragment being obtained by the first client each time a preset voice division duration elapses while it receives a voice signal to be transmitted to a second client;

a signal transmitting module, for transmitting the audio fragment to the second client in real time.

According to a fifth aspect of the embodiments of the disclosure, a server is provided, comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to: receive, in real time, an audio fragment sent by a first client, the audio fragment being obtained by the first client each time a preset voice division duration elapses while it receives a voice signal to be transmitted to a second client; and transmit the audio fragment to the second client in real time.

According to a sixth aspect of the embodiments of the disclosure, a terminal is provided, comprising:

a processor;

a memory for storing instructions executable by the processor;

wherein the processor is configured to: start to receive a voice signal to be transmitted to an opposite end; and, each time a preset voice division duration elapses, send the currently received audio fragment to a server, the server being used to send the audio fragment to the opposite end in real time.

The technical solutions provided by the embodiments of the disclosure may have the following beneficial effect: by dividing the received voice signal into multiple audio fragments and transmitting them in real time, voice transmission efficiency is improved relative to transmitting the voice signal as a whole.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.

Fig. 1 is an application scenario diagram of a voice transmission method according to an exemplary embodiment;

Fig. 2 is a flowchart of a voice transmission method according to an exemplary embodiment;

Fig. 3 is a flowchart of another voice transmission method according to an exemplary embodiment;

Fig. 4 is a schematic diagram of voice division in a voice transmission method according to an exemplary embodiment;

Fig. 5 is a flowchart of yet another voice transmission method according to an exemplary embodiment;

Fig. 6 is a flowchart of still another voice transmission method according to an exemplary embodiment;

Fig. 7 is a structural diagram of a voice transmission device according to an exemplary embodiment;

Fig. 8 is a structural diagram of another voice transmission device according to an exemplary embodiment;

Fig. 9 is a structural diagram of yet another voice transmission device according to an exemplary embodiment;

Figure 10 is a block diagram of a server according to an exemplary embodiment;

Figure 11 is a block diagram of an intelligent terminal according to an exemplary embodiment.

Detailed description

Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention. Rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.

Fig. 1 is an application scenario diagram of a voice transmission method according to an exemplary embodiment. As shown in Fig. 1, an instant messaging client is installed on each of two mobile phones. Suppose the client on mobile phone 11 is the first client A and the client on mobile phone 12 is the second client B. Fig. 1 also shows a server 13, which is an instant messaging server. The first client A and the second client B are each connected to the server 13, so that they communicate with each other through the server 13.

It should be noted that Fig. 1 shows only an exemplary scenario, and actual implementations are not limited to it; for example, the instant messaging client may also run on other portable terminals such as tablet computers. The voice transmission method of the embodiments of the disclosure is applied to voice communication between the first client A and the second client B. Fig. 2 describes the flow of the voice transmission method with one of the instant messaging clients as the executing entity. For example, when a voice signal is transmitted from the first client to the second client, the first client, as the sending end of the voice signal, performs the following flow:

201. Start to receive a voice signal to be transmitted to the opposite end;

202. Each time a preset voice division duration elapses, send the currently received audio fragment to the server, the server being used to send the audio fragment to the opposite end in real time.

With the server as the executing entity, the server performs the flow shown in Fig. 3:

301. Receive, in real time, an audio fragment sent by the first client, the audio fragment being obtained by the first client each time a preset voice division duration elapses while it receives a voice signal to be transmitted to the second client;

302. Transmit the audio fragment to the second client in real time.

Here, the first client receives a voice signal to be transmitted to the second client. A typical scenario is that user Xiao Zhang wants to send a voice signal to user Xiao Li through the instant messaging client to tell Xiao Li something. Xiao Zhang may need to speak for about 1 minute to finish, which yields a 1-minute voice signal, and the first client A receives this voice signal. In a specific implementation, Xiao Zhang logs in to an instant messaging account on a mobile phone; after login, Xiao Zhang's instant messaging client (i.e., the first client) establishes a network connection with the server, and Xiao Zhang selects friend Xiao Li from the contact list and sends the voice signal.

In this embodiment, while receiving the voice signal of Xiao Zhang speaking, the first client divides the voice signal into multiple audio fragments and transmits each audio fragment to the server in real time.

Fig. 4 illustrates the division of the voice signal. Suppose the 1 minute of speech spoken by Xiao Zhang is divided into six parts, T1, T2, T3 ... T6, each of which is called an "audio fragment"; that is, T1 is an audio fragment, T3 is also an audio fragment, and so on. In a specific implementation, the first client divides audio fragments as follows. The moment Xiao Zhang starts speaking is set to 0, the voice starting point, and the first client times the speech. When the speaking duration reaches the end time point a1 of T1, the client takes the speech in the T1 period as one audio fragment, encodes it, and sends it to the server. Meanwhile, Xiao Zhang continues speaking (Xiao Zhang's speech does not stop) and the first client continues timing. When the speaking duration reaches the end time point a2 of T2, the client takes the speech in the T2 period as one audio fragment, encodes it, and sends it to the server. Subsequent fragments are handled in the same way and are not described again. In effect, the first client sends while it receives: the voice signal is cut into several periods and sent to the server in batches, instead of being sent in one transmission after the user has finished speaking.
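The send-while-receiving behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_fragments`, the use of integer samples, and the `sample_rate` parameter are assumptions introduced here, and encoding and network transmission are left out.

```python
from typing import Iterable, Iterator, List


def split_into_fragments(samples: Iterable[int], sample_rate: int,
                         division_seconds: float) -> Iterator[List[int]]:
    """Yield one audio fragment each time `division_seconds` of audio has arrived.

    `samples` stands in for the microphone stream (hypothetical input format).
    """
    fragment_size = int(sample_rate * division_seconds)
    if fragment_size <= 0:
        raise ValueError("division duration must cover at least one sample")
    buffer: List[int] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == fragment_size:
            # End time point (a1, a2, ...) reached: hand the fragment over
            # for encoding and sending while recording continues.
            yield buffer
            buffer = []
    if buffer:
        # The speaker stopped before the next end time point: send the
        # shorter remaining fragment as well.
        yield buffer
```

Because the function is a generator, each fragment becomes available as soon as its period ends, which mirrors the client sending T1 while the user is still speaking T2.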

After receiving each audio fragment sent by the first client A, the server can check whether the second client B of user Xiao Li is connected to the server (that is, whether Xiao Li is online). If B is connected, the server transmits each audio fragment to the second client. The server can send fragments to the second client in the order in which it received them from the first client; for example, among the six audio fragments shown in Fig. 4, the server receives T1 first, so it also sends T1 first to the second client.
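A rough sketch of this relay step is shown below. The class `RelayServer`, its method names, and the in-memory inbox lists are hypothetical stand-ins for real network connections. Holding fragments for an offline recipient and flushing them on connection is also an assumption made for illustration; the text above only specifies forwarding when B is connected.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class RelayServer:
    """Forwards audio fragments to the recipient in arrival order."""

    def __init__(self) -> None:
        # client id -> list standing in for that client's open connection
        self.inboxes: Dict[str, List[bytes]] = {}
        # client id -> fragments held while the client is offline (assumption)
        self.held: Dict[str, Deque[bytes]] = defaultdict(deque)

    def is_online(self, client_id: str) -> bool:
        return client_id in self.inboxes

    def connect(self, client_id: str) -> List[bytes]:
        """Register a connection and flush held fragments, oldest first."""
        inbox = self.inboxes.setdefault(client_id, [])
        inbox.extend(self.held.pop(client_id, deque()))
        return inbox

    def on_fragment(self, recipient_id: str, fragment: bytes) -> bool:
        """Forward immediately if the recipient is online; return True if sent."""
        if self.is_online(recipient_id):
            self.inboxes[recipient_id].append(fragment)
            return True
        self.held[recipient_id].append(fragment)
        return False
```

Appending to a single per-recipient list keeps the forwarding order identical to the receiving order, so T1 reaches the second client before T2.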

Optionally, to further guarantee the order of the audio fragments during transmission, this embodiment may also set a sequence identifier in the data packet of each audio fragment. The sequence identifier indicates the position of the audio fragment among the multiple audio fragments of the voice signal, so that the server transmits the audio fragments to the opposite end according to the sequence identifier and the audio fragments are transmitted and played in order.
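One possible way to use such a sequence identifier is a small reorder buffer, sketched below. The packet layout (`FragmentPacket` with an integer `sequence` starting at 0) and the class `ReorderBuffer` are assumptions for illustration; the disclosure does not specify the packet format or how out-of-order packets are handled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FragmentPacket:
    sequence: int   # position of the fragment within the voice signal (0-based here)
    payload: bytes  # encoded audio of the fragment


class ReorderBuffer:
    """Releases packets strictly in sequence order, holding any that arrive early."""

    def __init__(self) -> None:
        self._next = 0
        self._held: Dict[int, FragmentPacket] = {}

    def push(self, packet: FragmentPacket) -> List[FragmentPacket]:
        """Store the packet and return every packet that is now ready, in order."""
        self._held[packet.sequence] = packet
        ready: List[FragmentPacket] = []
        while self._next in self._held:
            ready.append(self._held.pop(self._next))
            self._next += 1
        return ready
```

For example, if the packet for T2 arrives before the packet for T1, `push` returns an empty list for T2 and then returns both packets, T1 first, once T1 arrives.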

The voice transmission method of the embodiments of the disclosure has two advantages. On the one hand, the client need not wait for the user to finish speaking before sending everything at once; instead it cuts the speech into multiple audio fragments and sends them in batches, so the opposite end starts receiving the voice signal earlier. For example, after receiving fragment T1 sent by A, the server forwards it directly to the second client B, and B can play the T1 audio immediately. B then hears the speech as little as T seconds after A starts speaking, where T is the duration of fragment T1, which is far earlier than in the traditional approach. On the other hand, A and B need not establish a direct connection with each other for the voice transmission; each keeps its own connection to the server, which relays the data. This places lower demands on network conditions and avoids the disconnection problems of directly connected real-time calls.
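The timing argument can be made concrete with a small idealized calculation. The sketch below assumes equal-length fragments and treats network, encoding, and server delays as zero, which the disclosure does not state; the function names are hypothetical. Under those assumptions, one-shot sending of a 60-second message lets the listener start after 60 seconds and finish after 120 seconds, while 10-second fragments let the listener start after 10 seconds and finish after 70 seconds.

```python
from typing import Optional


def first_audible_delay(message_seconds: float,
                        division_seconds: Optional[float] = None) -> float:
    """Seconds from the start of speech until the listener can hear anything."""
    if message_seconds <= 0:
        raise ValueError("message_seconds must be positive")
    if division_seconds is None or division_seconds >= message_seconds:
        # One-shot transmission: nothing is sent until the speaker finishes.
        return message_seconds
    if division_seconds <= 0:
        raise ValueError("division_seconds must be positive")
    # Fragmented transmission: the first fragment is available after T seconds.
    return division_seconds


def finish_listening_time(message_seconds: float,
                          division_seconds: Optional[float] = None) -> float:
    """Seconds from the start of speech until the listener has heard everything.

    Playback is continuous because every later fragment arrives no later
    than the moment the previous one finishes playing (equal fragments).
    """
    return first_audible_delay(message_seconds, division_seconds) + message_seconds
```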

In addition, with this mode of voice transmission, the listening experience at the second client, as the receiving end, is also good. Continuing the example above, after the server sends the T1 speech to B, it keeps receiving the subsequent audio and keeps sending it to B. By the time B has finished listening to T1, the T2 audio data has already arrived, and the client starts playing the T2 audio automatically and seamlessly. In other words, the second client B automatically joins each subsequent audio fragment to the previous one, so the user does not notice that the voice signal was transmitted in segments and hears continuous speech. The interface (UI) of the second client B can also be the same as in the ordinary voice transmission mode; for example, the user clicks a listen button and then hears the voice signal continuously. As can be seen from the above, the user at the receiving end need not wait for the sending user to finish speaking before hearing anything, and the speech heard sounds coherent, so the effect is good.

Optionally, when dividing the voice signal into multiple audio fragments, the first client may divide the voice signal according to a preset voice division duration, so that each resulting audio fragment corresponds to a voice division duration.

In one embodiment, the voice division durations corresponding to the multiple audio fragments are equal. Taking the six audio fragments T1 to T6 of Fig. 4 as an example, suppose the preset duration of each is 10 seconds. The first client starts timing at time 0 and takes the first 10 seconds as the first audio fragment T1; timing from a1, it takes the next 10 seconds as the second audio fragment T2; and so on. If the remaining fragment T6 lasts less than 10 seconds, it can also be sent directly to the server as one audio fragment.

In another embodiment, the multiple audio fragments may correspond to different voice division durations; for example, the preset voice division duration may comprise two or more durations. Again taking Fig. 4 as an example, fragment T1 may last 5 seconds, fragment T2 10 seconds, fragment T3 11 seconds, and so on. Any such choice is acceptable as long as the 1-minute voice signal is divided into multiple audio fragments that are sent in batches.

The voice division duration may be stored in the first client, which divides the voice signal into audio fragments according to this duration as it receives the signal. The value of the voice division duration, that is, how long each fragment lasts, may be preset in the client or received by the client from the server, among other possibilities.
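The equal and unequal division schemes can both be expressed as a schedule of durations that yields the end time points a1, a2, and so on. The sketch below is illustrative only: the function `fragment_end_times` is hypothetical, and repeating the last duration once the schedule is exhausted is an assumption, since the text does not say what happens after the listed durations.

```python
from typing import List, Sequence


def fragment_end_times(total_seconds: float,
                       durations: Sequence[float]) -> List[float]:
    """Return the end time point of each fragment for a message of given length.

    `durations` is the preset schedule; its last entry is reused when the
    schedule runs out (assumption). The final fragment is cut short if the
    message ends before its full duration.
    """
    if not durations or any(d <= 0 for d in durations):
        raise ValueError("durations must be a non-empty list of positive values")
    ends: List[float] = []
    elapsed = 0.0
    index = 0
    while elapsed < total_seconds:
        step = durations[min(index, len(durations) - 1)]
        elapsed = min(elapsed + step, total_seconds)
        ends.append(elapsed)
        index += 1
    return ends
```

With a single 10-second duration, a 60-second message yields six end points, matching T1 to T6 in Fig. 4; with the schedule 5, 10, 11 seconds, the first three end points are 5, 15, and 26 seconds.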

In the embodiments of the disclosure, the preset voice division duration according to which the first client divides audio fragments is adjustable, and can be suitably extended or shortened according to the condition of the voice transmission network. See the flow shown in Fig. 5:

501. The server obtains the voice transmission network condition;

Here, the voice transmission network comprises the network between the server and the first client, or the network between the server and the second client. The server can perceive the state of these network connections, for example, that the network is poor and data transmission is slow, or that the network state is good and data transmission is fast.

502. The server sends a duration control instruction to the first client according to the voice transmission network condition;

Here, after perceiving the network condition in 501, the server can send a corresponding duration control instruction to the first client. The instruction instructs the first client to extend or shorten the voice division duration according to the network state. For example, if the network state between the server and the second client is poor and data transmission is slow, the voice division duration can be suitably extended to avoid delays and stuttering when the second client B receives the speech: where initially every 10 seconds of speech formed one audio fragment, the duration can be extended so that every 20 seconds forms one audio fragment. When the network state is good, the server can instead instruct the client to shorten the voice division duration. When the duration is small enough, the transmission approximates a real-time call; when the duration is large enough, it approximates one-shot transmission.

If the server finds that the second client B, as the receiving end, is not online, the server can treat this as equivalent to a very poor network condition and instruct the first client to extend the voice division duration T until it is large enough to approximate one-shot transmission. Alternatively, the server can instruct the first client to stop dividing audio fragments as in this embodiment and to transmit the voice in the traditional way. This can also be regarded as a special case of network state and voice division duration: the network state is that the second client B is offline, which is treated as a very poor network state, and the preset voice division duration is treated as infinite, meaning that audio fragment division is no longer used. Specifically, taking Fig. 4 as an example, if the server detects that the second client is not connected before the user's speaking duration reaches the end time a1 of T1, and notifies the first client of this, the first client can extend T1 until it is large enough to approximate one-shot transmission. Of course, the voice signal may also still be divided into multiple fragments in the manner described above.

In this step, the server can send the duration control instruction in flexible ways. For example, the server may only instruct the first client to extend the duration and leave the amount of the extension to the first client. Alternatively, the server may directly specify the amount of time by which the first client should extend the duration: the server can obtain the first client's initial voice division duration when the first client registers, determine from the network condition how long an extension suits the current network state, and then instruct the first client to extend by the determined amount, for example, to extend the voice division duration by 3 seconds.

503. The first client receives the duration control instruction and adjusts the voice division duration accordingly.

For the adjustment made by the first client, see 502. After adjusting the duration, the first client divides any subsequently received voice signal according to the new voice division duration.
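Steps 501 to 503 can be sketched as a server-side decision and a client-side adjustment. Everything concrete here is an assumption introduced for illustration: the `DurationControl` message, the `throughput_kbps` measure, the thresholds, the default step, and the minimum duration are hypothetical, since the disclosure does not specify how the network condition is measured or how the instruction is encoded.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DurationControl:
    action: str                      # "extend", "shorten" or "one_shot"
    seconds: Optional[float] = None  # None lets the client pick the amount


def choose_instruction(recipient_online: bool, throughput_kbps: float,
                       slow_kbps: float = 32.0,
                       fast_kbps: float = 256.0) -> Optional[DurationControl]:
    """Server side (step 502): map the network condition to an instruction."""
    if not recipient_online:
        # Offline recipient is treated as a very poor network: stop dividing.
        return DurationControl("one_shot")
    if throughput_kbps < slow_kbps:
        return DurationControl("extend", 10.0)  # server names the amount
    if throughput_kbps > fast_kbps:
        return DurationControl("shorten")       # client decides the amount
    return None                                 # no adjustment needed


class DurationController:
    """Client side (step 503): stores and adjusts the voice division duration."""

    def __init__(self, division_seconds: float = 10.0,
                 default_step: float = 5.0, minimum: float = 1.0) -> None:
        self.division_seconds = division_seconds
        self.default_step = default_step
        self.minimum = minimum

    def apply(self, instruction: Optional[DurationControl]) -> float:
        """Apply an instruction and return the duration used from now on."""
        if instruction is None:
            return self.division_seconds
        if instruction.action == "one_shot":
            self.division_seconds = math.inf
            return self.division_seconds
        step = (self.default_step if instruction.seconds is None
                else instruction.seconds)
        if instruction.action == "extend":
            self.division_seconds += step
        elif instruction.action == "shorten":
            self.division_seconds = max(self.minimum,
                                        self.division_seconds - step)
        else:
            raise ValueError(f"unknown action: {instruction.action}")
        return self.division_seconds
```

The optional `seconds` field covers both variants described in step 502: an instruction that only says "extend" and one that also says by how much. An infinite duration models the special case in which no further division takes place.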

In addition, before receiving the voice signal to be transmitted to the opposite end, the first client may also receive an enable command indicating that the voice division transmission mode is to be enabled. For example, the first client can offer the user an option for choosing whether to turn on the voice transmission method of this embodiment. If it is turned on, the method is used to divide the voice signal into audio fragments for sending; if it is not enabled, the traditional way of sending is still used. The enable command refers to obtaining the user's selection to enable the method.
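The effect of the enable command can be illustrated with a small sketch that plans the transmissions for an already captured recording. The function `plan_transmissions` and the byte-based fragment size are hypothetical simplifications; a real client would divide by time while recording rather than by bytes afterwards.

```python
from typing import List


def plan_transmissions(recording: bytes, segmented_enabled: bool,
                       fragment_bytes: int) -> List[bytes]:
    """Return the payloads the client would send for one voice message."""
    if not segmented_enabled:
        # Traditional mode: the whole message goes out in one transmission.
        return [recording]
    if fragment_bytes <= 0:
        raise ValueError("fragment_bytes must be positive")
    # Voice division mode: one payload per fragment, last one may be shorter.
    return [recording[i:i + fragment_bytes]
            for i in range(0, len(recording), fragment_bytes)]
```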

The first client can also act as the receiving end of a voice signal, receiving the voice signal with which the second client replies, as forwarded by the server, and performing the flow shown in Fig. 6. Of course, when the first client is the sending end, this flow is performed by the second client.

601. Receive, in real time, an audio fragment sent by the server, the audio fragment having been sent to the server by the opposite end;

602. Play the audio fragment.
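The receiving flow of steps 601 and 602, together with the seamless joining of fragments described earlier, can be sketched with a simple playback queue. The class `FragmentPlayer` is hypothetical, and the `played` byte buffer merely stands in for an audio output device.

```python
import queue


class FragmentPlayer:
    """Queues fragments as they arrive and plays them back to back."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[bytes]" = queue.Queue()
        self.played = bytearray()  # stand-in for the audio output device

    def on_fragment(self, payload: bytes) -> None:
        """Step 601: accept a fragment forwarded by the server."""
        self._pending.put(payload)

    def play_available(self) -> int:
        """Step 602: play every queued fragment in order; return the count."""
        count = 0
        while True:
            try:
                payload = self._pending.get_nowait()
            except queue.Empty:
                return count
            # Each fragment is appended directly after the previous one, so
            # the listener hears one continuous message.
            self.played.extend(payload)
            count += 1
```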

The embodiments of the disclosure provide a voice transmission device, which may be an instant messaging client. The specific manner in which each module of the device performs its operations has been described in detail in the method embodiments and is not elaborated here. As shown in Fig. 7, the device comprises a signal receiving module 71 and a transmission processing module 72, wherein:

the signal receiving module 71 is for starting to receive a voice signal to be transmitted to the opposite end;

the transmission processing module 72 is for sending, each time a preset voice division duration elapses, the currently received audio fragment to the server, the server being used to send the audio fragment to the opposite end in real time.

Fig. 8 illustrates the structure of another device. On the basis of Fig. 7, the transmission processing module 72 of this device may comprise a duration control submodule 721 and a voice division submodule 722, wherein:

the duration control submodule 721 is for storing the preset voice division duration;

the voice division submodule 722 is for obtaining an audio fragment each time the preset voice division duration stored by the duration control submodule elapses.

Further, the duration control submodule 721 is also for receiving a duration control instruction, sent by the server, for adjusting the preset voice division duration, the duration control instruction being determined by the server according to the voice transmission network condition, and for adjusting the preset voice division duration according to the duration control instruction.

The device also comprises an enable indicating module 73, for receiving, before the signal receiving module starts to receive the voice signal to be transmitted to the opposite end, an enable command indicating that the voice division transmission mode is to be enabled.

Further, the device also comprises a voice playing module 74. The signal receiving module 71 is also for receiving, in real time, an audio fragment sent by the server, the audio fragment having been sent to the server by the opposite end. The voice playing module 74 is for playing the audio fragment received by the signal receiving module.

Fig. 9 illustrates the structure of a voice transmission device that runs on the server side. The device comprises a signal receiving module 91 and a signal transmitting module 92, wherein:

the signal receiving module 91 is for receiving, in real time, an audio fragment sent by the first client, the audio fragment being obtained by the first client each time a preset voice division duration elapses while it receives a voice signal to be transmitted to the second client;

the signal transmitting module 92 is for transmitting the audio fragment to the second client in real time.

Figure 10 is a block diagram of a server 1900 according to an exemplary embodiment. For example, the device 1900 may be provided as a server. Referring to Figure 10, the device 1900 comprises a processing component 1922, which further comprises one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. The application programs stored in the memory 1932 may comprise one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the server-side method described above.

The device 1900 may also comprise a power component 1926 configured to perform power management of the device 1900, a wired or wireless network interface 1950 configured to connect the device 1900 to a network, and an input/output (I/O) interface 1958. The device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, for example a memory comprising instructions, and the instructions can be executed by the processor 820 of the device to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Figure 11 is a block diagram of a device 1100 according to an exemplary embodiment. For example, the device 1100 may be a mobile phone, a tablet device, a personal digital assistant, and the like.

Referring to Figure 11, the device 1100 may comprise one or more of the following components: a processing component 1102, a memory 1104, a power component 1106, a multimedia component 1108, an audio component 1110, an input/output (I/O) interface 1112, a sensor component 1114, and a communication component 1116.

The processing component 1102 generally controls the overall operation of the device 1100, such as operations associated with display, telephone calls, data communication, camera operations, and recording operations. The processing component 1102 may comprise one or more processors 1120 to execute instructions so as to complete all or part of the steps of the terminal-side method described above. In addition, the processing component 1102 may comprise one or more modules that facilitate interaction between the processing component 1102 and other components. For example, the processing component 1102 may comprise a multimedia module to facilitate interaction between the multimedia component 1108 and the processing component 1102.

The memory 1104 is configured to store various types of data to support the operation of the device 1100. Examples of such data include instructions for any application program or method operated on the device 1100, contact data, phone book data, messages, pictures, videos, and the like. The memory 1104 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 1106 provides power to the various components of the device 1100. The power component 1106 may comprise a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1100.

The multimedia component 1108 comprises a screen that provides an output interface between the device 1100 and the user. In some embodiments, the screen may comprise a liquid crystal display (LCD) and a touch panel (TP). If the screen comprises a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel comprises one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1108 comprises a front camera and/or a rear camera. When the device 1100 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 1110 is configured to output and/or input audio signals. For example, the audio component 1110 comprises a microphone (MIC), which is configured to receive external audio signals when the device 1100 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signal may be further stored in the memory 1104 or sent via the communication component 1116. In some embodiments, the audio component 1110 also comprises a speaker for outputting audio signals.

The I/O interface 1112 provides an interface between the processing component 1102 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 1114 comprises one or more sensors for providing status assessments of various aspects of the device 1100. For example, the sensor component 1114 can detect the open/closed state of the device 1100 and the relative positioning of components, for example the display and the keypad of the device 1100. The sensor component 1114 can also detect a change in position of the device 1100 or of a component of the device 1100, the presence or absence of user contact with the device 1100, the orientation or acceleration/deceleration of the device 1100, and a change in temperature of the device 1100. The sensor component 1114 may comprise a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 1114 may also comprise an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1114 may also comprise an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 1116 is configured to facilitate wired or wireless communication between the device 1100 and other devices. The device 1100 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1116 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1116 also comprises a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the device 1100 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the terminal-side method described above.

In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, for example the memory 1104 comprising instructions, and the instructions can be executed by the processor 1120 of the device 1100 to complete the terminal-side method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Those skilled in the art will readily conceive of other embodiments of the present invention after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present invention that follow its general principles and include common general knowledge or customary technical means in the art that are not disclosed in this disclosure. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present invention are indicated by the following claims.

It should be understood that the present invention is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

Claims

1. A voice transmission method, characterized by comprising:

Starting to receive a voice signal to be transmitted to an opposite terminal;

2. The method according to claim 1, characterized in that the preset voice division duration comprises two or more durations.

3. The method according to claim 1, characterized in that the method further comprises:

Receiving a duration control instruction, sent by the server, for adjusting the voice division duration, the duration control instruction being determined by the server according to a voice transmission network condition;

Adjusting the preset voice division duration according to the duration control instruction.

4. The method according to claim 1, characterized in that the method further comprises:

Setting a sequence identifier in a data packet of the audio fragment, the sequence identifier indicating the position of the audio fragment among the multiple audio fragments of the voice signal.

5. The method according to claim 1, characterized in that the method further comprises: receiving an enable instruction for instructing that a voice division transmission mode be enabled.

6. The method according to claim 1, characterized in that the method further comprises:

Receiving an audio fragment sent by the server in real time, the audio fragment having been sent to the server by the opposite terminal;

Playing the audio fragment.

7. A voice transmission method, characterized by comprising:

Transmitting the audio fragment to a second client in real time.

8. The method according to claim 7, characterized in that the method further comprises:

Obtaining a voice transmission network condition, the voice transmission network comprising the network between itself and the first client or the network between itself and the second client;

Sending a duration control instruction to the first client according to the voice transmission network condition, so that the first client adjusts, according to the duration control instruction, the preset voice division duration by which audio fragments are divided.

9. A voice transmission device, characterized by comprising:

10. The device according to claim 9, characterized in that the transmission processing module comprises:

A duration control submodule, configured to store a preset voice division duration;

A voice division submodule, configured to obtain an audio fragment each time the preset voice division duration stored by the duration control submodule is reached.

11. The device according to claim 10, characterized in that

The duration control submodule is further configured to receive a duration control instruction, sent by the server, for adjusting the preset voice division duration, the duration control instruction being determined by the server according to a voice transmission network condition; and to adjust the preset voice division duration according to the duration control instruction.

12. The device according to claim 9, characterized in that the device further comprises:

An enable indication module, configured to receive, before the signal receiving module starts to receive the voice signal to be transmitted to the opposite terminal, an enable instruction for instructing that a voice division transmission mode be enabled.

13. The device according to claim 9, characterized in that

The signal receiving module is further configured to receive an audio fragment sent by the server in real time, the audio fragment having been sent to the server by the opposite terminal;

The device further comprises: a voice playing module, configured to play the audio fragment received by the signal receiving module.

14. A voice transmission device, characterized by comprising:

15. A server, characterized by comprising:

A processor;

A memory for storing processor-executable instructions;

16. A terminal, characterized by comprising:

A processor;

A memory for storing processor-executable instructions;