CN107070854A - Method, device, and apparatus for transmitting speech data

Info

Publication number: CN107070854A
Application number: CN201611128953.1A
Authority: CN
Inventors: 惠庆华; 李明
Original assignee: Xian Huawei Technologies Co Ltd
Current assignee: Xian Huawei Technologies Co Ltd
Priority date: 2016-12-09
Filing date: 2016-12-09
Publication date: 2017-08-18
Also published as: WO2018103661A1

Abstract

This application discloses a method, device, and apparatus for transmitting speech data, and belongs to the field of wireless communication technology. The method includes: an access network device receives first speech data, where the first speech data is caller speech data that is sent by a calling terminal and encoded in a first speech codec mode; the access network device sends second speech data to a core network device, where the second speech data is the caller speech data encoded in a second speech codec mode; and the access network device sends third speech data to a called terminal, where the third speech data is the caller speech data encoded in a third speech codec mode. At least two of the first, second, and third speech codec modes differ. The application addresses the prior-art problem that voice session quality is degraded because the voice session service is limited by the speech codec capability of a single logical functional entity.

Description

Method, device, and apparatus for transmitting speech data

Technical field

This application relates to the field of wireless network communication technology, and in particular to a method, device, and apparatus for transmitting speech data.

Background

Voice session service is one of the basic services of a wireless network. As communication technology has developed, a variety of speech codec modes have come into wide use to improve voice session quality and meet users' demand for high-quality voice sessions. Commonly used speech codec modes currently include, but are not limited to: AMR-NB (Adaptive Multi-Rate Narrowband), AMR-WB (Adaptive Multi-Rate Wideband), EVS-NB (Enhanced Voice Services Narrowband), EVS-WB (Enhanced Voice Services Wideband), EVS-SWB (Enhanced Voice Services Super Wideband), and EVS-FB (Enhanced Voice Services Fullband). Ordered from the highest to the lowest voice quality level they provide, these are: EVS-FB > EVS-SWB > EVS-WB > EVS-NB > AMR-WB > AMR-NB.
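The quality ordering above can be captured as an ordered enumeration. The following is a minimal sketch; the names `CodecMode` and `best_mode` are hypothetical and are not part of the application.

```python
from enum import IntEnum


class CodecMode(IntEnum):
    """Speech codec modes; a larger value means a higher voice quality level."""
    AMR_NB = 1
    AMR_WB = 2
    EVS_NB = 3
    EVS_WB = 4
    EVS_SWB = 5
    EVS_FB = 6


def best_mode(modes):
    """Return the highest-quality mode from a non-empty collection."""
    return max(modes)


# List all modes from the highest to the lowest voice quality level.
ranked = sorted(CodecMode, reverse=True)
print(" > ".join(mode.name for mode in ranked))
```

Using integer values keeps comparisons between modes trivial, which is convenient wherever a "highest quality level" has to be chosen.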

At present, speech data in an existing voice session service is transmitted as follows. Caller speech data: calling terminal → access network device → core network device → access network device → called terminal. Called speech data: called terminal → access network device → core network device → access network device → calling terminal. In a concrete implementation, all logical functional entities on the end-to-end link (the core network device, the access network device, the calling terminal, and the called terminal) are usually required to support the same speech codec mode, and the speech data is then transmitted in that shared mode.

In practice, however, an operator often does not upgrade its core network devices, because of the upgrade cost or for other reasons. The speech codec capability of the core network device is then not upgraded either, and the core network device cannot support the speech codec modes of some or all voice quality levels.

When the core network device supports only a speech codec mode of a low voice quality level (for example, AMR-NB), speech data can only be transmitted in that low-quality mode, even if the calling terminal and the called terminal both support modes that offer better voice session quality. Voice session quality is reduced as a result.
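The limitation described above can be illustrated with a short sketch of end-to-end selection, in which one mode must be shared by every entity on the link. The function name `end_to_end_mode` and the capability sets are assumptions chosen only for illustration.

```python
# Quality order taken from the background section, lowest to highest.
QUALITY_ORDER = ["AMR-NB", "AMR-WB", "EVS-NB", "EVS-WB", "EVS-SWB", "EVS-FB"]


def end_to_end_mode(*supported_sets):
    """Pick the best mode that every entity on the end-to-end link supports."""
    common = set.intersection(*map(set, supported_sets))
    if not common:
        raise ValueError("no speech codec mode is shared by all entities")
    return max(common, key=QUALITY_ORDER.index)


# Hypothetical capabilities: only the core network device is not upgraded.
calling_terminal = {"AMR-NB", "AMR-WB", "EVS-SWB"}
called_terminal = {"AMR-NB", "AMR-WB", "EVS-SWB"}
access_network = set(QUALITY_ORDER)
core_network = {"AMR-NB"}

selected = end_to_end_mode(calling_terminal, access_network, core_network, called_terminal)
print(selected)  # the whole session falls back to the core network's mode
```

Although both terminals in this example support EVS-SWB, the single shared mode is dictated by the least capable entity.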

Summary of the invention

To solve the problem in the prior art, the embodiments of this application provide a method, device, and apparatus for transmitting speech data. The technical solution is as follows:

According to a first aspect, a method for transmitting speech data is provided. The method includes:

A calling terminal may encode caller speech data in a first speech codec mode to generate first speech data, and send it to an access network device. After receiving the first speech data, the access network device may decode it, re-encode it in a second speech codec mode to generate second speech data, and send the second speech data to a core network device. The access network device may also encode the caller speech data in a third speech codec mode to generate third speech data, and send the third speech data to a called terminal. At least two of the first, second, and third speech codec modes differ.
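The caller-direction flow of the first aspect can be sketched as follows. This is an illustrative sketch only: `SpeechData`, `encode`, `decode`, and `relay_caller_speech` are hypothetical names, and the "codec" here merely tags the raw samples with a mode name rather than compressing them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechData:
    mode: str       # speech codec mode used to encode the payload
    payload: bytes  # stand-in for encoded speech frames


def encode(pcm: bytes, mode: str) -> SpeechData:
    """Toy encoder: tags the raw samples with the mode instead of compressing."""
    return SpeechData(mode, pcm)


def decode(data: SpeechData) -> bytes:
    """Toy decoder: returns the raw samples carried in the payload."""
    return data.payload


def relay_caller_speech(first: SpeechData, second_mode: str, third_mode: str):
    """Access network device: decode once, then re-encode for each outgoing link."""
    pcm = decode(first)
    second = encode(pcm, second_mode)  # sent to the core network device
    third = encode(pcm, third_mode)    # sent to the called terminal
    return second, third


first = encode(b"caller samples", "EVS-SWB")
second, third = relay_caller_speech(first, "AMR-NB", "EVS-WB")
print(second.mode, third.mode)
```

Decoding once and encoding per link keeps the two outgoing links independent of each other, which is the property the paragraph above relies on.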

In the solution shown in the embodiments of this application, the access network device can send the caller speech data reported by the calling terminal directly to the called terminal. The transmission path does not need to pass through the core network device to complete the transmission of speech data from the calling terminal to the called terminal. In other words, the transmission of speech data in each stage is relatively independent. A voice session service implemented with this solution can therefore independently select, in each stage of speech data transmission, the best speech codec mode that the terminal supports, without being affected by the speech codec capability of any third-party device (in particular, that of the core network device). This keeps voice session quality consistent with terminal capability and effectively improves voice session quality.

In a possible implementation, the method further includes: the called terminal may encode called speech data in the third speech codec mode to generate fourth speech data, and send it to the access network device. After receiving the fourth speech data, the access network device may decode it, re-encode it in the second speech codec mode to generate fifth speech data, and send the fifth speech data to the core network device. The access network device may also encode the called speech data in the first speech codec mode to generate sixth speech data, and send the sixth speech data to the calling terminal.

In this solution, the process in which the called terminal sends called speech data to the calling terminal is essentially the same as the transmission of caller speech data: the access network device converts the speech codec mode and forwards the speech data.

In a possible implementation, the method further includes: the access network device may determine, from the speech codec modes that it and the calling terminal both support, the mode used on the speech data transmission link between them as the first speech codec mode. Similarly, the access network device may determine the modes used on its speech data transmission links with the core network device and with the called terminal as the second speech codec mode and the third speech codec mode, respectively.

In this solution, the speech codec mode between the access network device and each other device can be determined by negotiation, so that the mode used on each link is the one with the highest voice quality level that both sides support. This ensures higher voice session quality during speech data transmission.

In a possible implementation, the voice quality level of the second speech codec mode is lower than or equal to the voice quality levels of the first speech codec mode and the third speech codec mode.

In this solution, speech data can be transmitted between the access network device and the core network device without changing the existing communication architecture. This keeps the solution compatible with the existing architecture and ensures the normal operation of other services implemented on the core network side, such as session timing, session charging, session traffic monitoring, and other supplementary services related to the voice session. The second speech codec mode selected between the access network device and the core network device does not affect voice session quality between the calling and called terminals, so a mode of a lower voice quality level can be selected. Because most core network devices and access network devices support such lower-level modes, the second speech codec mode can be specified in advance as the mode for the speech data transmission link between the access network device and the core network device. Voice session quality depends instead on the first and third speech codec modes, so these modes have higher voice quality levels.
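The relationship between the three modes in this implementation can be expressed as a simple check. This is a sketch under the assumption that quality levels follow the ordering given in the background section; `level` and `core_link_mode_is_valid` are hypothetical helper names.

```python
# Quality order taken from the background section, lowest to highest.
QUALITY_ORDER = ["AMR-NB", "AMR-WB", "EVS-NB", "EVS-WB", "EVS-SWB", "EVS-FB"]


def level(mode: str) -> int:
    """Map a mode name to its position in the quality order."""
    return QUALITY_ORDER.index(mode)


def core_link_mode_is_valid(first: str, second: str, third: str) -> bool:
    """True if the core-link (second) mode is no better than either terminal-link mode."""
    return level(second) <= min(level(first), level(third))


# A pre-specified low-level mode on the core link satisfies the condition.
print(core_link_mode_is_valid("EVS-SWB", "AMR-NB", "EVS-WB"))
```

Because AMR-NB sits at the bottom of the ordering, pre-specifying it for the core link satisfies this condition for any choice of first and third modes.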

In a possible implementation, the access network device may determine whether the called terminal has established a speech data transmission link with it locally. If so, it may send the third speech data, encoded in the third speech codec mode, to the called terminal. If the called terminal has instead established a speech data transmission link with another access network device (the target access network device), the access network device may send the first speech data, encoded in the first speech codec mode, to the target access network device. The target access network device may then decode the first speech data, encode the speech data in a fourth speech codec mode negotiated between it and the called terminal to obtain seventh speech data, and send the seventh speech data to the called terminal. Alternatively, the access network device may first encode the caller speech data in the fourth speech codec mode, and the encoded caller speech data is then sent to the called terminal through the target access network device.

In a possible implementation, the access network device may determine whether the called terminal has established a speech data transmission link with it locally. If so, it may send the third speech data, encoded in the third speech codec mode, to the called terminal. If the called terminal has instead established a speech data transmission link with another access network device (the target access network device), the access network device may encode the caller speech data in the fourth speech codec mode negotiated between the target access network device and the called terminal to obtain seventh speech data, and send the seventh speech data to the target access network device. The target access network device may then send the seventh speech data to the called terminal.

The solution shown in the embodiments of this application can be applied in an environment that includes one access network device, or in an environment that includes multiple access network devices. In the latter case, the speech data can be sent to the target access network device, which sends it on to the called terminal. Either the access network device or the target access network device can convert the speech codec mode, so the solution works whether or not the first and fourth speech codec modes are the same. The first and fourth speech codec modes do not affect each other; that is, the choice of speech codec mode in each stage of transmission is independent, and the best available mode is always selected for transmitting speech data, which ensures the quality of the voice session service.
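The delivery decision in the single-device and multi-device cases can be sketched as below. The names `Delivery` and `plan_delivery`, and the boolean flags, are hypothetical; the application does not specify how the access network device represents this decision.

```python
from typing import NamedTuple


class Delivery(NamedTuple):
    next_hop: str    # where the access network device sends the speech data
    mode: str        # speech codec mode of the data on that hop
    convert_at: str  # device that produces the data for the called terminal


def plan_delivery(called_is_local, first_mode, third_mode, fourth_mode, convert_locally):
    """Decide how caller speech data reaches the called terminal."""
    if called_is_local:
        # The called terminal has a link with this device: send third speech data.
        return Delivery("called terminal", third_mode, "access network device")
    if convert_locally:
        # Encode in the fourth mode here; the target device only forwards it.
        return Delivery("target access network device", fourth_mode, "access network device")
    # Forward first speech data; the target device converts to the fourth mode.
    return Delivery("target access network device", first_mode, "target access network device")


print(plan_delivery(False, "EVS-SWB", "EVS-WB", "AMR-WB", convert_locally=True))
```

The two non-local branches correspond to the two alternatives described above: conversion at the target access network device, or conversion at the originating access network device.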

According to a second aspect, an access network device is provided. The access network device includes a receiver and a transmitter, and implements the method for transmitting speech data provided in the first aspect by executing instructions.

According to a third aspect, an apparatus for transmitting speech data is provided. The apparatus includes at least one module, and the at least one module is configured to implement the method for transmitting speech data provided in the first aspect.

The technical effects obtained by the second and third aspects of the embodiments of this application are similar to those obtained by the corresponding technical means of the first aspect, and are not described again here.

The beneficial effects of the technical solution provided in the embodiments of this application are as follows:

First, the access network device can send the speech data reported by the calling terminal directly to the called terminal, so the transmission of speech data from the calling terminal to the called terminal can be completed without first passing through the core network device. In addition, during speech data transmission, the sending end can encode the speech data in the speech codec mode that it has negotiated with the receiving end. Each stage of speech data transmission can therefore independently select the best speech codec mode that the sending and receiving devices support, without being affected by the speech codec capability of any third-party device (in particular, that of the core network device). This keeps voice session quality consistent with the capabilities of the devices at both ends and can effectively improve voice session quality.

Brief description of the drawings

Fig. 1 is a schematic diagram of a system framework for transmitting speech data in an embodiment of this application;

Fig. 2 is a schematic structural diagram of an access network device in an embodiment of this application;

Fig. 3 is a flowchart of the steps of a method for transmitting caller speech data in an embodiment of this application;

Fig. 4 is a flowchart of the steps of a method for transmitting called speech data in an embodiment of this application;

Fig. 5 is a schematic diagram of a system framework for transmitting speech data in an embodiment of this application;

Fig. 6 is a structural block diagram of an apparatus for transmitting speech data in an embodiment of this application.

Detailed description

The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, rather than all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the scope of protection of this application.

Referring to Fig. 1, a schematic structural diagram of a system for transmitting speech data in an embodiment of this application is shown. In one application scenario, the system may include at least the following devices: a core network device, an access network device (including a base station), a calling terminal, and a called terminal. The access network device may be connected to the core network device, and to the calling terminal and the called terminal. The access network device may be, but is not limited to, an access network device that supports a single wireless network standard or one that supports mixed wireless network standards.

As shown in Fig. 2, the access network device may include a receiver 110 and a transmitter 120, and further includes a processor 130; the receiver 110 and the transmitter 120 may each be connected to the processor 130. The receiver 110 may be used to receive signals and may include, but is not limited to, one or more of the following hardware: an antenna, one or more oscillators, a coupler, a low noise amplifier (LNA), a duplexer, an analog-to-digital converter, and a frequency converter. The transmitter 120 may be used to send signals and may include, but is not limited to, one or more of the following hardware: an antenna, one or more oscillators, a coupler, a power amplifier (PA), a duplexer, a digital-to-analog converter, and a frequency converter. In this application, the receiver 110 and the transmitter 120 may be used for processing related to transmitting speech data. The processor 130 may include one or more processing units. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device. Specifically, a program may include program code, and the program code includes computer operation instructions.

The method for transmitting speech data provided in the embodiments of this application may be applied to various communication standards, including but not limited to: Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), Time Division-Synchronous Code Division Multiple Access (TD-SCDMA), Code Division Multiple Access (CDMA), and Long Term Evolution (LTE).

In an optional embodiment of this application, the access network device may include at least a receiver, a transmitter, and a processor. The receiver and the transmitter may together be used to transmit speech data; the processor may be used to convert between speech codec modes. Note that the access network device may be a dedicated hardware device, a functional module implemented in software, or a device implemented jointly by a hardware device and a functional module.

In an optional embodiment of this application, the access network device may include at least a gateway device. The gateway device may include a transceiver and a processor. The transceiver may be used to send and receive speech data; the processor may be used to convert between speech codec modes. Note that the gateway device may be a dedicated hardware device, a functional module implemented in software, or a device implemented jointly by a hardware device and a functional module.

In an optional embodiment of this application, the method for transmitting speech data may be, but is not limited to being, implemented based on the above access network device or gateway device.

Referring to Fig. 3, a flowchart of the steps of a method for transmitting speech data in an embodiment of this application is shown. With reference to Fig. 1, when the system for transmitting speech data contains a single access network device (that is, the calling terminal and the called terminal are connected to the same access network device), the processing flow of the method may include the following steps (this embodiment is described from the perspective of the access network device):

Step 301: the access network device receives first speech data.

The first speech data is caller speech data that is sent by the calling terminal and encoded in the first speech codec mode.

In the embodiments of this application, the call device used by the calling user may be regarded as the calling terminal, and the call device used by the called user may be regarded as the called terminal. A session between the calling user and the called user can be divided into two main flows: a session request establishment flow and a speech data transmission flow.

The session request establishment flow may be implemented as follows:

When the calling user needs to hold a session with a called user, the calling user may dial the session number of the called terminal on the calling terminal (for example, a mobile phone) to initiate a voice session request. After receiving the voice session request, the called terminal may notify the called user in any appropriate way, such as ringing. After being notified by the ringing of the called terminal, the called user may trigger a confirmation option (or confirmation button) on the called terminal, so that the called terminal confirms the voice session request. Once the called terminal has confirmed the voice session request, the session link between the calling terminal and the called terminal is established, and the session can proceed over that link. With reference to Fig. 1, in the embodiments of this application the session link may refer to: calling terminal → access network device → core network device.

The speech data transmission flow may be implemented as follows:

After the called terminal has confirmed the session request, the calling terminal and the called terminal are in a connected state. The calling user can speak into the calling terminal to input caller speech data, and the calling terminal can report the speech data input by the calling user to the access network device. For example, the calling terminal may encode the caller speech data in the first speech codec mode to obtain the first speech data, and then report the first speech data to the access network device.

This step may be implemented by the receiver 110.

Step 302: the access network device sends second speech data to the core network device, and sends third speech data to the called terminal.

The second speech data is the caller speech data encoded in the second speech codec mode, and the third speech data is the caller speech data encoded in the third speech codec mode. At least two of the first, second, and third speech codec modes differ.

In the embodiments of this application, to keep the speech data transmission solution compatible with the existing communication architecture, the access network device may encode the caller speech data in the second speech codec mode to obtain the second speech data, and then report the second speech data to the core network device. This ensures the normal operation of session timing, session charging, session traffic monitoring, and other supplementary services related to the voice session that are implemented on the core network side. Further, because no change is made at the core network device, the core network device may, following its existing processing, return speech data based on the second speech data to the access network device after it has processed the other supplementary services.

As described above, the flow for transmitting speech data may also include the following step: after the calling terminal reports the first speech data to the access network device, the access network device may encode the caller speech data in the third speech codec mode to obtain the third speech data, and then send the third speech data to the called terminal. The called user can then hear the caller speech data through the called terminal, which completes one full transmission of caller speech data.

Compared with the speech data transfer process used in the prior art (calling terminal → access network device → core network device → access network device → called terminal), in the embodiments of this application the access network device can forward the caller speech data directly to the called terminal after receiving it from the calling terminal. It does not need to send the speech data to the core network device and forward it only after receiving the speech data returned by the core network device.

After receiving speech data encoded in one speech codec mode, the access network device may first decode the speech data, re-encode it in another speech codec mode, and send it to another device. Alternatively, it may skip decoding and transcode the speech data directly from its current speech codec mode to the other mode, then send the transcoded speech data to the other device.
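The two conversion options described above can be sketched as a single conversion function that prefers a direct transcoder when one is available. This is an assumption-laden toy: frames are `(mode, samples)` tuples, the `DIRECT_TRANSCODERS` table is hypothetical, and no real codec is invoked.

```python
def decode(frame):
    """Toy decoder: the payload already holds the raw samples."""
    _mode, samples = frame
    return samples


def encode(samples, mode):
    """Toy encoder: tag the raw samples with the target mode."""
    return (mode, samples)


# Hypothetical table of direct transcoders keyed by (source mode, target mode).
DIRECT_TRANSCODERS = {
    ("AMR-WB", "AMR-NB"): lambda frame: ("AMR-NB", frame[1]),
}


def convert(frame, target_mode):
    """Return (converted frame, name of the path that was used)."""
    source_mode = frame[0]
    direct = DIRECT_TRANSCODERS.get((source_mode, target_mode))
    if direct is not None:
        # Option 2: transcode directly without an intermediate decode.
        return direct(frame), "direct transcode"
    # Option 1: decode first, then re-encode in the target mode.
    return encode(decode(frame), target_mode), "decode then encode"


print(convert(("AMR-WB", b"samples"), "AMR-NB"))
print(convert(("EVS-SWB", b"samples"), "AMR-NB"))
```

Either path yields data in the target mode; which one a real device would use depends on whether a direct transcoder exists for the mode pair, which the application leaves open.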

Note that at least two of the first, second, and third speech codec modes differ, and that the access network device can convert the speech codec mode after it receives the speech data. Because devices are upgraded at different speeds, the first and second speech codec modes generally differ. When the access network device sends speech data to the called terminal, there are at least two cases: 1. the third speech codec mode is the same as the first speech codec mode; 2. the third speech codec mode differs from the first speech codec mode. Here, the third speech codec mode is the mode used between the access network device and the called terminal, and the first speech codec mode is the mode used between the access network device and the calling terminal.

First, when the third speech codec mode is the same as the first speech codec mode, the access network device may forward the speech data directly to the called terminal; that is, it sends the first speech data (which is also the third speech data) to the called terminal.

Second, when the third speech codec mode differs from the first speech codec mode, sending the speech data to the called terminal may specifically include: the access network device re-encodes the speech data in the third speech codec mode to obtain the third speech data, and then sends the third speech data to the called terminal.
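The two cases can be sketched as a small decision function. The name `to_called_terminal` and the `reencode` callback are hypothetical; the re-encoder used in the usage lines is a toy that only relabels the frame.

```python
def to_called_terminal(first_frame, third_mode, reencode):
    """Return the frame that the access network device sends to the called terminal.

    first_frame is a (mode, samples) tuple received from the calling terminal.
    """
    first_mode = first_frame[0]
    if third_mode == first_mode:
        # Case 1: modes match, so the first speech data is forwarded unchanged.
        return first_frame
    # Case 2: modes differ, so the data is re-encoded in the third mode.
    return reencode(first_frame, third_mode)


def toy_reencode(frame, mode):
    """Toy re-encoder: keep the samples and relabel them with the new mode."""
    return (mode, frame[1])


frame = ("EVS-SWB", b"samples")
print(to_called_terminal(frame, "EVS-SWB", toy_reencode) is frame)
print(to_called_terminal(frame, "EVS-WB", toy_reencode))
```

Skipping the conversion in case 1 avoids an unnecessary decode and encode when both terminal links already use the same mode.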

Further, in the embodiments of this application, the access network device supports conversion between speech codec modes: it can receive speech data encoded in any speech codec mode, and it can convert that speech data into speech data encoded in any other speech codec mode. Because the access network device can convert the speech codec mode, the voice session service can proceed normally even if the calling terminal and the called terminal support different speech codec modes, and the transmission of speech data between the two terminals is no longer restricted by the speech codec mode. In particular, because the path from the calling terminal to the called terminal bypasses the core network device, the speech codec capability of the core network device does not affect or limit the voice session service.

This step may be implemented by the transmitter 120.

Optionally, called speech is transmitted on essentially the same principle as caller speech. The corresponding processing flow may be as shown in Fig. 4:

Step 401, access network equipment receives the 4th speech data.

Wherein, the 4th speech data is the called voice encoded based on the 3rd voice encoding and decoding mode that terminal called is sent Data.

Wherein, established between calling terminal and terminal called after session connection, called subscriber can pass through terminal called Speak, complete the input of called speech data.The called speech data that terminal called can input called subscriber, which is reported, to be connect Log equipment.Specifically, can encode obtain the 4th voice to called speech data according to the 3rd voice encoding and decoding mode Data, then report access network equipment by the 4th speech data.

The step can specifically be realized by receiver 110.

Step 402: the access network equipment sends fifth speech data to the core network equipment and sends sixth speech data to the calling terminal.

The fifth speech data is the called speech data encoded based on the second voice codec mode, and the sixth speech data is the called speech data encoded based on the first voice codec mode.

After the called terminal encodes the called speech data according to the third voice codec mode to obtain the fourth speech data and reports it to the access network equipment, the access network equipment can encode the called speech data according to the second voice codec mode to obtain the fifth speech data and report the fifth speech data to the core network equipment. In addition, the access network equipment can directly encode the called speech data based on the first voice codec mode to obtain the sixth speech data and send it to the calling terminal, so that the calling user can hear the called speech data through the calling terminal.

The step can specifically be realized by transmitter 120.
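Steps 401 and 402 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the application: the `SpeechData` class and the functions `transcode` and `handle_called_speech` are hypothetical names, and transcoding is modeled only by relabeling the codec mode rather than by real decoding and re-encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechData:
    codec: str      # label of the voice codec mode, e.g. "EVS-SWB"
    content: bytes  # stand-in for the underlying speech content


def transcode(data: SpeechData, target_codec: str) -> SpeechData:
    """Hypothetical stand-in for decode-then-re-encode: only the label changes."""
    return SpeechData(codec=target_codec, content=data.content)


def handle_called_speech(fourth: SpeechData, first_mode: str, second_mode: str):
    """Step 402: derive the fifth and sixth speech data from the fourth."""
    fifth = transcode(fourth, second_mode)  # reported to the core network equipment
    sixth = transcode(fourth, first_mode)   # sent directly to the calling terminal
    return fifth, sixth


# Example: the called terminal uploads speech in the third mode (here EVS-SWB).
fourth = SpeechData(codec="EVS-SWB", content=b"called speech")
fifth, sixth = handle_called_speech(fourth, first_mode="AMR-WB", second_mode="AMR-NB")
```

The two outputs are produced independently of each other, which mirrors the point that the copy sent to the core network equipment does not constrain the copy delivered to the calling terminal.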

Optionally, the voice codec modes used among the calling terminal, the called terminal, the core network equipment and the access network equipment are negotiated pairwise by these devices. The corresponding processing may be as follows: according to the voice codec modes supported by both the calling terminal and the access network equipment, the access network equipment determines the first voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the calling terminal; according to the voice codec modes supported by both the core network equipment and the access network equipment, the access network equipment determines the second voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the core network equipment; according to the voice codec modes supported by both the called terminal and the access network equipment, the access network equipment determines the third voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the called terminal.

In the embodiment of the present application, when requesting establishment of a voice session service, the calling terminal may carry the voice codec modes it supports in the voice session service establishment request and actively report them to the access network equipment. The access network equipment may then determine, from the voice codec modes it supports and those the calling terminal supports, the intersection of voice codec modes supported by both; this intersection may contain several results. In an optional embodiment of the application, the result with the highest voice quality grade may be selected from these results as the first voice codec mode. Of course, another result that meets actual requirements may also be selected as the first voice codec mode; this embodiment imposes no restriction. In the same way as for the first voice codec mode, the access network equipment may negotiate with the core network equipment and with the called terminal respectively to determine the second voice codec mode and the third voice codec mode; the details are not repeated here.

The step can specifically be realized by processor 130.

Specifically, AMR-NB, AMR-WB, EVS-NB and so on may each be a class of voice codec mode, and each class may be a set of submodes divided according to data transmission rate. For example (units below are kbps, i.e. kbit/s): AMR-NB includes 8 submodes: 12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15 and 4.75; AMR-WB includes 9 submodes: 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85 and 6.60; EVS-NB includes 6 submodes: 24.4, 13.2, 9.6, 8, 7.2 and 5.9; EVS-WB includes 12 submodes: 128.0, 96.0, 64.0, 48.0, 32.0, 24.4, 16.4, 13.2, 9.6, 8.0, 7.2 and 5.9; EVS-SWB includes 9 submodes: 128.0, 96.0, 64.0, 48.0, 32.0, 24.4, 16.4, 13.2 and 9.6; EVS-FB includes 7 submodes: 128.0, 96.0, 64.0, 48.0, 32.0, 24.4 and 16.4.

In the embodiment of the present application, the access network equipment and the calling/called terminals may support all submodes of a given class of voice codec mode while the core network equipment supports only some of them. For example, the core network equipment supports only the 12.65 to 6.6 submodes of AMR-WB, whereas the access network equipment and the calling/called terminals support the 23.85 to 6.6 submodes of AMR-WB. The embodiment of the present application still applies in this case: the second voice codec mode may be AMR-WB 12.65 kbps, while the first and third voice codec modes use AMR-WB 23.85 kbps.
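The AMR-WB example above can be worked through in a short sketch. The table holds the bit rates listed in the description (15.85 kbps being the standard AMR-WB rate); the function `best_common_rate` and the rule of picking the highest shared rate on each link are illustrative assumptions rather than something the application prescribes.

```python
# Submode bit rates in kbps for each class of voice codec mode.
SUBMODES = {
    "AMR-NB": [12.2, 10.2, 7.95, 7.40, 6.70, 5.90, 5.15, 4.75],
    "AMR-WB": [23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, 6.60],
    "EVS-NB": [24.4, 13.2, 9.6, 8.0, 7.2, 5.9],
    "EVS-WB": [128.0, 96.0, 64.0, 48.0, 32.0, 24.4, 16.4, 13.2, 9.6, 8.0, 7.2, 5.9],
    "EVS-SWB": [128.0, 96.0, 64.0, 48.0, 32.0, 24.4, 16.4, 13.2, 9.6],
    "EVS-FB": [128.0, 96.0, 64.0, 48.0, 32.0, 24.4, 16.4],
}


def best_common_rate(mode, rates_a, rates_b):
    """Highest bit rate of `mode` that both ends support, or None (assumed policy)."""
    common = set(rates_a) & set(rates_b) & set(SUBMODES[mode])
    return max(common) if common else None


amr_wb = SUBMODES["AMR-WB"]
# The core network equipment supports only the 12.65 to 6.60 submodes.
core_rates = [rate for rate in amr_wb if rate <= 12.65]

second_rate = best_common_rate("AMR-WB", amr_wb, core_rates)  # access <-> core link
first_rate = best_common_rate("AMR-WB", amr_wb, amr_wb)       # access <-> terminal link
```

Under these assumptions the link to the core network equipment settles on 12.65 kbps while the terminal links keep 23.85 kbps, matching the example in the text.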

Based on the above introduction to voice codec modes, when selecting from multiple intersection results the one with the best voice session quality, the selection may be, but is not limited to being, based on the ordering EVS-FB > EVS-SWB > EVS-WB > EVS-NB > AMR-WB > AMR-NB. For example, when the intersection results are EVS-FB and AMR-WB, EVS-FB may be selected as the negotiated voice codec mode according to the rule EVS-FB > AMR-WB, so that the negotiated mode is the one with the highest voice quality grade that the devices can support when transmitting speech data.
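The negotiation described above (intersect the supported modes, then pick the highest-ranked result) can be sketched as follows. The name `negotiate` and the use of plain sets of mode labels are assumptions made for illustration; the ranking list is the ordering given in the text, written from lowest to highest grade.

```python
# Voice quality ordering from the description, lowest grade first.
QUALITY_ORDER = ["AMR-NB", "AMR-WB", "EVS-NB", "EVS-WB", "EVS-SWB", "EVS-FB"]


def negotiate(local_modes, peer_modes):
    """Return the highest-grade mode supported by both sides, or None.

    Every label is assumed to appear in QUALITY_ORDER; an unknown label in the
    intersection would raise ValueError from list.index.
    """
    common = set(local_modes) & set(peer_modes)
    if not common:
        return None
    return max(common, key=QUALITY_ORDER.index)


access_modes = {"AMR-NB", "AMR-WB", "EVS-WB", "EVS-FB"}
terminal_modes = {"AMR-WB", "EVS-FB"}
chosen = negotiate(access_modes, terminal_modes)  # intersection: EVS-FB and AMR-WB
```

Selecting the maximum is only one possible policy; as the description notes, another intersection result may be chosen when actual requirements call for it.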

Optionally, considering that the voice codec mode between the core network equipment and the access network equipment does not affect the voice quality of the voice session as a whole, a voice codec mode of lower voice quality may be selected as the second voice codec mode. Accordingly, the voice quality grade of the second voice codec mode is less than or equal to the voice quality grades of the first voice codec mode and the third voice codec mode.

Further, the voice codec mode between the access network equipment and the core network equipment may be directly specified as the second voice codec mode. The second voice codec mode may be chosen on the following principle: it is a voice codec mode of low voice quality grade that any current core network equipment supports. For example, the second voice codec mode may be AMR-NB, currently the most basic voice codec mode.
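One way to picture the choice of the second voice codec mode is sketched below. Picking the lowest-grade mode shared with the core network equipment is an assumed policy used only for illustration, and `pick_second_mode` and `satisfies_grade_constraint` are hypothetical helper names.

```python
# Voice quality ordering from the description, lowest grade first.
QUALITY_ORDER = ["AMR-NB", "AMR-WB", "EVS-NB", "EVS-WB", "EVS-SWB", "EVS-FB"]


def rank(mode):
    """Position of a mode in the quality ordering (0 is the lowest grade)."""
    return QUALITY_ORDER.index(mode)


def pick_second_mode(access_modes, core_modes):
    """Assumed policy: lowest-grade mode shared by access and core equipment."""
    common = set(access_modes) & set(core_modes)
    return min(common, key=rank) if common else None


def satisfies_grade_constraint(first_mode, second_mode, third_mode):
    """True if the second mode's grade does not exceed the first or third."""
    return rank(second_mode) <= min(rank(first_mode), rank(third_mode))


second_mode = pick_second_mode(
    access_modes={"AMR-NB", "AMR-WB", "EVS-FB"},
    core_modes={"AMR-NB", "AMR-WB"},
)
ok = satisfies_grade_constraint("EVS-FB", second_mode, "EVS-SWB")
```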

The application scenario above mainly concerns the case shown in Fig. 1, in which the calling terminal and the called terminal are connected to the same access network equipment. In another application scenario of the embodiment of the present application, the calling terminal and the called terminal may be connected to different access network equipments.

For example, when the calling terminal and the called terminal are in different serving cells, they may be connected to the access network equipments of their respective serving cells.

As another example, when the calling terminal and the called terminal use different radio communication standards, they may be connected to the access network equipments corresponding to those standards. Commonly used radio communication standards include but are not limited to: 2G (2-Generation wireless telephone technology, second-generation mobile communication technology), 3G (3rd-Generation mobile communication, third-generation mobile communication technology), 4G (4th-Generation mobile communication, fourth-generation mobile communication technology) and 5G (5th-Generation mobile communication, fifth-generation mobile communication technology).

It should be noted that, where the calling terminal and the called terminal use different communication standards, two access network equipments may be connected to the calling terminal and the called terminal respectively, or a single access network equipment supporting multiple standards may connect both terminals. In the latter case, the specific flow of the speech data processing method may refer to the description of the above embodiment and is not repeated here.

The scenario in which the calling terminal and the called terminal are connected to different access network equipments is described in detail below.

Referring to Fig. 5, a schematic structural diagram of another speech data processing system in the embodiment of the present application is shown. In this application scenario, the speech data processing system may include at least the following devices: core network equipment, access network equipment, target access network equipment, a calling terminal, a called terminal and base stations. The core network equipment is connected to the access network equipment and to the target access network equipment; the access network equipment is connected to the calling terminal through a base station and is connected to the target access network equipment; the target access network equipment is connected to the called terminal through a base station.

In an optional embodiment of the application, when the calling terminal and the called terminal are connected to different access network equipments, the above step of sending speech data to the called terminal may specifically be as follows: if the called terminal has established a speech data transmission link with the access network equipment, the access network equipment sends the third speech data to the called terminal; if the called terminal has established a speech data transmission link with the target access network equipment, the access network equipment sends the first speech data to the target access network equipment, the first speech data being used to instruct the target access network equipment to send seventh speech data to the called terminal.

The seventh speech data is the caller speech data encoded based on a fourth voice codec mode. The fourth voice codec mode is the voice codec mode used by the speech data transmission link between the target access network equipment and the called terminal, and is determined by the target access network equipment according to the voice codec modes supported by both the called terminal and the target access network equipment.

In the embodiment of the present application, after receiving the first speech data uploaded by the calling terminal, the access network equipment may judge whether a speech data transmission link has been established between itself and the called terminal. If the link exists, the access network equipment may decode the first speech data, encode the caller speech data according to the third voice codec mode to obtain the third speech data, and send it to the called terminal. If the link does not exist, that is, the called terminal has established a speech data transmission link with another access network equipment (the target access network equipment), the access network equipment may send the received first speech data directly to the target access network equipment. After receiving the first speech data, the target access network equipment may decode it, encode the result according to the fourth voice codec mode to obtain the seventh speech data, and send the seventh speech data to the called terminal.
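The branch described above can be sketched as follows. This is a simplified model under stated assumptions: `SpeechData`, `transcode`, `route_caller_speech` and `target_forward` are hypothetical names, the link check is reduced to a boolean flag, and transcoding only relabels the codec mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeechData:
    codec: str      # label of the voice codec mode
    content: bytes  # stand-in for the underlying speech content


def transcode(data: SpeechData, target_codec: str) -> SpeechData:
    """Hypothetical stand-in for decode-then-re-encode: only the label changes."""
    return SpeechData(codec=target_codec, content=data.content)


def route_caller_speech(first: SpeechData, called_link_is_local: bool, third_mode: str):
    """Access network equipment: choose the next hop for caller speech data."""
    if called_link_is_local:
        # The called terminal is attached here: produce the third speech data.
        return "called_terminal", transcode(first, third_mode)
    # Otherwise pass the first speech data on unchanged.
    return "target_access_network_equipment", first


def target_forward(received: SpeechData, fourth_mode: str) -> SpeechData:
    """Target access network equipment: produce the seventh speech data."""
    return transcode(received, fourth_mode)


first = SpeechData(codec="EVS-WB", content=b"caller speech")
local_hop, third = route_caller_speech(first, True, third_mode="AMR-WB")
remote_hop, forwarded = route_caller_speech(first, False, third_mode="AMR-WB")
seventh = target_forward(forwarded, fourth_mode="EVS-SWB")
```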

It can also be understood that, before sending the caller speech data to the target access network equipment, the access network equipment may first encode the caller speech data according to the fourth voice codec mode to obtain the seventh speech data; the target access network equipment can then send the seventh speech data directly to the called terminal after receiving it. The flow by which the target access network equipment sends speech data to the called terminal may refer to the flow by which the access network equipment sends speech data to the called terminal in the application scenario corresponding to Fig. 1, and is not repeated in this embodiment.

It should be noted that, in practical applications, when the access network equipment sends speech data to the target access network equipment, at least the following two cases may arise: 1. the first voice codec mode is consistent with the fourth voice codec mode; 2. the first voice codec mode is inconsistent with the fourth voice codec mode. Here the first voice codec mode is the voice codec mode between the access network equipment and the calling terminal, and the fourth voice codec mode is the voice codec mode between the target access network equipment and the called terminal. When the two modes are consistent, the access network equipment can send the speech data to the called terminal through the target access network equipment without any further conversion of voice codec mode; when they are inconsistent, the conversion of voice codec mode can be completed by either the access network equipment or the target access network equipment.
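The two cases can be illustrated by listing the codec mode carried on each hop of the path from the calling terminal to the called terminal. The function `hop_codecs` and its `convert_at` parameter are hypothetical, introduced only to show where a conversion would occur.

```python
def hop_codecs(first_mode, fourth_mode, convert_at="target"):
    """Codec mode on each hop: calling -> access, access -> target, target -> called.

    convert_at selects which equipment performs the conversion when the first
    and fourth modes differ: "access" or "target" (an illustrative parameter).
    """
    if convert_at not in ("access", "target"):
        raise ValueError("convert_at must be 'access' or 'target'")
    # If the access network equipment converts, the middle hop already carries
    # the fourth mode; otherwise it still carries the first mode.
    middle = fourth_mode if convert_at == "access" else first_mode
    return [first_mode, middle, fourth_mode]


same = hop_codecs("EVS-WB", "EVS-WB")                      # case 1: no conversion
at_access = hop_codecs("EVS-WB", "AMR-WB", "access")       # case 2, converted early
at_target = hop_codecs("EVS-WB", "AMR-WB", "target")       # case 2, converted late
```

When the two modes match, all three hops carry the same mode and neither equipment needs to convert.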

Similarly, called speech data may be transferred as follows: the called terminal sends to the target access network equipment called speech data encoded based on the fourth voice codec mode; the target access network equipment sends to the access network equipment called speech data encoded based on the first voice codec mode or the fourth voice codec mode, and sends to the core network equipment called speech data encoded based on a fifth voice codec mode; the access network equipment sends to the calling terminal called speech data encoded based on the first voice codec mode.

The fifth voice codec mode is the voice codec mode used by the speech data transmission link between the target access network equipment and the core network equipment, and is determined by the target access network equipment according to the voice codec modes supported by both itself and the core network equipment.

In addition, since nothing is changed at the core network equipment in the above process, the core network equipment, following its existing processing, may encode the caller speech data according to the fifth voice codec mode after performing other supplementary service processing and forward it to the target access network equipment; similarly, it may encode the called speech data according to the second voice codec mode and forward it to the access network equipment.

Above-mentioned steps specifically can jointly be realized by processor 130 and transmitter 120.

With the speech data processing method of the embodiment of the present application, the access network equipment can send the caller speech data reported by the calling terminal directly to the called terminal, bypassing the core network equipment, and thereby complete the transmission of the caller speech data from the calling terminal to the called terminal. In other words, in the method of transmitting audio data of the embodiment of the present application, the transmission of speech data in each stage of the voice session service is relatively independent: the voice codec mode used between the access network equipment and the calling terminal depends on the mode negotiated between those two; the voice codec mode used between the access network equipment and the called terminal depends on the mode negotiated between those two. It can be seen that a voice session service implemented with this method can, in each stage of speech data transmission, independently select the best voice codec mode the terminal can support, unaffected by the voice codec capability of any third-party device (in particular, of the core network equipment). This keeps voice session quality consistent with terminal capability and effectively improves the call quality of the voice session service.

Further, the method for the transmitting audio data of the embodiment of the present application, access network equipment can press speech data According to setting the second voice encoding and decoding mode coding after report to equipment of the core network.Wherein, it is general, can be by the second voice coder Decoding process is considered as the voice encoding and decoding mode for the low speech quality grade that existing arbitrary equipment is all supported.It can be seen that, at this Apply in embodiment, access network equipment can recode according to the second voice encoding and decoding mode to speech data, and then, connect It can be interacted between log equipment and equipment of the core network by the voice encoding and decoding mode of low speech quality grade, the application The method of the transmitting audio data of embodiment can be realized on the basis of existing communication framework is not changed access network equipment with The transmission of speech data between equipment of the core network, it is ensured that the method for the transmitting audio data of the embodiment of the present application is led to existing Believe the compatibility of framework, it is ensured that the normal execution for other the every business realized based on equipment of the core network side, such as：Based on core Other supplementary services related to voice conversation such as session timing, session charging and session traffic monitoring that net equipment side is realized.

In addition, the voice encoding and decoding mode between each equipment can be based on the equipment energy carried in existing communication process Force information (voice coding/decoding capability) is determined, it is not necessary to is increased extra flow, is realized simple and convenient.

Above-mentioned all optional technical schemes, can form the alternative embodiment of the application, herein no longer using any combination Repeat one by one.

Fig. 6 is a block diagram of the device for transmitting audio data provided by the embodiment of the present disclosure. The device may be implemented as part or all of an apparatus by software, hardware or a combination of both. The device provided by the embodiment of the present disclosure can implement the flow described in Fig. 3 of the embodiment of the present disclosure, and includes a first receiving module 601, a first sending module 602, a second receiving module 603, a second sending module 604 and a determining module 605:

The first receiving module 601 is configured to receive first speech data, the first speech data being caller speech data that is sent by the calling terminal and encoded based on the first voice codec mode;

The first sending module 602 is configured to send second speech data to the core network equipment, the second speech data being the caller speech data encoded based on the second voice codec mode, and to send third speech data to the called terminal, the third speech data being the caller speech data encoded based on the third voice codec mode, wherein at least two of the first voice codec mode, the second voice codec mode and the third voice codec mode differ.

Optionally, the device further includes:

The second receiving module 603, configured to receive fourth speech data, the fourth speech data being called speech data that is sent by the called terminal and encoded based on the third voice codec mode;

The second sending module 604, configured to send fifth speech data to the core network equipment, the fifth speech data being the called speech data encoded based on the second voice codec mode, and to send sixth speech data to the calling terminal, the sixth speech data being the called speech data encoded based on the first voice codec mode.

Optionally, the device further includes a determining module 605, configured to:

determine, according to the voice codec modes supported by both the calling terminal and the access network equipment, the first voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the calling terminal;

determine, according to the voice codec modes supported by both the core network equipment and the access network equipment, the second voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the core network equipment;

determine, according to the voice codec modes supported by both the called terminal and the access network equipment, the third voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the called terminal.

Optionally, the voice quality grade of the second voice codec mode is less than or equal to the voice quality grades of the first voice codec mode and the third voice codec mode.

Optionally, the access network equipment further includes the determining module 605, wherein:

The determining module 605 is configured to judge whether the equipment that has established a speech data transmission link with the called terminal is the access network equipment or the target access network equipment. When the determining module 605 judges that the called terminal has established a speech data transmission link with the access network equipment, the first sending module 602 is configured to send the third speech data to the called terminal;

When the determining module 605 judges that the called terminal has established a speech data transmission link with the target access network equipment, the first sending module 602 is configured to send the first speech data to the target access network equipment, the first speech data being used to instruct the target access network equipment to send seventh speech data to the called terminal, the seventh speech data being the caller speech data encoded based on a fourth voice codec mode, wherein the fourth voice codec mode is the voice codec mode used by the speech data transmission link between the target access network equipment and the called terminal, and is determined by the target access network equipment according to the voice codec modes supported by both the called terminal and the target access network equipment.

Alternatively, the determining module is configured to judge whether the equipment that has established a speech data transmission link with the called terminal is the access network equipment or the target access network equipment. When the determining module judges that the called terminal has established a speech data transmission link with the access network equipment, the first sending module is configured to send the third speech data to the called terminal;

When the determining module judges that the called terminal has established a speech data transmission link with the target access network equipment, the first sending module is configured to send seventh speech data to the target access network equipment, the seventh speech data being used to instruct the target access network equipment to send the seventh speech data to the called terminal, the seventh speech data being the caller speech data encoded based on a fourth voice codec mode, wherein the fourth voice codec mode is the voice codec mode used by the speech data transmission link between the target access network equipment and the called terminal, and is determined by the target access network equipment according to the voice codec modes supported by both the called terminal and the target access network equipment.

It should be noted that the determining module 605 may be implemented by the processor 130; the first receiving module 601 and the second receiving module 603 may be implemented by the receiver 110, or by the receiver 110 in combination with the processor 130; the first sending module 602 and the second sending module 604 may be implemented by the transmitter 120, or by the transmitter 120 in combination with the processor 130.

In the embodiment of the present application, the access network equipment can bypass the core network equipment and send the speech data reported by the calling terminal directly to the called terminal, completing the transmission of the speech data. The transmission of speech data is therefore not limited by the voice codec capability of the core network equipment, which keeps the speech data transmission process independent. The best voice codec mode that the terminal supports can be selected for transmitting speech data, so that the selected voice codec mode is consistent with the voice codec capability of the terminal and voice session quality is ensured.

It should be noted that, when the device for transmitting audio data provided by the above embodiment transmits speech data, the division into the functional modules described above is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for transmitting audio data provided by the above embodiment and the method embodiment of transmitting audio data belong to the same concept; for its specific implementation, refer to the method embodiment, which is not repeated here.

A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc or the like.

The foregoing is only one embodiment of the application and is not intended to limit the application. Any modification, equivalent substitution or improvement made within the spirit and principles of the application shall fall within the protection scope of the application.

Claims

1. A method of transmitting audio data, characterized in that the method comprises:

access network equipment receives first speech data, the first speech data being caller speech data that is sent by a calling terminal and encoded based on a first voice codec mode;

the access network equipment sends second speech data to core network equipment, the second speech data being the caller speech data encoded based on a second voice codec mode;

the access network equipment sends third speech data to a called terminal, the third speech data being the caller speech data encoded based on a third voice codec mode, wherein at least two of the first voice codec mode, the second voice codec mode and the third voice codec mode differ.

2. The method according to claim 1, characterized in that the method further comprises:

the access network equipment receives fourth speech data, the fourth speech data being called speech data that is sent by the called terminal and encoded based on the third voice codec mode;

the access network equipment sends fifth speech data to the core network equipment, the fifth speech data being the called speech data encoded based on the second voice codec mode;

the access network equipment sends sixth speech data to the calling terminal, the sixth speech data being the called speech data encoded based on the first voice codec mode.

3. The method according to claim 1 or 2, characterized in that the method further comprises:

the access network equipment determines, according to the voice codec modes supported by both the calling terminal and the access network equipment, the first voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the calling terminal;

the access network equipment determines, according to the voice codec modes supported by both the core network equipment and the access network equipment, the second voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the core network equipment;

the access network equipment determines, according to the voice codec modes supported by both the called terminal and the access network equipment, the third voice codec mode as the voice codec mode used by the speech data transmission link between the access network equipment and the called terminal.

4. The method according to any one of claims 1 to 3, characterized in that the voice quality grade of the second voice codec mode is less than or equal to the voice quality grades of the first voice codec mode and the third voice codec mode.

5. The method according to any one of claims 1 to 4, characterized in that the access network equipment sending the third speech data to the called terminal comprises:

If the called terminal has established a voice data transmission link with the access network equipment, the access network equipment sends the third speech data to the called terminal;

The method further comprises:

If the called terminal has established a voice data transmission link with a target access network equipment, the access network equipment sends the first speech data to the target access network equipment, the first speech data being used to instruct the target access network equipment to send seventh speech data to the called terminal, the seventh speech data being the caller speech data encoded based on a fourth voice encoding and decoding mode, wherein the fourth voice encoding and decoding mode is the voice encoding and decoding mode used by the voice data transmission link between the target access network equipment and the called terminal, as determined by the target access network equipment according to the voice encoding and decoding modes supported by both the called terminal and the target access network equipment.
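The two branches of claim 5 can be sketched as a small forwarding decision in Python. This is an illustration under stated assumptions, not the patented implementation: the `Frame` type, the `"local"` and `"target"` labels, and the `transcode` placeholder (which only relabels a frame rather than decoding and re-encoding audio) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    payload: bytes
    mode: str  # voice encoding and decoding mode of the payload


def transcode(frame, target_mode):
    """Placeholder transcoder: a real one would decode and re-encode the audio."""
    if frame.mode == target_mode:
        return frame
    return Frame(payload=frame.payload, mode=target_mode)


def forward_caller_speech(first_frame, called_link_owner, third_mode):
    """Decide where the caller speech data goes and in which mode.

    called_link_owner is "local" when the called terminal is linked to this
    access network equipment, or "target" when it is linked to a target
    access network equipment.
    """
    if called_link_owner == "local":
        # Local case: re-encode in the third mode and send to the called terminal.
        return ("called_terminal", transcode(first_frame, third_mode))
    if called_link_owner == "target":
        # Target case: pass the first speech data on unchanged; the target
        # equipment re-encodes it in its own (fourth) mode.
        return ("target_access_network", first_frame)
    raise ValueError(f"unknown link owner: {called_link_owner!r}")
```

In the target case the frame keeps the first mode, which matches the claim's statement that the first speech data itself is sent to the target access network equipment.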

6. The method according to any one of claims 1 to 4, characterized in that the access network equipment sending the third speech data to the called terminal comprises:

The method further comprises:

If the called terminal has established a voice data transmission link with a target access network equipment, the access network equipment sends seventh speech data to the target access network equipment, the seventh speech data being used to instruct the target access network equipment to send the seventh speech data to the called terminal, the seventh speech data being the caller speech data encoded based on a fourth voice encoding and decoding mode, wherein the fourth voice encoding and decoding mode is the voice encoding and decoding mode used by the voice data transmission link between the target access network equipment and the called terminal, as determined by the target access network equipment according to the voice encoding and decoding modes supported by both the called terminal and the target access network equipment.

7. An access network equipment, characterized in that the access network equipment comprises a receiver and a transmitter, wherein:

The receiver is configured to receive first speech data, the first speech data being caller speech data that is sent by a calling terminal and encoded based on a first voice encoding and decoding mode;

The transmitter is configured to send second speech data to core network equipment, the second speech data being the caller speech data encoded based on a second voice encoding and decoding mode, and to send third speech data to a called terminal, the third speech data being the caller speech data encoded based on a third voice encoding and decoding mode, wherein at least two of the first voice encoding and decoding mode, the second voice encoding and decoding mode and the third voice encoding and decoding mode are different.

8. The access network equipment according to claim 7, characterized in that the receiver is further configured to receive fourth speech data, the fourth speech data being called speech data that is sent by the called terminal and encoded based on the third voice encoding and decoding mode;

The transmitter is further configured to send fifth speech data to the core network equipment, the fifth speech data being the called speech data encoded based on the second voice encoding and decoding mode, and to send sixth speech data to the calling terminal, the sixth speech data being the called speech data encoded based on the first voice encoding and decoding mode.

9. The access network equipment according to claim 7 or 8, characterized in that the access network equipment further comprises a processor, configured to:

Determine, according to the voice encoding and decoding modes supported by both the calling terminal and the access network equipment, the first voice encoding and decoding mode as the voice encoding and decoding mode used by the voice data transmission link between the access network equipment and the calling terminal;

Determine, according to the voice encoding and decoding modes supported by both the core network equipment and the access network equipment, the second voice encoding and decoding mode as the voice encoding and decoding mode used by the voice data transmission link between the access network equipment and the core network equipment;

Determine, according to the voice encoding and decoding modes supported by both the called terminal and the access network equipment, the third voice encoding and decoding mode as the voice encoding and decoding mode used by the voice data transmission link between the access network equipment and the called terminal.

10. The access network equipment according to any one of claims 7 to 9, characterized in that the voice quality grade of the second voice encoding and decoding mode is less than or equal to the voice quality grades of the first voice encoding and decoding mode and the third voice encoding and decoding mode.

11. The access network equipment according to any one of claims 7 to 10, characterized in that the access network equipment further comprises a processor, wherein:

The processor is configured to determine whether the equipment that has established a voice data transmission link with the called terminal is the access network equipment or a target access network equipment; when the processor determines that the called terminal has established a voice data transmission link with the access network equipment, the transmitter is configured to send the third speech data to the called terminal;

When the processor determines that the called terminal has established a voice data transmission link with the target access network equipment, the transmitter is configured to send the first speech data to the target access network equipment, the first speech data being used to instruct the target access network equipment to send seventh speech data to the called terminal, the seventh speech data being the caller speech data encoded based on a fourth voice encoding and decoding mode, wherein the fourth voice encoding and decoding mode is the voice encoding and decoding mode used by the voice data transmission link between the target access network equipment and the called terminal, as determined by the target access network equipment according to the voice encoding and decoding modes supported by both the called terminal and the target access network equipment.

12. The method according to any one of claims 7 to 10, characterized in that the access network equipment further comprises a processor, wherein:

When the processor determines that the called terminal has established a voice data transmission link with a target access network equipment, the transmitter is configured to send seventh speech data to the target access network equipment, the seventh speech data being used to instruct the target access network equipment to send the seventh speech data to the called terminal, the seventh speech data being the caller speech data encoded based on a fourth voice encoding and decoding mode, wherein the fourth voice encoding and decoding mode is the voice encoding and decoding mode used by the voice data transmission link between the target access network equipment and the called terminal, as determined by the target access network equipment according to the voice encoding and decoding modes supported by both the called terminal and the target access network equipment.

13. A device for transmitting audio data, characterized in that the device comprises:

A first receiving module, configured to receive first speech data, the first speech data being caller speech data that is sent by a calling terminal and encoded based on a first voice encoding and decoding mode;

A first sending module, configured to send second speech data to core network equipment, the second speech data being the caller speech data encoded based on a second voice encoding and decoding mode, and to send third speech data to a called terminal, the third speech data being the caller speech data encoded based on a third voice encoding and decoding mode, wherein at least two of the first voice encoding and decoding mode, the second voice encoding and decoding mode and the third voice encoding and decoding mode are different.

14. The device according to claim 13, characterized in that the device further comprises:

A second receiving module, configured to receive fourth speech data, the fourth speech data being called speech data that is sent by the called terminal and encoded based on the third voice encoding and decoding mode;

A second sending module, configured to send fifth speech data to the core network equipment, the fifth speech data being the called speech data encoded based on the second voice encoding and decoding mode, and to send sixth speech data to the calling terminal, the sixth speech data being the called speech data encoded based on the first voice encoding and decoding mode.

15. The device according to claim 13 or 14, characterized in that the device further comprises a determining module, configured to:

16. The device according to any one of claims 13 to 15, characterized in that the voice quality grade of the second voice encoding and decoding mode is less than or equal to the voice quality grades of the first voice encoding and decoding mode and the third voice encoding and decoding mode.

17. The device according to any one of claims 13 to 16, characterized in that the access network equipment further comprises a determining module, wherein:

The determining module is configured to determine whether the equipment that has established a voice data transmission link with the called terminal is the access network equipment or a target access network equipment; when the determining module determines that the called terminal has established a voice data transmission link with the access network equipment, the first sending module is configured to send the third speech data to the called terminal;

When the determining module determines that the called terminal has established a voice data transmission link with the target access network equipment, the first sending module is configured to send the first speech data to the target access network equipment, the first speech data being used to instruct the target access network equipment to send seventh speech data to the called terminal, the seventh speech data being the caller speech data encoded based on a fourth voice encoding and decoding mode, wherein the fourth voice encoding and decoding mode is the voice encoding and decoding mode used by the voice data transmission link between the target access network equipment and the called terminal, as determined by the target access network equipment according to the voice encoding and decoding modes supported by both the called terminal and the target access network equipment.

18. The method according to any one of claims 13 to 16, characterized in that the access network equipment further comprises a determining module, wherein:

When the determining module determines that the called terminal has established a voice data transmission link with a target access network equipment, the first sending module is configured to send seventh speech data to the target access network equipment, the seventh speech data being used to instruct the target access network equipment to send the seventh speech data to the called terminal, the seventh speech data being the caller speech data encoded based on a fourth voice encoding and decoding mode, wherein the fourth voice encoding and decoding mode is the voice encoding and decoding mode used by the voice data transmission link between the target access network equipment and the called terminal, as determined by the target access network equipment according to the voice encoding and decoding modes supported by both the called terminal and the target access network equipment.