CN106804031B - Voice conversion method and device

Info

Publication number: CN106804031B
Application number: CN201510844605.3A
Authority: CN
Inventors: 尚宇翔; 王亚晨; 张剑寅; 姜怡; 宋月; 李继; 于青
Original assignee: China Mobile Communications Group Co Ltd
Current assignee: China Mobile Communications Group Co Ltd
Priority date: 2015-11-26
Filing date: 2015-11-26
Publication date: 2020-08-07
Anticipated expiration: 2035-11-26
Also published as: CN106804031A

Abstract

The invention discloses a voice conversion method and device. The method comprises: receiving a call request from a calling terminal, wherein the call request carries at least identification information of the calling terminal and identification information of a called terminal; judging, according to the identification information of the calling terminal, whether the calling terminal has subscribed to a voice translation service, to obtain a first judgment result; when the first judgment result indicates that the calling terminal has subscribed to the voice translation service, acquiring location information of the calling terminal and location information of the called terminal according to the identification information of the calling terminal and the identification information of the called terminal; determining the content of prompt information according to the location information of the calling terminal and the location information of the called terminal, wherein the prompt information is used to ask the calling user whether to use the voice translation service during the call; and outputting the prompt information.

Description

Voice conversion method and device

Technical Field

The present invention relates to voice conversion technologies, and in particular, to a voice conversion method and apparatus.

Background

Existing voice systems do not process the media stream. In many scenarios, however, especially international interworking and roaming, a language difference between the caller (or visitor) and the called party prevents the user from conveying the intended content accurately and effectively. Existing voice translation systems exist only as a general concept, and their implementation details within the voice system are not clearly defined. In some scenarios the user speaks the same language as the called party and needs no translation, yet the prior art gives the user no choice. Existing voice systems also lack a voice translation capability, that is, the capability to convert the voice of a visitor or originator into the language of the called party in real time, and neither the deployment mode of a voice conversion gateway nor its interaction with the network is defined.

Disclosure of Invention

In view of this, embodiments of the present invention provide a voice conversion method and device that solve at least one problem in the prior art: voice translation can be performed or skipped according to the intention of the originator, and the voice of a calling or visiting user can be translated.

The technical scheme of the embodiment of the invention is realized as follows:

In a first aspect, an embodiment of the present invention provides a voice conversion method, where the method includes:

receiving a call request from a calling terminal, wherein the call request carries at least identification information of the calling terminal and identification information of a called terminal;

judging, according to the identification information of the calling terminal, whether the calling terminal has subscribed to a voice translation service, to obtain a first judgment result;

when the first judgment result indicates that the calling terminal has subscribed to the voice translation service, acquiring location information of the calling terminal and location information of the called terminal according to the identification information of the calling terminal and the identification information of the called terminal;

determining the content of prompt information according to the location information of the calling terminal and the location information of the called terminal, wherein the prompt information is used to ask the calling user whether to use the voice translation service during the call;

and outputting the prompt information to the calling terminal.

In a second aspect, an embodiment of the present invention provides a voice conversion apparatus, including a first receiving unit, a first judging unit, a first obtaining unit, a determining unit, and an output unit, where:

the first receiving unit is configured to receive a call request from a calling terminal, wherein the call request carries at least identification information of the calling terminal and identification information of a called terminal;

the first judging unit is configured to judge, according to the identification information of the calling terminal, whether the calling terminal has subscribed to a voice translation service, to obtain a first judgment result;

the first obtaining unit is configured to, when the first judgment result indicates that the calling terminal has subscribed to the voice translation service, obtain location information of the calling terminal and location information of the called terminal according to the identification information of the calling terminal and the identification information of the called terminal;

the determining unit is configured to determine the content of prompt information according to the location information of the calling terminal and the location information of the called terminal, wherein the prompt information is used to ask the calling user whether to use the voice translation service during the call;

and the output unit is configured to output the prompt information to the calling terminal.

The embodiments of the invention provide a voice conversion method and device. A call request from a calling terminal is received, the call request carrying at least identification information of the calling terminal and identification information of a called terminal; whether the calling terminal has subscribed to a voice translation service is judged according to the identification information of the calling terminal, to obtain a first judgment result; when the first judgment result indicates that the calling terminal has subscribed to the voice translation service, location information of the calling terminal and location information of the called terminal is acquired according to the identification information of the two terminals; the content of prompt information is determined according to the location information of the calling terminal and the location information of the called terminal, the prompt information being used to ask the calling user whether to use the voice translation service during the call; and the prompt information is output. In this way, voice translation can be performed or skipped according to the intention of the originator, and the voice of a calling or visiting user can be translated.

Drawings

FIG. 1-1 is a schematic diagram of a network architecture according to an embodiment of the present invention;

FIG. 1-2 is a schematic flowchart of a voice conversion method according to the first embodiment of the present invention;

FIG. 2 is a schematic flowchart of a voice conversion method according to the second embodiment of the present invention;

FIG. 3 is a schematic diagram of a network architecture in an IMS scenario according to the fourth embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a voice conversion device according to the fifth embodiment of the present invention;

FIG. 5 is a schematic structural diagram of a voice conversion device according to the sixth embodiment of the present invention.

Detailed Description

To solve the problems in the prior art, the embodiments of the invention deploy a voice conversion gateway. The gateway performs or skips voice translation according to the intention of the originator, so that the voice of a calling or visiting user is translated and the user can converse smoothly, online and in real time, with a user at the visited location.

To describe the embodiments of the present invention more clearly, a schematic network architecture is provided. Fig. 1-1 is a schematic diagram of the network architecture of the embodiments of the present invention. As shown in fig. 1-1, the network architecture includes an originating terminal 11, a media gateway 12, a service platform 13, a voice conversion gateway 14, and a called terminal 15, wherein the originating terminal 11 is the terminal that originates the voice call. In the prior art, the service platform 13 is connected directly to the called terminal 15. In the network architecture provided by the embodiment of the invention, the service platform 13 and the called terminal 15 are connected through the voice conversion gateway 14.

The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.

Example one

In order to solve the foregoing technical problem, an embodiment of the present invention provides a voice conversion method, which is applied to a voice conversion apparatus, and the functions implemented by the method can be implemented by a processor in the apparatus calling a program code, which of course can be stored in a computer storage medium, and the apparatus at least includes the processor and the storage medium.

Fig. 1-2 is a schematic flowchart of a voice conversion method according to the first embodiment of the present invention. As shown in fig. 1-2, the voice conversion method includes:

Step S101, receiving a call request from a calling terminal, wherein the call request carries at least identification information of the calling terminal and identification information of a called terminal;

Step S102, judging, according to the identification information of the calling terminal, whether the calling terminal has subscribed to a voice translation service, to obtain a first judgment result;

Step S103, when the first judgment result indicates that the calling terminal has subscribed to the voice translation service, acquiring location information of the calling terminal and location information of the called terminal according to the identification information of the calling terminal and the identification information of the called terminal;

Step S104, determining the content of prompt information according to the location information of the calling terminal and the location information of the called terminal, wherein the prompt information is used to ask the calling user whether to use the voice translation service during the call;

Step S105, outputting the prompt information to the calling terminal.
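Steps S101 to S105 can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the `CallRequest` and `Location` types, the in-memory `SUBSCRIBERS` and `LOCATIONS` tables, and the prompt wording are all hypothetical stand-ins for whatever subscription database and location lookup a real network would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    home: str     # home location, here a region code (assumed encoding)
    current: str  # current, possibly roaming, location


@dataclass
class CallRequest:
    calling_id: str  # identification information of the calling terminal
    called_id: str   # identification information of the called terminal


# Hypothetical stand-ins for the subscription and location databases.
SUBSCRIBERS = {"caller-1"}
LOCATIONS = {
    "caller-1": Location(home="CN", current="CN"),
    "callee-1": Location(home="GB", current="GB"),
}


def handle_call_request(req: CallRequest) -> Optional[str]:
    """Return the prompt for the caller, or None if no prompt is needed."""
    # S102: has the calling terminal subscribed to the translation service?
    if req.calling_id not in SUBSCRIBERS:
        return None
    # S103: look up the location information of both terminals.
    calling_loc = LOCATIONS[req.calling_id]
    called_loc = LOCATIONS[req.called_id]
    # S104: derive the prompt content from the two locations.
    prompt = (
        f"Translate from the language of {calling_loc.home} "
        f"to the language of {called_loc.home}? Press 1 for yes, 0 for no."
    )
    # S105: the caller of this function outputs the prompt to the calling terminal.
    return prompt
```

A non-subscriber simply receives no prompt, so the call proceeds as an ordinary untranslated call.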

In the embodiment of the invention, the location information of the calling terminal comprises the home location information of the calling terminal and the current location information of the calling terminal, and the location information of the called terminal comprises the home location information of the called terminal and the current location information of the called terminal.

In this embodiment of the present invention, in step S104, determining the content of the prompt information according to the location information of the calling terminal and the location information of the called terminal includes: step S141, judging whether the home location information of the called terminal is consistent with its current location information, to obtain a third judgment result; and step S142, determining the content of the prompt information according to the third judgment result and the home location of the calling terminal.

Here, in step S142, determining the content of the prompt information according to the third judgment result and the home location of the calling terminal includes the following. When the third judgment result indicates that the home location information of the called terminal is consistent with its current location information, the content of the prompt information includes at least: translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the home location information of the called terminal. When the third judgment result indicates that the home location information of the called terminal is inconsistent with its current location information, the content of the prompt information includes at least: not using the voice translation service; translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the home location information of the called terminal; and translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the current location information of the called terminal.
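The decision in steps S141 and S142 can be illustrated with a short Python sketch. The region-to-language table `LANGUAGE_BY_REGION` and the function name `prompt_options` are assumptions made for illustration. Offering a "no translation" choice when the called terminal is not roaming is also an assumption, made because the prompt asks whether to use the service at all.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping from a region code to its default language.
LANGUAGE_BY_REGION = {"CN": "zh", "US": "en", "GB": "en", "FR": "fr"}

# An option is either None (do not translate) or a (source, target) pair.
Option = Optional[Tuple[str, str]]


def prompt_options(calling_home: str, called_home: str,
                   called_current: str) -> List[Option]:
    """Build the list of choices offered to the caller (S141/S142)."""
    source = LANGUAGE_BY_REGION[calling_home]
    # Always offered: no translation, or translate into the called home language.
    options: List[Option] = [None, (source, LANGUAGE_BY_REGION[called_home])]
    # S141: the called terminal is roaming if home and current location differ.
    if called_current != called_home:
        # S142: additionally offer the language of the current location.
        options.append((source, LANGUAGE_BY_REGION[called_current]))
    return options
```

For a called terminal at home in the UK, the caller from China is offered two choices; if that terminal is roaming in France, a third choice for French is added.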

Example two

Based on the first embodiment, the embodiment of the present invention provides a voice conversion method, which is applied in a voice conversion device, and the functions implemented by the method can be implemented by a processor in the device calling a program code, although the program code can be stored in a computer storage medium, and the device at least includes a processor and a storage medium.

Fig. 2 is a schematic flowchart of a voice conversion method according to the second embodiment of the present invention. As shown in fig. 2, the voice conversion method includes:

Step S105, outputting the prompt information to the calling terminal;

Step S106, receiving response information from the calling terminal, wherein the response information is a response to the prompt information;

Step S107, determining, according to the response information, whether the calling terminal uses the voice translation service to translate the voice of the calling party into a target language in the current voice call, to obtain a second judgment result;

Step S108, when the second judgment result indicates that the calling terminal uses the voice translation service, translating the voice of the calling party in the voice call into voice in the target language;

Step S109, outputting the voice in the target language to the called terminal;

Step S110, when the second judgment result indicates that the calling terminal does not use the voice translation service, outputting the voice of the calling party in the voice call to the called terminal.
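The branch in steps S107 to S110 amounts to either passing the caller's audio through a translator or forwarding it unchanged. The following Python sketch shows that branch under stated assumptions: the response is taken to be a single key press where "1" means "use translation", and `fake_translate` is a placeholder for a real speech translation engine, which the text does not specify.

```python
from typing import Callable


def uses_translation(response: str) -> bool:
    """S107: interpret the caller's response (assumed: "1" means yes)."""
    return response.strip() == "1"


def audio_for_called_terminal(response: str, audio: bytes,
                              translate: Callable[[bytes], bytes]) -> bytes:
    """Return the audio that should be output to the called terminal."""
    if uses_translation(response):
        # S108/S109: translate, then output the target-language voice.
        return translate(audio)
    # S110: output the caller's original voice unchanged.
    return audio


def fake_translate(audio: bytes) -> bytes:
    """Placeholder translator used only to make the sketch runnable."""
    return b"[translated] " + audio
```

Injecting the translator as a parameter keeps the routing decision separate from the translation engine, which matches the later description of the apparatus as a set of logical units that may sit on one or several devices.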

In the embodiment of the present invention, the call request further includes first identification information, where the first identification information is used to indicate that a voice translation service is used in the current call. In a specific implementation process, the first identification information may be a prefix code.

Example three

Based on the network architecture shown in fig. 1-1, an embodiment of the present invention provides a voice conversion method, including:

Step S301, the originating terminal initiates a voice call;

here, the originating terminal is the terminal that originates the voice call, that is, the calling terminal.

Step S302, when the service platform finds that the calling party or the called party has subscribed to the voice translation service, the service platform plays a corresponding prompt tone according to information such as the home location of the calling party and the visited location of the calling party.

Here, the prompt tone asks the user whether to use the corresponding voice translation function. The prompt may be delivered by Interactive Voice Response (IVR) or in other ways.

Here, the visited location of the calling party can be understood as the roaming location of the calling party.

Here, for example, if a user from the United States roams to China, the prompt asks whether a Chinese-English conversion service is needed, and the user can select the service by pressing a key or the like.

Here, the following flow is described taking as an example the case in which the user, in response to the prompt tone, selects the voice translation function (that is, the voice translation service) through the originating terminal.

Step S303, when the user chooses to use the voice translation function, the service platform transmits the voice media, the calling location information, and the calling home domain information to the voice conversion gateway;

Step S304, the voice conversion gateway performs the corresponding voice translation and sends the translated media stream to the called terminal;

Step S305, the called terminal receives the media stream translated by the voice conversion gateway. Similarly, the voice of the called terminal can be converted by the voice conversion gateway into the language required by the user of the originating terminal.
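Steps S303 to S305 describe translation in both directions through the voice conversion gateway. A minimal Python sketch of such a two-way session is given below. The `TranslationSession` class, the `Translator` signature, and `fake_translate` are hypothetical; how the gateway derives the two languages from the calling location and home domain information is not detailed in the text and is left outside the sketch.

```python
from dataclasses import dataclass
from typing import Callable

# (audio, source_language, target_language) -> translated audio
Translator = Callable[[bytes, str, str], bytes]


@dataclass
class TranslationSession:
    caller_lang: str
    callee_lang: str
    translate: Translator

    def to_callee(self, audio: bytes) -> bytes:
        """S304: translate the caller's voice for the called terminal."""
        return self.translate(audio, self.caller_lang, self.callee_lang)

    def to_caller(self, audio: bytes) -> bytes:
        """S305: translate the called party's voice back for the caller."""
        return self.translate(audio, self.callee_lang, self.caller_lang)


def fake_translate(audio: bytes, src: str, dst: str) -> bytes:
    """Placeholder engine that only tags the audio with the language pair."""
    return f"[{src}->{dst}] ".encode() + audio
```

For the roaming example in the text, a session created with `caller_lang="en"` and `callee_lang="zh"` would translate English to Chinese toward the called terminal and Chinese to English on the return path.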

Example four

In the network architecture shown in fig. 1-1, the specific implementation process and mechanism may differ with the network type. The embodiment of the present invention takes an IP Multimedia Subsystem (IMS) network as an example. When the network architecture shown in fig. 1-1 is applied to an IMS network, the specific network architecture is as shown in fig. 3. The IMS network architecture includes a calling terminal (calling UE), a Session Border Controller (SBC), a Call Session Control Function (CSCF) entity, a voice translation Application Server (AS), a voice recognition platform, an Interconnection Border Control Function (IBCF), and a called terminal. Based on the network architecture shown in fig. 3, another embodiment of the present invention further provides a voice conversion method, which includes:

Step S401, the calling terminal initiates an initial call and dials a prefix code in front of the called number;

here, the prefix code is a marker added before the called number during dialing, and may be a digit, "*", "#", or the like. For example, if the called number is "XXX" and the prefix code is "7", a calling user who wants to use the voice translation service dials "7XXX" on the calling terminal.

Step S402, the S-CSCF recognizes that the called number carries a prefix code and routes the call to the voice translation AS on the calling side;

here, there are three types of CSCF: the Serving CSCF (S-CSCF), the Interrogating CSCF (I-CSCF), and the Proxy CSCF (P-CSCF).

Step S403, the voice translation AS answers the call from the calling side, generates a prompt tone for the default translation rule according to the acquired calling country code and called country code, and plays the prompt tone to the calling terminal following the IVR interaction flow;

here, for example, assume that the calling terminal is located in China and the called terminal is located in the UK. The calling country code is then that of China and the called country code is that of the UK. Based on the two country codes, the voice translation AS recognizes that the call is from China to the UK, generates a prompt tone such as "To translate into English, press 0; for other options, press 1", and plays it to the calling terminal following the IVR interaction flow.

Step S404, the calling terminal, which supports the Dual Tone Multi Frequency (DTMF) function, selects the corresponding translation rule and transmits the selection to the voice translation AS;

Step S405, the voice translation AS performs number conversion and routes the call to the I-CSCF, which executes the subsequent called-side procedure;

Step S406, the media of the calling party and the called party are connected, and the voice translation AS performs the corresponding voice stream translation function.
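The prefix detection, default prompt, and key-press handling of steps S401 to S404 can be sketched in Python as follows. The prefix "7" and the prompt wording follow the examples in the text; the `COUNTRY_BY_CODE` table, the function names, and the meaning assigned to each DTMF digit are assumptions for illustration only. A real deployment would also need a prefix that cannot be confused with the leading digits of an ordinary number.

```python
from typing import Dict, Optional, Tuple

TRANSLATION_PREFIX = "7"  # example prefix code taken from the text

# Hypothetical table: country calling code -> (country, language).
COUNTRY_BY_CODE: Dict[str, Tuple[str, str]] = {
    "86": ("China", "Chinese"),
    "44": ("United Kingdom", "English"),
}


def strip_prefix(dialed: str,
                 prefix: str = TRANSLATION_PREFIX) -> Tuple[bool, str]:
    """S401/S402: detect the prefix code and recover the called number."""
    if len(dialed) > len(prefix) and dialed.startswith(prefix):
        return True, dialed[len(prefix):]
    return False, dialed


def default_prompt(called_cc: str) -> str:
    """S403: build the default-rule prompt from the called country code."""
    _, language = COUNTRY_BY_CODE[called_cc]
    return f"To translate into {language}, press 0; for other options, press 1."


def rule_from_dtmf(digit: str, calling_cc: str,
                   called_cc: str) -> Optional[Tuple[str, str]]:
    """S404: map the key press to a (source, target) language pair.

    Returns None when the caller asks for other options (digit "1").
    """
    if digit == "0":
        return COUNTRY_BY_CODE[calling_cc][1], COUNTRY_BY_CODE[called_cc][1]
    if digit == "1":
        return None
    raise ValueError(f"unexpected DTMF digit: {digit!r}")
```

With the China-to-UK example from the text, pressing 0 selects the default rule of translating Chinese into English, and the number with the prefix removed is what the voice translation AS would pass on for the number conversion in step S405.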

Example five

Based on the foregoing embodiments, embodiments of the present invention provide a voice conversion apparatus, where each unit included in the apparatus and each module included in each unit can be implemented by a processor in the voice conversion apparatus, and certainly can also be implemented by a specific logic circuit; in the course of a particular embodiment, the processor may be a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.

Fig. 4 is a schematic structural diagram of a voice conversion apparatus according to the fifth embodiment of the present invention. As shown in fig. 4, the apparatus 400 includes a first receiving unit 401, a first judging unit 402, a first obtaining unit 403, a determining unit 404, and an output unit 405, where:

the first receiving unit 401 is configured to receive a call request from a calling terminal, where the call request carries at least identification information of the calling terminal and identification information of a called terminal;

the first judging unit 402 is configured to judge, according to the identification information of the calling terminal, whether the calling terminal has subscribed to a voice translation service, to obtain a first judgment result;

the first obtaining unit 403 is configured to, when the first judgment result indicates that the calling terminal has subscribed to the voice translation service, obtain location information of the calling terminal and location information of the called terminal according to the identification information of the calling terminal and the identification information of the called terminal;

the determining unit 404 is configured to determine the content of prompt information according to the location information of the calling terminal and the location information of the called terminal, where the prompt information is used to ask the calling user whether to use the voice translation service during the current call;

the output unit 405 is configured to output the prompt information to the calling terminal.

Here, the location information of the calling terminal includes home location information of the calling terminal and current location information of the calling terminal, and the location information of the called terminal includes home location information of the called terminal and current location information of the called terminal.

In an embodiment of the present invention, the determining unit includes a judging module and a determining module, wherein:

the judging module is configured to judge whether the home location information of the called terminal is consistent with its current location information, to obtain a third judgment result;

and the determining module is configured to determine the content of the prompt information according to the third judgment result and the home location of the calling terminal.

Here, the determining module is configured to determine, when the third judgment result indicates that the home location information of the called terminal is consistent with its current location information, that the content of the prompt information includes at least: translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the home location information of the called terminal.

Here, the determining module is configured to determine, when the third judgment result indicates that the home location information of the called terminal is inconsistent with its current location information, that the content of the prompt information includes at least: not using the voice translation service; translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the home location information of the called terminal; and translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the current location information of the called terminal.

Example six

Fig. 5 is a schematic structural diagram of a voice conversion apparatus according to the sixth embodiment of the present invention. As shown in fig. 5, the apparatus 400 includes a first receiving unit 401, a first judging unit 402, a first obtaining unit 403, a determining unit 404, an output unit 405, a second receiving unit 406, a second judging unit 407, a translating unit 408, and a second output unit 409, where:

The second receiving unit 406 is configured to receive response information from the calling terminal, where the response information is a response to the prompt information;

the second judging unit 407 is configured to determine, according to the response information, whether the calling terminal uses the voice translation service to translate the voice of the calling party into a target language in the current voice call, to obtain a second judgment result;

the translating unit 408 is configured to translate, when the second determination result indicates that the calling terminal uses a voice translation service, the voice of the calling terminal into a voice in a target language in the current voice call;

the second output unit 409 is configured to output the voice of the target language to the called terminal.

In this embodiment of the present invention, the second output unit is further configured to output, when the second determination result indicates that the calling terminal does not use the voice translation service, the voice of the calling party in the voice call to the called terminal.

In the embodiment of the present invention, the call request further includes first identification information, where the first identification information is used to indicate that a voice translation service is used in the current call. In a specific implementation process, the first identification information is a prefix code.

Here, it should be noted that the above description of the apparatus embodiments is similar to the description of the method embodiments and has similar beneficial effects, so it is not repeated. For technical details not disclosed in the apparatus embodiments of the present invention, refer to the description of the method embodiments.

The voice conversion apparatus provided in the foregoing description of the present invention is a logical classification unit, and may be implemented on one device or multiple devices in a specific implementation process, for example, when implemented on one device, the voice conversion apparatus may be a service platform in fig. 1-1 or a voice translation AS in fig. 3; when implemented on two devices, the voice conversion apparatus may include the service platform and the voice conversion gateway in fig. 1-1. Those skilled in the art can determine the implementation form of each module in the voice conversion apparatus on a specific device according to the actual voice conversion scenario and the related prior art, and will not be described herein again.

It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention. The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as a removable Memory device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.

Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as a separate product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. The storage medium includes a removable storage device, a ROM, a magnetic disk, an optical disk, or other media that can store program code.
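As a purely illustrative, non-limiting sketch of such a software functional module, the prompt-determination and call-handling logic of the method embodiments might be written as follows. The names `Terminal`, `PromptOption`, `determine_prompt_options`, `route_voice`, and the `LOCATION_LANGUAGE` table are hypothetical and do not appear in the embodiments; the translation engine is passed in as a function because the embodiments do not prescribe one. The set of options offered for a roaming called terminal reflects one reading of the embodiments and is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical mapping from a location code to its language (illustrative only).
LOCATION_LANGUAGE = {"CN": "zh", "US": "en", "FR": "fr", "JP": "ja"}


@dataclass(frozen=True)
class Terminal:
    terminal_id: str
    home_location: str      # home location information
    current_location: str   # current location information


@dataclass(frozen=True)
class PromptOption:
    # Both fields are None for the "do not use voice translation" option.
    source_language: Optional[str]
    target_language: Optional[str]


def determine_prompt_options(caller: Terminal, callee: Terminal) -> List[PromptOption]:
    """Build the prompt content from the location information of both terminals."""
    source = LOCATION_LANGUAGE[caller.home_location]
    options = [PromptOption(source, LOCATION_LANGUAGE[callee.home_location])]
    # Third judgment: does the callee's home location match its current location?
    if callee.home_location != callee.current_location:
        # Assumed options for a roaming callee: also offer the language of the
        # current location, plus an explicit "no translation" choice.
        options.append(PromptOption(source, LOCATION_LANGUAGE[callee.current_location]))
        options.append(PromptOption(None, None))
    return options


def route_voice(voice: str, choice: PromptOption,
                translate: Callable[[str, str, str], str]) -> str:
    """Second judgment: translate only if the caller chose a translation option."""
    if choice.target_language is None:
        return voice  # output the caller's voice to the called terminal unchanged
    return translate(voice, choice.source_language, choice.target_language)
```

In this sketch the calling user's response to the prompt is represented by the selected `PromptOption`; a real deployment would obtain it from the call signalling or from an interactive voice response, which the sketch does not model.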

The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the appended claims.

Claims

1. A method of voice conversion, the method comprising:

receiving a call request of a calling terminal, wherein the call request at least carries identification information of the calling terminal and identification information of a called terminal; the call request also comprises first identification information, wherein the first identification information is used for indicating that a voice translation service is used in the current call;

and outputting the prompt information to the calling terminal.

2. The method of claim 1, wherein the location information of the calling terminal comprises home location information of the calling terminal and current location information of the calling terminal, and the location information of the called terminal comprises home location information of the called terminal and current location information of the called terminal.

3. The method of claim 1, further comprising:

receiving response information from the calling terminal, wherein the response information is a response to the prompt information;

determining, according to the response information, whether the calling terminal uses the voice translation service to translate the voice of the calling terminal into a target language in the voice call, to obtain a second judgment result;

when the second judgment result shows that the calling terminal uses the voice translation service, the voice of the calling terminal is translated into the voice of the target language in the voice call;

and outputting the voice of the target language to a called terminal.

4. The method of claim 3, further comprising: when the second judgment result shows that the calling terminal does not use the voice translation service, outputting the voice of the calling party in the voice call to the called terminal.

5. The method according to any one of claims 1 to 4, wherein the determining the content of the prompt information according to the location information of the calling terminal and the location information of the called terminal comprises:

judging whether the home location information of the called terminal is consistent with the current location information of the called terminal, to obtain a third judgment result;

and determining the content of the prompt information according to the third judgment result and the home location of the calling terminal.

6. The method according to claim 5, wherein the determining the content of the prompt information according to the third judgment result and the home location of the calling terminal comprises:

when the third judgment result shows that the home location information of the called terminal is consistent with the current location information, determining that the content of the prompt information at least comprises: translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the home location information of the called terminal.

7. The method according to claim 5, wherein the determining the content of the prompt information according to the third judgment result and the home location of the calling terminal comprises:

when the third judgment result shows that the home location information of the called terminal is inconsistent with the current location information, determining that the content of the prompt information at least comprises: not using the voice translation service; translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the home location information of the called terminal; and translating the voice of the calling terminal from the language corresponding to the home location information of the calling terminal into the language corresponding to the current location information of the called terminal.

8. The method of claim 1, wherein the first identification information is a pre-amble.

9. A voice conversion apparatus characterized by comprising a first receiving unit, a first judging unit, a first acquiring unit, a determining unit, and an outputting unit, wherein:

the first receiving unit is used for receiving a call request of a calling terminal, wherein the call request at least carries identification information of the calling terminal and identification information of a called terminal; the call request also comprises first identification information, wherein the first identification information is used for indicating that a voice translation service is used in the current call;

10. The apparatus of claim 9, further comprising a second receiving unit, a second judging unit, a translation unit, and a second output unit, wherein:

the second receiving unit is configured to receive response information from the calling terminal, where the response information is a response to the prompt information;

the second judging unit is configured to determine, according to the response information, whether the calling terminal uses the voice translation service to translate the voice of the calling terminal into a target language in the voice call, so as to obtain a second judgment result;

the translation unit is configured to translate, when the second judgment result indicates that the calling terminal uses the voice translation service, the voice of the calling terminal into a voice of the target language in the voice call;

and the second output unit is used for outputting the voice of the target language to the called terminal.

11. The apparatus according to claim 9 or 10, wherein the determining unit comprises a judging module and a determining module, wherein: