CN104754536A

CN104754536A - Method and system for realizing communication between different languages

Info

Publication number: CN104754536A
Application number: CN201310743004.4A
Authority: CN
Inventors: 郭勐; 杨蕾; 张俭
Original assignee: China Mobile Communications Group Co Ltd
Current assignee: China Mobile Communications Group Co Ltd
Priority date: 2013-12-27
Filing date: 2013-12-27
Publication date: 2015-07-01

Abstract

The invention discloses a method and a system for realizing communication between different languages, which is used for solving a problem of poor applicability caused by being restricted by capabilities of a called terminal in a scheme of the prior art. The method comprises the steps that a data server receives original voice of the first user terminal sent by a voice server in the voice communication process of a first user terminal and a second user terminal; and the data server translates the original voice according to a language type, which is acquired in advance, required by the second user terminal, acquires a translation result corresponding to the original voice and sends the translation result, thereby being convenient for the second user terminal to receive the translation result.

Description

Method and system for realizing communication between different languages

Technical field

The present invention relates to the field of mobile communication technology, and in particular to a method and system for realizing communication between different languages.

Background technology

With the rapid development of computer and communication technology, communication terminals have entered countless households and brought great convenience. However, as users become increasingly international, people from different countries run into problems when they use communication terminals to talk to each other. Taking native speakers of Chinese and English as an example, few users in China can converse fluently with Americans in English, and few Americans understand Chinese, so language becomes the biggest obstacle to communication between people of different countries.

At present there is a translation telephone technology that enables direct communication between different languages: one language is converted into another through "speech recognition, machine translation, speech synthesis".

The technology is mainly implemented as follows: the called terminal receives sound in a first language and converts it into text in the first language; the text in the first language is translated into text in a second language; and the text in the second language is converted into sound in the second language and sent out.

Although this scheme enables users of different languages to communicate, it requires the called terminal to support functions such as conversion between sound and text and text translation. If the called terminal lacks these functions, as is the case for a landline telephone or a non-smart mobile phone, direct communication between different languages cannot be realized, so the applicability of the scheme is poor.

Summary of the invention

Embodiments of the present invention provide a method and system for realizing communication between different languages, in order to solve the problem that prior-art schemes have poor applicability because they are limited by the capabilities of the called terminal.

The embodiments of the present invention adopt the following technical solutions:

A method for realizing voice communication between different languages, comprising:

during a voice call between a first user terminal and a second user terminal, a data server receives the original voice of the first user terminal sent by a voice server;

the data server translates the original voice according to a previously obtained language type required by the second user terminal, obtains a translation result corresponding to the original voice, and sends the translation result so that the second user terminal can receive it.

Wherein the data server translating the original voice according to the previously obtained language type required by the second user terminal, obtaining the translation result corresponding to the original voice, and sending it specifically comprises:

when the second user terminal is a fixed terminal, the data server translates the original voice according to the previously obtained language type required by the second user terminal, obtains translated voice corresponding to the original voice, and sends the translated voice to the voice server; the voice server sends the translated voice to the second user terminal, so that the second user terminal can receive and play it;

when the second user terminal is a mobile terminal with a speech synthesis function, the data server translates the original voice according to the previously obtained language type required by the second user terminal, obtains translated text corresponding to the original voice, and sends the translated text to the second user terminal, so that the second user terminal can receive the translated text, synthesize it into translated voice, and play it;

when the second user terminal is a mobile terminal without a speech synthesis function, the data server translates the original voice according to the previously obtained language type required by the second user terminal, obtains translated voice corresponding to the original voice, and sends the translated voice to the second user terminal, so that the second user terminal can receive and play it.

Wherein, when the second user terminal is a mobile terminal without a speech synthesis function, the method further comprises:

the data server translates the original voice according to the previously obtained language type required by the second user terminal, obtains translated text corresponding to the original voice, and sends the translated text to the second user terminal in the form of a short message, so that the second user terminal can receive and display it.

Wherein the data server receiving the original voice of the first user terminal sent by the voice server specifically comprises:

the data server receives at least one voice segment of the original voice sent by the voice server, wherein the voice segments are obtained by the voice server segmenting the original voice according to the pause times in the original voice, and are sent in the order in which they occur in the original voice; and

translating the original voice according to the previously obtained language type required by the second user terminal, obtaining the translation result corresponding to the original voice, and sending it specifically comprises:

performing the following operation on each received voice segment, in the order of the voice segments in the original voice:

translating the current voice segment according to the previously obtained language type required by the second user terminal, obtaining a translation result corresponding to the current voice segment, and sending it.

Wherein each voice segment has a voice segment sequence number, which is assigned by the voice server according to the order of the voice segments in the original voice; and

the method further comprises:

the data server feeds back to the voice server, at a preset time interval, the voice segment sequence number of the currently processed voice segment, so that the voice server can assign voice segment sequence numbers to subsequently obtained voice segments according to the received sequence number.

Wherein the voice server is deployed in the circuit domain and the data server is deployed in the data domain.

A system for realizing voice communication between different languages, comprising a data server and a voice server, wherein:

the voice server is configured to send the original voice of the first user terminal to the data server during a voice call between the first user terminal and the second user terminal;

the data server is configured to receive the original voice of the first user terminal sent by the voice server, translate the original voice according to the previously obtained language type required by the second user terminal, obtain a translation result corresponding to the original voice, and send the translation result so that the second user terminal can receive it.

Wherein the data server is specifically configured to:

when the second user terminal is a fixed terminal, translate the original voice according to the previously obtained language type required by the second user terminal, obtain translated voice corresponding to the original voice, and send the translated voice to the voice server, which sends the translated voice to the second user terminal so that the second user terminal can receive and play it;

when the second user terminal is a mobile terminal with a speech synthesis function, translate the original voice according to the previously obtained language type required by the second user terminal, obtain translated text corresponding to the original voice, and send the translated text to the second user terminal, so that the second user terminal can receive the translated text, synthesize it into translated voice, and play it;

when the second user terminal is a mobile terminal without a speech synthesis function, translate the original voice according to the previously obtained language type required by the second user terminal, obtain translated voice corresponding to the original voice, and send the translated voice to the second user terminal, so that the second user terminal can receive and play it.

Wherein the data server is further configured to:

when the second user terminal is a mobile terminal without a speech synthesis function, translate the original voice according to the previously obtained language type required by the second user terminal, obtain translated text corresponding to the original voice, and send the translated text to the second user terminal in the form of a short message, so that the second user terminal can receive and display it.

Wherein the voice server is specifically configured to:

segment the original voice according to the pause times in the original voice, and send at least one voice segment to the data server in the order of the voice segments in the original voice; and

the data server is specifically configured to:

Wherein the voice server is further configured to:

assign a voice segment sequence number to each voice segment according to the order of the voice segments in the original voice; and

the data server is further configured to:

feed back to the voice server, at a preset time interval, the voice segment sequence number of the currently processed voice segment, so that the voice server can assign voice segment sequence numbers to subsequently obtained voice segments according to the received sequence number.

The beneficial effects of the embodiments of the present invention are as follows:

In the embodiments of the present invention, the data server receives the original voice of the first user terminal sent by the voice server, translates the original voice according to the previously obtained language type required by the second user terminal, and sends the resulting translation result to the second user terminal. Because the second user terminal obtains the translation result directly, the technical solution is not limited by the capabilities of the called terminal: communication between two users of different languages is realized using only the voice server and the data server on the network side, so applicability is better.

Brief description of the drawings

Fig. 1 is a flowchart of a method for real-time voice communication between different languages provided by an embodiment of the present invention;

Fig. 2 is an architecture diagram of a system implementing the above method, provided by an embodiment of the present invention;

Fig. 3 is a flowchart of the above method in a practical application, provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a system for realizing voice communication between different languages, provided by an embodiment of the present invention.

Embodiment

To solve the problem that prior-art schemes have poor applicability because they are limited by the capabilities of the called terminal, the embodiments of the present invention propose a method and system for realizing voice communication between different languages. In this technical solution, the data server receives the original voice of the first user terminal sent by the voice server, translates the original voice according to the previously obtained language type required by the second user terminal, and sends the resulting translation result to the second user terminal. Because the second user terminal obtains the translation result directly, the solution is not limited by the capabilities of the called terminal: communication between two users of different languages is realized using only the voice server and the data server on the network side, so applicability is better.

Embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the embodiments described herein serve only to illustrate and explain the present invention and are not intended to limit it. Where no conflict arises, the embodiments of the present invention and the features of the embodiments may be combined with one another.

An embodiment of the present invention provides a method for real-time voice communication between different languages. Fig. 1 is a flowchart of the method, which mainly comprises the following steps:

Step 11: during a voice call between the first user terminal and the second user terminal, the data server receives the original voice of the first user terminal sent by the voice server;

When the first user terminal and the second user terminal are in a call, the original voice of the first user terminal is first sent to the voice server, and the voice server then hands the received original voice to the data server for processing.

In the embodiments of the present invention, the voice server may be deployed in the circuit domain on the network side, and the data server may be deployed in the data domain on the network side.

Step 12: the data server translates the original voice according to the previously obtained language type required by the second user terminal, obtains a translation result corresponding to the original voice, and sends the translation result so that the second user terminal can receive it.

In the embodiments of the present invention, to reduce the occupation of network-side resources, the data server, when obtaining the translation result, may choose which form of translation result to send to the second user terminal according to the terminal capability of the second user terminal. For example, when the second user terminal has a speech synthesis function, the translated text obtained by translation may be sent to the second user terminal, which performs the text-to-speech synthesis itself. As another example, when the second user terminal has no data access function, that is, when it is a fixed terminal, the data server may send the translated voice obtained by translation to the second user terminal through the voice server.

Based on the above, step 12 may specifically comprise:

when the second user terminal is a fixed terminal, the data server translates the original voice according to the previously obtained language type required by the second user terminal, obtains translated voice corresponding to the original voice, and sends the translated voice to the voice server; the voice server sends the translated voice to the second user terminal, so that the second user terminal can receive and play it;

when the second user terminal is a mobile terminal with a speech synthesis function, the data server translates the original voice according to the previously obtained language type required by the second user terminal, obtains translated text corresponding to the original voice, and sends the translated text to the second user terminal, so that the second user terminal can receive the translated text, synthesize it into translated voice, and play it;

The second user terminal may also display the received translated text, to further confirm the voice content sent by the peer.

when the second user terminal is a mobile terminal without a speech synthesis function, the data server translates the original voice according to the previously obtained language type required by the second user terminal, obtains translated voice corresponding to the original voice, and sends the translated voice to the second user terminal, so that the second user terminal can receive and play it.

When translating the original voice, the data server normally first converts the original voice into original text, then converts the original text according to the language type required by the second user terminal to obtain translated text, and finally synthesizes the translated text into translated voice, which is sent to the second user terminal.
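The three stages described above (speech recognition, text translation, speech synthesis) can be sketched as a simple chain. This is only an illustrative sketch: the function names, the `TranslationResult` type, and the toy stand-ins for the recognition, translation, and synthesis engines are all assumptions and are not part of the original disclosure.

```python
from typing import Callable, NamedTuple


class TranslationResult(NamedTuple):
    original_text: str       # text recognized from the original voice
    translated_text: str     # text in the language type required by the second user terminal
    translated_voice: bytes  # voice synthesized from the translated text


def translate_voice(
    original_voice: bytes,
    target_language: str,
    recognize: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> TranslationResult:
    """Chain recognition, translation, and synthesis; the engines are injected."""
    original_text = recognize(original_voice)
    translated_text = translate(original_text, target_language)
    translated_voice = synthesize(translated_text, target_language)
    return TranslationResult(original_text, translated_text, translated_voice)


# Toy stand-ins (hypothetical): real engines would replace these.
TOY_DICTIONARY = {("ni hao", "en"): "hello"}


def toy_recognize(voice: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return voice.decode("utf-8")


def toy_translate(text: str, target_language: str) -> str:
    # Fall back to the untranslated text when the phrase is unknown.
    return TOY_DICTIONARY.get((text, target_language), text)


def toy_synthesize(text: str, target_language: str) -> bytes:
    # Pretend the synthesized audio is just the encoded text.
    return text.encode("utf-8")
```

Keeping the translated text as a separate field reflects that the text is an intermediate result which can be delivered on its own, for example to a terminal that synthesizes speech locally.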

Because the translated text is available as an intermediate result, when the second user terminal is a mobile terminal without a speech synthesis function, the data server, in addition to sending the translated voice to the second user terminal, may also send the obtained translated text to the second user terminal as a short message, so that the second user terminal can further confirm the voice content sent by the peer.
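The delivery cases of step 12 amount to a dispatch on terminal type. The following is a minimal sketch under stated assumptions: the enum, the `Delivery` record, and the route labels are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class TerminalType(Enum):
    FIXED = auto()
    MOBILE_WITH_TTS = auto()     # mobile terminal with a speech synthesis function
    MOBILE_WITHOUT_TTS = auto()  # mobile terminal without a speech synthesis function


@dataclass(frozen=True)
class Delivery:
    payload: str  # "translated_voice" or "translated_text"
    route: str    # hypothetical label for how the payload reaches the terminal


def plan_delivery(terminal: TerminalType, also_send_sms: bool = False) -> List[Delivery]:
    """Return what the data server sends, and by which route, for one terminal."""
    if terminal is TerminalType.FIXED:
        # A fixed terminal has no data access, so voice goes through the voice server.
        return [Delivery("translated_voice", "via_voice_server")]
    if terminal is TerminalType.MOBILE_WITH_TTS:
        # The terminal synthesizes speech itself, so only text is sent.
        return [Delivery("translated_text", "direct")]
    deliveries = [Delivery("translated_voice", "direct")]
    if also_send_sms:
        # Optional extra: translated text as a short message for display.
        deliveries.append(Delivery("translated_text", "short_message"))
    return deliveries
```

Sending text rather than synthesized voice to capable terminals is what lets the network side save resources, as the embodiment notes.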

It should be noted that the data server can determine whether the second user terminal is a fixed terminal or a mobile terminal according to the user identifier of the second user terminal (for example, its telephone number), which is sent to the data server by the voice server.

In addition, if the original voice sent by the first user terminal is long, the waiting time of the second user terminal will also be long, which degrades the user experience at the second user terminal. Therefore, when the voice server sends the original voice to the data server for processing in step 11, it may first segment the original voice according to the pause times in the original voice to obtain multiple voice segments, and then send the voice segments to the data server in the order in which they occur in the original voice. When sending, the voice server may send one voice segment at a time or several voice segments at a time; the embodiments of the present invention place no limitation on this.
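Segmentation by pause time could be sketched as follows. The source does not say how pauses are detected, so this example assumes a per-frame energy value, a silence threshold, and a minimum pause length in frames; all three, along with the function name, are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def split_on_pauses(
    frame_energy: Sequence[float],
    silence_threshold: float,
    min_pause_frames: int,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges of voice segments, in order of occurrence.

    A frame is silent when its energy is below silence_threshold. A pause is
    a run of at least min_pause_frames silent frames; it closes the open segment.
    The end index is exclusive.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # first voiced frame of the segment being built
    last_voiced = -1  # most recent voiced frame seen
    silent_run = 0    # consecutive silent frames inside the open segment
    for index, energy in enumerate(frame_energy):
        if energy >= silence_threshold:
            if start is None:
                start = index
            last_voiced = index
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                segments.append((start, last_voiced + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Close the final segment when the voice ends without a long pause.
        segments.append((start, last_voiced + 1))
    return segments
```

Short silences below the minimum pause length stay inside a segment, so a brief hesitation does not split a sentence.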

In addition, while sending a voice segment to the data server, the voice server may also send the voice segment to the second user terminal, so that the second user terminal can hear the original voice of the first user terminal and can also tell that the current call remains connected.

In this case, when the data server receives voice segments of the original voice in step 12, it performs the following operation on each received voice segment, in the order of the voice segments in the original voice:

translating the current voice segment according to the previously obtained language type required by the second user terminal, obtaining a translation result corresponding to the current voice segment, and sending it.

In addition, when there are many voice segments, to prevent the data server and the voice server from confusing the order in which the voice segments are handled, the voice server may assign a voice segment sequence number to each obtained voice segment according to its order in the original voice; the data server can then translate the voice segments in the order of their sequence numbers.

Furthermore, so that the voice server and the data server stay synchronized on voice segment sequence numbers, the data server may, at a preset time interval, feed back to the voice server the voice segment sequence number of the currently processed voice segment, so that the voice server can assign voice segment sequence numbers to subsequently obtained voice segments according to the received sequence number.
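One possible reading of the ordering and feedback mechanism is sketched below. The source does not specify how the feedback is used, so the resynchronization rule in `on_feedback`, the class names, and the use of plain integers as sequence numbers are assumptions; the periodic timer that would trigger the feedback is omitted.

```python
import heapq
from typing import List, Tuple


class DataServerQueue:
    """Processes voice segments strictly in sequence-number order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, str]] = []
        self._next_expected = 1
        self.last_processed = 0  # value fed back to the voice server

    def receive(self, sequence_number: int, segment: str) -> None:
        heapq.heappush(self._heap, (sequence_number, segment))

    def process_ready(self) -> List[str]:
        """Pop every segment whose turn has come; later ones keep waiting."""
        done = []
        while self._heap and self._heap[0][0] == self._next_expected:
            sequence_number, segment = heapq.heappop(self._heap)
            done.append(segment)
            self.last_processed = sequence_number
            self._next_expected += 1
        return done


class VoiceServerCounter:
    """Assigns sequence numbers and accepts periodic feedback."""

    def __init__(self) -> None:
        self.last_assigned = 0
        self.last_acknowledged = 0

    def assign(self) -> int:
        self.last_assigned += 1
        return self.last_assigned

    def on_feedback(self, processed_sequence_number: int) -> None:
        # Assumption: feedback both records progress and lets a counter that
        # has fallen behind (for example after a restart) catch up.
        self.last_acknowledged = processed_sequence_number
        self.last_assigned = max(self.last_assigned, processed_sequence_number)
```

In a deployment the data server would call `on_feedback` on the voice server once per preset interval with its `last_processed` value.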

Fig. 2 shows the architecture of a system implementing the above method provided by the embodiments of the present invention. The system comprises a data server and a voice server. The voice server is responsible for receiving and sending the original voice, has interfaces to the circuit domain and the data domain, and may be one server or a group of servers. The data server is responsible for receiving and processing the original voice and for sending the translated text and the translated voice, has interfaces to the circuit domain and the data domain, and may be one server or a group of servers.

Based on the system architecture above, the implementation of the method in a practical application is described below, taking calling user A as the first user terminal and called user B as the second user terminal.

Fig. 3 is a flowchart of the above method in a practical application provided by the embodiments of the present invention. The method comprises the following steps:

Step 31: the original voice of calling user A is sent to the voice server;

Step 32: the voice server segments the received original voice according to the pause times in the original voice, and adds a voice segment sequence number to each voice segment according to the order of the voice segments in the original voice.

The voice segment sequence number may take the form: user identifier + start time of the current voice segment within the original voice + duration of the current voice segment + count value. For example, if the current voice segment is the second segment of the original voice, the count value is 2.

Step 33: the voice server sends the voice segment to the data server and, at the same time, sends the voice segment to called user B;

Step 34: the data server converts the received voice segment into an original text segment, and then converts the original text segment according to the language type required by the second user terminal to obtain a translated text segment;

When called user B is a fixed terminal or a mobile terminal without a speech synthesis function, step 37 is performed; when called user B is a mobile terminal with a speech synthesis function, step 35 is performed.

Step 35: the data server sends the obtained translated text segment to called user B;

Step 36: called user B synthesizes the received translated text segment into a translated voice segment and plays it. The flow ends;

Step 37: the data server synthesizes the translated text segment into a translated voice segment. When called user B is a fixed terminal, step 39 is performed; when called user B is a mobile terminal without a speech synthesis function, step 38 is performed;

Step 38: the data server sends the obtained translated voice segment to called user B, and the flow continues with step 311.

Step 39: the data server sends the obtained translated voice segment to the voice server;

Step 310: the voice server sends the received translated voice segment to called user B;

Step 311: called user B plays the received translated voice segment. The flow ends.
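The branching of steps 31 to 311, together with the sequence-number format given in step 32, can be traced with a short sketch. The function names, the `|` separator, and the millisecond units are assumptions made for illustration; the source specifies only which fields the sequence number contains and which steps follow which.

```python
from enum import Enum, auto
from typing import List


class Terminal(Enum):
    FIXED = auto()
    MOBILE_WITH_TTS = auto()
    MOBILE_WITHOUT_TTS = auto()


def make_sequence_number(user_id: str, start_ms: int, duration_ms: int, count: int) -> str:
    """Step 32: user identifier + start time + duration + count value.

    The separator and the time unit are hypothetical choices.
    """
    return f"{user_id}|{start_ms}|{duration_ms}|{count}"


def steps_for(terminal: Terminal) -> List[int]:
    """Return the step numbers of Fig. 3 executed for one voice segment."""
    steps = [31, 32, 33, 34]  # common to every terminal type
    if terminal is Terminal.MOBILE_WITH_TTS:
        steps += [35, 36]     # text is sent; the terminal synthesizes and plays
    else:
        steps.append(37)      # the data server synthesizes the translated voice
        if terminal is Terminal.FIXED:
            steps += [39, 310, 311]  # voice is relayed through the voice server
        else:
            steps += [38, 311]       # voice is sent directly to the terminal
    return steps
```

For instance, the second segment of a call, starting 4.2 seconds in and lasting 1.8 seconds, would carry the count value 2 in its sequence number.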

Based on the above implementation, it should be noted that the second user terminal in the embodiments of the present invention may be either the calling user or the called user. When the first user terminal and the second user terminal are in a call, the language types required by the two parties may be obtained in any one of the following ways:

1. For a user terminal on which an application for using the translation telephone service has been installed in advance, the required language type can be set in the application in advance, and the network side can obtain this information when the terminal communicates with other users. For a non-smart terminal, or a smart terminal on which no such application is installed, the network side may use a voice menu to ask the two parties to select the required language type themselves when the terminal communicates with other users;

2. The network side may determine the language types required by the two parties by detecting the languages they speak;

3. The network side may determine the language types required by the two parties according to the home location (country) of their numbers.
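The three ways of obtaining the language type could be combined as fallbacks, as in the sketch below. The source says any one of the ways may be used and does not define a priority, so the fallback order here is an assumption, as are the function names and the small country-code table.

```python
from typing import Optional

# Hypothetical sample mapping from country calling code to a default language.
COUNTRY_CODE_TO_LANGUAGE = {"86": "zh", "1": "en", "81": "ja"}


def language_from_number(number: str) -> Optional[str]:
    """Way 3: guess the language from the home country of the number."""
    digits = number.lstrip("+")
    for length in (3, 2, 1):  # try the longest country-code prefix first
        language = COUNTRY_CODE_TO_LANGUAGE.get(digits[:length])
        if language is not None:
            return language
    return None


def resolve_language(
    number: str,
    app_preference: Optional[str] = None,
    detected_language: Optional[str] = None,
) -> Optional[str]:
    """Prefer the user's own setting, then detection, then the number's home country."""
    return app_preference or detected_language or language_from_number(number)
```

When none of the sources yields a language, the function returns `None`; a deployment might then fall back to the voice menu described in the first way.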

Similarly, when determining the terminal capabilities of the two parties, the terminal type of each party is first determined from its number. If a terminal is determined to be a fixed terminal, it is assumed by default to have no capabilities such as speech synthesis and data access. If it is determined to be a mobile terminal and the application for using the translation telephone service has been installed in advance, the capabilities of the terminal can be set in the application in advance, and the network side can obtain this information when the terminal communicates with other users.
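Capability determination could be sketched as a lookup with a default for fixed terminals. The `Capability` record, the injected number classifier, and the registry of capabilities reported by the application are hypothetical; the default used for a mobile terminal without the application is also an assumption, since the source does not address that case.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Capability:
    is_fixed: bool
    speech_synthesis: bool
    data_access: bool


def determine_capability(
    number: str,
    is_fixed_number: Callable[[str], bool],
    app_registry: Mapping[str, Capability],
) -> Capability:
    """Classify the terminal by its number, then look up reported capabilities."""
    if is_fixed_number(number):
        # Fixed terminals default to no speech synthesis and no data access.
        return Capability(is_fixed=True, speech_synthesis=False, data_access=False)
    registered = app_registry.get(number)
    if registered is not None:
        # Capabilities the user set in the translation telephone application.
        return registered
    # Assumption: an unregistered mobile terminal is treated as having data
    # access but no speech synthesis.
    return Capability(is_fixed=False, speech_synthesis=False, data_access=True)
```

Injecting the number classifier keeps numbering-plan rules, which vary by country, out of the capability logic.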

The embodiments of the present invention use the circuit domain and the data domain on the network side to realize voice communication between different languages. This makes reasonable use of the capabilities of the existing network and the characteristics of terminals, can be deployed rapidly on the existing network at low modification cost, and fits users' habits, which helps promote the development of the service.

Correspondingly, an embodiment of the present invention provides a system for realizing voice communication between different languages. Fig. 4 is a schematic structural diagram of the system, which mainly comprises a data server 41 and a voice server 42, wherein:

the voice server 42 is configured to send the original voice of the first user terminal to the data server 41 during a voice call between the first user terminal and the second user terminal;

the data server 41 is configured to receive the original voice of the first user terminal sent by the voice server 42, translate the original voice according to the previously obtained language type required by the second user terminal, obtain a translation result corresponding to the original voice, and send the translation result so that the second user terminal can receive it.

Wherein the data server 41 is specifically configured to:

when the second user terminal is a fixed terminal, translate the original voice according to the previously obtained language type required by the second user terminal, obtain translated voice corresponding to the original voice, and send the translated voice to the voice server 42, which sends the translated voice to the second user terminal so that the second user terminal can receive and play it;

Wherein the data server 41 is further configured to:

Wherein the voice server 42 is specifically configured to:

segment the original voice according to the pause times in the original voice, and send at least one voice segment to the data server 41 in the order of the voice segments in the original voice; and

the data server 41 is specifically configured to:

In addition, the voice server 42 is further configured to:

the data server 41 is further configured to:

feed back to the voice server 42, at a preset time interval, the voice segment sequence number of the currently processed voice segment, so that the voice server 42 can assign voice segment sequence numbers to subsequently obtained voice segments according to the received sequence number.

Wherein the voice server 42 is deployed in the circuit domain and the data server 41 is deployed in the data domain.

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art, once they learn of the basic inventive concept, can make further changes and modifications to these embodiments. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to cover them.

Claims

1. A method for realizing voice communication between different languages, characterized by comprising:

2. The method of claim 1, characterized in that the data server translating the original voice according to the previously obtained language type required by the second user terminal, obtaining the translation result corresponding to the original voice, and sending it specifically comprises:

3. The method of claim 2, characterized in that, when the second user terminal is a mobile terminal without a speech synthesis function, the method further comprises:

4. The method of claim 1, characterized in that the data server receiving the original voice of the first user terminal sent by the voice server specifically comprises:

5. The method of claim 4, characterized in that each voice segment has a voice segment sequence number, which is assigned by the voice server according to the order of the voice segments in the original voice; and

the method further comprises:

6. The method of any one of claims 1 to 5, characterized in that the voice server is deployed in the circuit domain and the data server is deployed in the data domain.

7. A system for realizing voice communication between different languages, characterized by comprising a data server and a voice server, wherein:

8. The system of claim 7, characterized in that the data server is specifically configured to:

9. The system of claim 8, characterized in that the data server is further configured to:

10. The system of claim 7, characterized in that the voice server is specifically configured to:

the data server is specifically configured to:

11. The system of claim 10, characterized in that the voice server is further configured to:

the data server is further configured to:

12. The system of any one of claims 7 to 11, characterized in that the voice server is deployed in the circuit domain and the data server is deployed in the data domain.