CN102903361A - Instant call translation system and instant call translation method - Google Patents

Instant call translation system and instant call translation method

Info

Publication number
CN102903361A
CN102903361A
Authority
CN
China
Prior art keywords
language
speech signal
input speech
cutting
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012103909731A
Other languages
Chinese (zh)
Inventor
钟实
刘鹤
袁首鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Itp Innovation Ltd
Original Assignee
Itp Innovation Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Itp Innovation Ltd filed Critical Itp Innovation Ltd
Priority to CN2012103909731A priority Critical patent/CN102903361A/en
Publication of CN102903361A publication Critical patent/CN102903361A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses an instant call translation system and an instant call translation method. The system comprises a slicer, a speech recognition device, a translation device, and a speech synthesis device. The slicer is connected to a switch and cuts an input speech signal into one or more audio files; the speech recognition device is connected to the slicer and transcribes the one or more audio files into text in a source language; the translation device is connected to the speech recognition device and translates the source-language text into text in a target language; and the speech synthesis device is connected to the translation device, converts the target-language text into an output speech signal, and outputs it to the switch. With the instant call translation system and method, two parties to a call who face a language barrier can communicate with each other freely and in real time.

Description

Instant call translation system and method
Technical field
The present invention relates to the field of instant translation, and in particular to an instant call translation system and method.
Background art
In the present era, people from different countries frequently need to communicate for political, economic, cultural, entertainment, and other reasons, and networks, telephones, and similar channels allow people in different regions to reach one another conveniently. However, besides convenient transmission media such as networks and telephones, the problem of the language barrier must also be solved. Mastering a foreign language well enough to converse fluently with people of other countries is very difficult, so the language barrier is the greatest obstacle to communication between people of different countries. At present there are many translation programs available on the network or on smart terminals such as mobile phones, but such software generally cannot be used for real-time communication.
Therefore, there is a need for an instant call translation system and method that addresses the above problems.
Summary of the invention
This Summary introduces a selection of concepts in a simplified form that are further described in the Detailed Description. It is not intended to identify key or essential features of the claimed technical solution, nor is it intended to determine the scope of protection of the claimed technical solution.
In order to solve the above problems, the invention discloses an instant call translation system comprising a slicer, a speech recognition device, a translation device, and a speech synthesis device. The slicer is adapted to be connected to a switch and to cut an input speech signal into one or more audio files; the speech recognition device is connected to the slicer and transcribes the one or more audio files into text in a source language; the translation device is connected to the speech recognition device and translates the source-language text into text in a target language; and the speech synthesis device is connected to the translation device, converts the target-language text into an output speech signal, and outputs it to the switch.
In a preferred embodiment of the invention, the system further comprises a memory connected between the slicer and the speech recognition device; the slicer also stores the one or more audio files in the memory, and the one or more audio files transcribed by the speech recognition device come from the memory.
In a preferred embodiment of the invention, the system further comprises a language determination device connected to the slicer for determining the languages used by the two parties to the call; one of the languages used by the two parties serves as the source language and the other as the target language.
In a preferred embodiment of the invention, the system further comprises an input interface for receiving the input speech signal from the switch, and an output interface for outputting the output speech signal to the switch.
In a preferred embodiment of the invention, the slicer further comprises a detection unit for detecting the silent portions of the input speech signal, and a cutting unit for cutting the input speech signal into the one or more audio files based on the detected silent portions.
Preferably, a silent portion comprises a portion whose decibel value remains at or below a noise threshold for a period of 0.6 seconds or longer.
In a preferred embodiment of the invention, the system further comprises an automatic gain controller connected to the slicer for applying gain control to the input speech signal.
In a preferred embodiment of the invention, the automatic gain controller further comprises an amplification unit for amplifying an input speech signal whose decibel value is below a set value up to the set value, and an attenuation unit for reducing an input speech signal whose decibel value is above the set value down to the set value.
In a preferred embodiment of the invention, the system further comprises a filter connected to the slicer for performing noise reduction on the input speech signal.
Preferably, the filter is a Wiener filter.
According to another aspect of the invention, an instant call translation method is also provided, comprising: cutting an input speech signal into one or more audio files; transcribing the one or more audio files into text in a source language; translating the source-language text into text in a target language; and converting the target-language text into an output speech signal.
In a preferred embodiment of the invention, the method further comprises, after the cutting, storing the one or more audio files in a memory; the one or more audio files that are transcribed come from the memory.
In a preferred embodiment of the invention, the method further comprises, before the cutting, determining the languages used by the two parties to the call; one of the languages used by the two parties serves as the source language and the other as the target language.
In a preferred embodiment of the invention, the method further comprises, before the cutting, receiving the input speech signal from a switch, and, after the conversion, outputting the output speech signal to the switch.
In a preferred embodiment of the invention, the cutting further comprises detecting the silent portions of the input speech signal, and cutting the input speech signal into the one or more audio files based on the detected silent portions.
Preferably, a silent portion comprises a portion whose decibel value remains at or below a noise threshold for a period of 0.6 seconds or longer.
In a preferred embodiment of the invention, the method further comprises, before the cutting, applying gain control to the input speech signal.
In a preferred embodiment of the invention, the gain control further comprises amplifying an input speech signal whose decibel value is below a set value up to the set value, and reducing an input speech signal whose decibel value is above the set value down to the set value.
In a preferred embodiment of the invention, the method further comprises, before the cutting, performing noise reduction on the input speech signal.
Preferably, the noise reduction further comprises applying Wiener filtering to the input speech signal.
The instant call translation system and method provided by the present invention enable two parties to a call who face a language barrier to communicate freely with each other in real time.
Brief description of the drawings
The following drawings are included here as a part of the present invention to aid understanding of the invention. The drawings illustrate embodiments of the invention and, together with their description, serve to explain the principles of the invention. In the drawings:
Fig. 1 shows a structural block diagram of an instant call translation system according to a preferred embodiment of the present invention;
Fig. 2 shows a schematic diagram of an input speech signal according to a preferred embodiment of the present invention;
Fig. 3 shows a flowchart of an instant call translation method according to a preferred embodiment of the present invention;
Fig. 4 shows a schematic diagram of a call system that includes the instant call translation system according to a preferred embodiment of the present invention.
Detailed description of the embodiments
In the following description, numerous specific details are provided in order to give a more thorough understanding of the invention. However, it will be apparent to those skilled in the art that the invention can be practiced without one or more of these details. In other instances, certain technical features well known in the art are not described in order to avoid obscuring the invention.
In order to provide a thorough understanding of the invention, detailed structures are set forth in the following description. Obviously, the practice of the invention is not limited to the specific details familiar to those skilled in the art. The preferred embodiments of the invention are described in detail below; however, besides these detailed descriptions, the invention can also have other embodiments.
According to one aspect of the invention, an instant call translation system is provided. Fig. 1 shows a structural block diagram of an instant call translation system 100 according to a preferred embodiment of the present invention. As shown in Fig. 1, the instant call translation system comprises a slicer 104, a speech recognition device 106, a translation device 107, and a speech synthesis device 108. The slicer 104 is adapted to be connected to an external switch and cuts the input speech signal into one or more audio files. The speech recognition device 106 is connected to the slicer 104 and transcribes the one or more audio files cut by the slicer 104 into text in the source language. The translation device 107 is connected to the speech recognition device 106 and translates the source-language text transcribed by the speech recognition device 106 into text in the target language. The speech synthesis device 108 is connected to the translation device 107, converts the target-language text produced by the translation device 107 into an output speech signal, and outputs it to the external switch.
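The following minimal sketch shows, in Python, how the four components just described could be chained together. The component classes and their method names (split, transcribe, translate, synthesize) are assumptions made for illustration only and are not taken from the patent.

```python
class InstantCallTranslator:
    """Sketch of the pipeline: slicer -> speech recognition -> translation -> speech synthesis."""

    def __init__(self, slicer, recognizer, translator, synthesizer):
        self.slicer = slicer            # cuts the input speech signal into audio segments
        self.recognizer = recognizer    # transcribes each segment into source-language text
        self.translator = translator    # translates source-language text into target-language text
        self.synthesizer = synthesizer  # converts target-language text into an output speech signal

    def process(self, input_signal, source_lang, target_lang):
        output_segments = []
        for segment in self.slicer.split(input_signal):
            source_text = self.recognizer.transcribe(segment, language=source_lang)
            target_text = self.translator.translate(source_text, source_lang, target_lang)
            output_segments.append(self.synthesizer.synthesize(target_text, language=target_lang))
        return output_segments  # returned to the switch as the output speech signal
```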
Speech recognition is usually performed on vocabulary, phrases, or relatively short sentences. As shown in Fig. 1, the slicer 104 is connected to the external switch and cuts the input speech signal coming from the external switch into one or more audio files. A long, continuous conversation is thereby cut into short utterances, so the subsequent speech recognition can operate on the segmented data, which greatly improves its accuracy and effectively guarantees the quality of the instant call translation.
According to a preferred embodiment of the invention, the slicer 104 can be divided into a detection unit and a cutting unit, where the detection unit detects the silent portions of the input speech signal and the cutting unit cuts the input speech signal into one or more audio files based on the detected silent portions. Fig. 2 shows a schematic diagram of an input speech signal according to a preferred embodiment of the present invention. As shown in Fig. 2, silent portions can be detected in the input speech signal, and the signal is then cut into one or more audio files at those silent portions. Silence is an indispensable part of a conversation, and cutting the speech signal at silent portions preserves the meaning of the speaker's statements better; sentences are not broken in the middle, which avoids errors in the subsequent processing.
A silent portion of the input speech signal can be a portion whose decibel value remains at or below a noise threshold for a certain length of time. The noise threshold can be chosen according to the environment of the two parties; in a noisy environment, for example, it can be set higher. By increasing the required duration, noise can be treated as silence and removed. Preferably, the duration is 0.6 seconds or longer. A pause of 0.6 seconds roughly matches the interval between sentences when people converse, so silent portions of this length divide the dialogue into audio files that follow natural sentence boundaries fairly accurately, and they also remove noise effectively, making the subsequent processing more accurate.
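The silence-based cutting described above could be sketched as follows. The 0.6-second duration comes from the text; the frame length and the noise-threshold value are illustrative assumptions.

```python
import numpy as np

def split_on_silence(signal, sample_rate=8000, frame_s=0.02,
                     noise_threshold_db=-40.0, min_silence_s=0.6):
    """Cut a mono 16-bit PCM signal (numpy int16 array) into segments wherever its
    level stays at or below noise_threshold_db for at least min_silence_s seconds."""
    frame_len = int(frame_s * sample_rate)           # 160 samples per frame at 8000 Hz
    min_run = int(round(min_silence_s / frame_s))    # 30 frames of 20 ms = 0.6 s
    n_frames = len(signal) // frame_len

    # Classify each frame as silent or not, using its level in dB relative to full scale.
    silent = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
        silent.append(20 * np.log10(rms / 32768.0) <= noise_threshold_db)

    segments, seg_start, run = [], 0, 0
    for i, is_silent in enumerate(silent):
        run = run + 1 if is_silent else 0
        if run == min_run:                           # a qualifying pause closes the current segment
            cut = (i - min_run + 1) * frame_len
            if cut > seg_start:
                segments.append(signal[seg_start:cut])
        if run >= min_run:                           # skip forward while the pause continues
            seg_start = (i + 1) * frame_len
    if seg_start < len(signal):                      # trailing speech after the last pause
        segments.append(signal[seg_start:])
    return segments
```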
The speech recognition device 106 is connected to the slicer 104 and transcribes the one or more audio files cut by the slicer 104 into text in the source language. According to a preferred embodiment of the invention, the transcription performed by the speech recognition device 106 includes the following operations. First, speech features are extracted from the one or more audio files formed after the cutting. Based on the extracted features, the speech signal can be analyzed: redundant information irrelevant to speech recognition can be removed, the important information that affects recognition can be obtained, and the speech signal can be compressed at the same time. Then the speech recognition device 106 performs recognition with a trained acoustic model using the extracted features; specifically, the features of the speech signal are matched and compared against the features of the acoustic model to obtain the best recognition result. The complete transcription turns the one or more audio files cut by the slicer 104 into source-language text.
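A highly simplified sketch of the matching step might look as follows. The dictionary-of-templates acoustic model and the plain Euclidean score are illustrative assumptions only; practical recognizers use statistical models and dynamic alignment.

```python
import numpy as np

def match_acoustic_model(features, acoustic_model):
    """Compare the feature vectors extracted from one audio segment against the
    feature templates of a trained acoustic model and keep the best match.
    features and each template are 2-D arrays of shape (frames, feature_dim)."""
    best_text, best_score = None, float("inf")
    for text, template in acoustic_model.items():    # {candidate text: feature template}
        n = min(len(features), len(template))        # crude length alignment by truncation
        score = float(np.mean(np.linalg.norm(features[:n] - template[:n], axis=1)))
        if score < best_score:
            best_text, best_score = text, score
    return best_text
```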
The translation device 107 is connected to the speech recognition device 106 and translates the source-language text transcribed by the speech recognition device 106 into target-language text. Based on knowledge of the grammar, semantics, syntax, and idioms of the source-language text and of the speaker's culture, the translation device 107 analyzes all the features of the source-language text, decodes its meaning, and then re-encodes the text as target-language text expressing the same meaning.
The speech synthesis device 108 is connected to the translation device 107, converts the target-language text produced after translation by the translation device 107 into an output speech signal in the target language, and outputs it to the external switch. The conversion proceeds as follows: first, the target-language text produced by the translation device 107 is converted into characteristic parameters of the target language, producing the prosodic information corresponding to each syllable of the sentences of the text; then, taking into account the tone, intonation, pauses, and syllable durations of ordinary speech, this prosodic information is converted into corresponding prosodic parameters; finally, the prosodic parameters are combined with acoustic parameters to generate the corresponding output speech signal, which is output to the external switch.
According to a preferred embodiment of the invention, the instant call translation system 100 can also include an input interface and an output interface (not shown in Fig. 1). The input interface can be connected between the external switch and the slicer 104 and receives the input speech signal from the external switch; this signal may be either an analog signal or a digital signal. For a digital signal, the sampling frequency is preferably 8000 Hz and the quantization depth is preferably 16 bits. The output interface can be connected between the speech synthesis device 108 and the external switch and outputs the output speech signal to the external switch.
According to a preferred embodiment of the invention, the instant call translation system 100 can also include a language determination device 101, connected to the slicer 104, for determining the languages used by the two parties to the call. During the call, if one of the languages used by the two parties serves as the source language, the other serves as the target language. As shown in Fig. 1, after the two parties are connected through the external switch, a sentence spoken by each party (for example, the initial greeting) can be fed through the switch into the system 100, and the language determination device 101 then determines the languages used by the two parties. For example, suppose the parties are a Chinese speaker and an American, that is, the languages used by the two parties are Chinese and English, and the initial greetings are "喂" from the Chinese speaker and "hello" from the American; the language determination device 101 receives "喂" and "hello" from the external switch and determines that the languages used by the two parties are Chinese and English. In the subsequent processing, if the input speech signal is in Chinese, the source language is Chinese and the target language is English; conversely, if the input speech signal is in English, the source language is English and the target language is Chinese. A system 100 according to this preferred embodiment can recognize speech signals in various languages and therefore has a wide range of application. Those of ordinary skill in the art will appreciate that the source and target languages of the system 100 can also be set in advance, in which case the language determination device 101 is not needed.
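A possible sketch of this language-determination logic is given below. The identify_language callback is an assumed language-identification routine, since the patent does not prescribe how the language of a greeting is recognized.

```python
def make_direction_resolver(greeting_1, greeting_2, identify_language):
    """Identify each party's language from an initial greeting, then map any later
    utterance to a (source, target) language pair."""
    lang_1 = identify_language(greeting_1)   # e.g. 'zh' for the greeting "喂"
    lang_2 = identify_language(greeting_2)   # e.g. 'en' for the greeting "hello"

    def resolve(utterance):
        source = identify_language(utterance)
        target = lang_2 if source == lang_1 else lang_1
        return source, target

    return resolve
```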
According to a preferred embodiment of the invention, the instant call translation system 100 can also include an automatic gain controller 102, connected to the slicer, for applying gain control to the input speech signal, for example adjusting the decibel value of the received input speech signal to a roughly uniform set level. Applying gain control to the input speech signal through the automatic gain controller 102 prevents sudden swings in the speaker's volume from affecting the subsequent processing and, in turn, the other party's experience.
Preferably, the automatic gain controller 102 can include an amplification unit and an attenuation unit. When the decibel value of a received input speech signal is below the set value, the amplification unit amplifies it up to the set value; conversely, when the decibel value of a received input speech signal is above the set value, the attenuation unit reduces it down to the set value. The set value can be chosen freely according to actual needs.
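A minimal sketch of this gain control follows; the set value of -20 dB relative to full scale is an illustrative assumption, since the patent leaves the set value to be chosen according to actual needs.

```python
import numpy as np

def apply_gain_control(signal, set_value_db=-20.0):
    """Measure the level of a 16-bit PCM segment and scale it toward a single set value."""
    x = signal.astype(np.float64)
    rms = np.sqrt(np.mean(x ** 2)) + 1e-12
    level_db = 20 * np.log10(rms / 32768.0)          # level relative to 16-bit full scale
    gain = 10 ** ((set_value_db - level_db) / 20.0)  # >1 amplifies, <1 attenuates
    return np.clip(x * gain, -32768, 32767).astype(np.int16)
```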
According to a preferred embodiment of the invention, the instant call translation system 100 can also include a filter 103, connected to the slicer 104, for performing noise reduction on the input speech signal. The noise reduction can use filtering, which extracts useful information by removing noise and interference from continuous or discrete input data. Preferably, the filter 103 is a Wiener filter, which gives a good filtering result.
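A minimal sketch of this noise-reduction step, assuming a digital 16-bit input signal and using the Wiener filter available in scipy.signal, is given below; the window length is an illustrative assumption.

```python
import numpy as np
from scipy.signal import wiener

def reduce_noise(signal, window=29):
    """Apply a Wiener filter to a 16-bit PCM signal and return it in the same format."""
    filtered = wiener(signal.astype(np.float64), mysize=window)
    return np.clip(filtered, -32768, 32767).astype(np.int16)
```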
In short, both the automatic gain controller 102 and the filter 103 make the input speech signal easier to recognize and improve the accuracy of recognition and translation.
According to a preferred embodiment of the invention, the instant call translation system 100 can also include a memory 105 connected between the slicer 104 and the speech recognition device 106. In this case, the slicer 104 also stores the one or more audio files it has cut in the memory 105, and the one or more audio files transcribed by the speech recognition device 106 come from the memory 105. The memory 105 temporarily holds the one or more audio files cut by the slicer 104, buffering them before they reach the speech recognition device, so that the subsequent transcription performed by the speech recognition device 106 runs more smoothly.
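The buffering role of the memory 105 could be sketched with a simple queue between the slicer and the speech recognition device, as below; the slicer and recognizer objects are the same illustrative assumptions used in the earlier sketches.

```python
import queue

# Audio files produced by the slicer are queued so the speech recognition
# device can take them out at its own pace.
segment_buffer = queue.Queue()

def slicer_worker(slicer, input_signal):
    for audio_file in slicer.split(input_signal):
        segment_buffer.put(audio_file)
    segment_buffer.put(None)                  # sentinel: no more audio files

def recognizer_worker(recognizer, results):
    while True:
        audio_file = segment_buffer.get()
        if audio_file is None:                # the slicer has finished
            break
        results.append(recognizer.transcribe(audio_file))
```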
In addition, it should be noted that the terms "connected" and "coupled" above can denote either a direct connection or an indirect connection between the devices. Fig. 1 shows only one way of connecting the devices of the instant call translation system 100; other arrangements are possible. For example, the language determination device 101 can be connected directly to the filter 103, with the automatic gain controller 102 connected between the filter 103 and the slicer 104.
According to another aspect of the invention, an instant call translation method is also provided. Fig. 3 shows a flowchart of an instant call translation method 300 according to a preferred embodiment of the present invention. As shown in Fig. 3, the instant call translation method 300 comprises a cutting step 304, a speech-to-text step 306, a translation step 307, and a text-to-speech step 308. The cutting step 304 cuts the input speech signal into one or more audio files; the speech-to-text step 306 transcribes the one or more audio files formed by the cutting step 304 into text in the source language; the translation step 307 translates the source-language text formed by the speech-to-text step 306 into text in the target language; and the text-to-speech step 308 converts the target-language text formed by the translation step 307 into an output speech signal.
In the cutting step 304, cutting the input speech signal further comprises a detection step and a segmentation step: the detection step detects the silent portions of the input speech signal, and the segmentation step then cuts the input speech signal into a plurality of audio files based on the detected silent portions.
According to a preferred embodiment of the invention, a silent portion of the input speech signal is a portion whose decibel value remains at or below a noise threshold for a period of 0.6 seconds or longer.
After the cutting step 304 has cut the input speech signal into one or more audio files, the method proceeds to the speech-to-text step 306, which transcribes the one or more audio files formed by the cutting step 304 into source-language text. In the speech-to-text step 306, speech features are first extracted from the one or more audio files formed by the cutting step 304, and recognition is then performed with a trained acoustic model using the extracted features. Specifically, the features of the speech signal are matched and compared against the features of the acoustic model to obtain the best recognition result.
After the speech-to-text step 306 has transcribed the one or more audio files formed by the cutting step 304 into source-language text, the method proceeds to the translation step 307, which translates that source-language text into target-language text. In the translation step 307, based on knowledge of the grammar, semantics, syntax, and idioms of the source-language text and of the speaker's culture, all the features of the source-language text are analyzed and its meaning is decoded; the text is then re-encoded as target-language text with the same meaning, which completes the translation of the source-language text into target-language text.
After the translation step 307 has translated the source-language text formed in the speech-to-text step 306 into target-language text, the method proceeds to the text-to-speech step 308, which converts the target-language text formed in the translation step 307 into an output speech signal in the target language and outputs it to the external switch. In the text-to-speech step 308, preferably, the target-language text formed in the translation step 307 is first converted into characteristic parameters of the target language, producing the prosodic information corresponding to each syllable of the sentences of the text; then, taking into account the tone, intonation, pauses, and syllable durations of ordinary speech, this prosodic information is converted into corresponding prosodic parameters; finally, the prosodic parameters are combined with acoustic parameters to generate the corresponding output speech signal, which is output to the external switch.
At this point, the entire instant call translation process is complete.
According to a preferred embodiment of the invention, the instant call translation method 300 can also include a receiving step and an output step (not shown in Fig. 3). In the receiving step, before the cutting step 304, the input speech signal is received from the switch; this signal may be either analog or digital. For a digital signal, the sampling frequency is preferably 8000 Hz and the quantization depth is preferably 16 bits. In the output step, after the text-to-speech step 308, the output speech signal is output to the switch.
According to a preferred embodiment of the invention, the instant call translation method 300 can also include a language determination step 301, which determines, before the cutting step 304, the languages used by the two parties to the call. One of the languages used by the two parties serves as the source language and the other as the target language. For example, suppose the parties are a Chinese speaker and an American, that is, the languages used by the two parties are Chinese and English, and the initial greetings are "喂" from the Chinese speaker and "hello" from the American; in the language determination step 301, the "喂" and "hello" sent by the external switch are received and the languages used by the two parties are determined to be Chinese and English. In the subsequent processing, if the input speech signal is in Chinese, the source language is Chinese and the target language is English; conversely, if the input speech signal is in English, the source language is English and the target language is Chinese.
According to a preferred embodiment of the invention, the instant call translation method 300 can also include a gain control step 302, which applies gain control to the input speech signal before the cutting step 304, for example adjusting the decibel value of the received input speech signal to a roughly uniform set level.
Preferably, in the gain control step 302, when the decibel value of a received input speech signal is below a set value, it is amplified up to that set value; conversely, when the decibel value of a received input speech signal is above the set value, it is reduced down to that set value. The set value can be chosen freely according to actual needs.
According to a preferred embodiment of the invention, the instant call translation method 300 can also include a noise reduction step 303, which performs noise reduction on the input speech signal before the cutting step 304. The noise reduction can use filtering; preferably, the noise reduction step 303 comprises applying Wiener filtering to the input speech signal.
In addition, those of ordinary skill in the art will appreciate that Fig. 3 shows only one order of execution for the steps of the instant call translation method according to a preferred embodiment of the present invention; this order can be adjusted. For example, the gain control step 302 can be performed after the noise reduction step 303.
According to a preferred embodiment of the invention, the instant call translation method 300 can also include a storing step 305, which, after the cutting step 304 and before the speech-to-text step 306, stores the one or more audio files formed by the cutting step 304 in a memory. The one or more audio files transcribed in the speech-to-text step 306 come from this memory.
Fig. 4 shows a schematic diagram of a preferred embodiment of a call system that includes the instant call translation system according to a preferred embodiment of the present invention. The call system 400 comprises the telephones 401 and 402 used by the callers, a public switched telephone network (PSTN) 403, a private branch exchange (IP PBX) 404, and the instant call translation system 405 provided by the present invention. The telephones 401 and 402 used by the callers can also be replaced by smart terminals, and correspondingly the PSTN 403 can be replaced by a voice-over-IP (VoIP) network.
As shown in Fig. 4, the two parties to the call are user 1 and user 2. User 1 speaks language A and user 2 speaks language B. The calling party, for example user 1, dials user 2 through the PSTN 403, and the IP PBX 404 sets up the call connection between the two parties. User 1 and user 2 then begin to talk; the speech each of them produces enters the instant call translation system 405 through the IP PBX 404, and the translated speech is sent back through the IP PBX to the corresponding user. The workflow of the call system 400 is described in detail below. First, the call connection between user 1 and user 2 is established. Then the input speech signal S1 in language A from user 1 is sent via the IP PBX 404 to the instant call translation system 405. It is translated by the instant call translation system 405 into the output speech signal S4 expressed in language B. Finally, the IP PBX 404 detects this signal S4 and sends it to user 2. Those of ordinary skill in the art will appreciate that the routine handling of the speech signals by the PSTN and the IP PBX is omitted from this description so as not to obscure the invention. In this way, user 2 hears user 1's speech expressed in user 2's own language, language B. Likewise, when user 2 replies in language B, user 1 hears user 2's speech expressed in language A. Optionally, besides hearing the other party's speech in their own language, user 1 and user 2 can also hear the untranslated speech.
With the instant call translation system and method provided by the invention, two parties to a call who face a language barrier can communicate freely in real time over a conventional public switched telephone network, a VoIP network, or the like.
The present invention has been illustrated by the above embodiments, but it should be understood that the above embodiments are given only for the purposes of illustration and explanation and are not intended to limit the invention to their scope. Those skilled in the art will further appreciate that the invention is not limited to the above embodiments and that further variations and modifications can be made in accordance with the teachings of the invention, all of which fall within the claimed scope of protection. The scope of protection of the invention is defined by the appended claims and their equivalents.

Claims (20)

1. An instant call translation system comprising a slicer, a speech recognition device, a translation device, and a speech synthesis device, wherein:
the slicer is adapted to be connected to a switch and to cut an input speech signal into one or more audio files;
the speech recognition device is connected to the slicer and transcribes the one or more audio files into text in a source language;
the translation device is connected to the speech recognition device and translates the source-language text into text in a target language; and
the speech synthesis device is connected to the translation device, converts the target-language text into an output speech signal, and outputs it to the switch.
2. The system according to claim 1, characterized in that the system further comprises:
a memory connected between the slicer and the speech recognition device;
wherein the slicer also stores the one or more audio files in the memory; and
the one or more audio files transcribed by the speech recognition device come from the memory.
3. The system according to claim 1, characterized in that the system further comprises:
a language determination device connected to the slicer for determining the languages used by the two parties to the call;
wherein one of the languages used by the two parties serves as the source language and the other as the target language.
4. The system according to claim 1, characterized in that the system further comprises:
an input interface for receiving the input speech signal from the switch; and
an output interface for outputting the output speech signal to the switch.
5. The system according to claim 1, characterized in that the slicer further comprises:
a detection unit for detecting the silent portions of the input speech signal; and
a cutting unit for cutting the input speech signal into the one or more audio files based on the detected silent portions.
6. The system according to claim 5, characterized in that a silent portion comprises a portion whose decibel value remains at or below a noise threshold for a period of 0.6 seconds or longer.
7. The system according to claim 1, characterized in that the system further comprises:
an automatic gain controller connected to the slicer for applying gain control to the input speech signal.
8. The system according to claim 7, characterized in that the automatic gain controller further comprises:
an amplification unit for amplifying an input speech signal whose decibel value is below a set value up to the set value; and
an attenuation unit for reducing an input speech signal whose decibel value is above the set value down to the set value.
9. The system according to claim 1, characterized in that the system further comprises:
a filter connected to the slicer for performing noise reduction on the input speech signal.
10. The system according to claim 9, characterized in that the filter is a Wiener filter.
11. An instant call translation method comprising:
cutting an input speech signal into one or more audio files;
transcribing the one or more audio files into text in a source language;
translating the source-language text into text in a target language; and
converting the target-language text into an output speech signal.
12. The method according to claim 11, characterized in that the method further comprises, after the cutting:
storing the one or more audio files in a memory;
wherein the one or more audio files that are transcribed come from the memory.
13. The method according to claim 11, characterized in that the method further comprises, before the cutting:
determining the languages used by the two parties to the call;
wherein one of the languages used by the two parties serves as the source language and the other as the target language.
14. The method according to claim 11, characterized in that:
the method further comprises, before the cutting, receiving the input speech signal from a switch; and
the method further comprises, after the conversion, outputting the output speech signal to the switch.
15. The method according to claim 11, characterized in that the cutting further comprises:
detecting the silent portions of the input speech signal; and
cutting the input speech signal into the one or more audio files based on the detected silent portions.
16. The method according to claim 15, characterized in that a silent portion comprises a portion whose decibel value remains at or below a noise threshold for a period of 0.6 seconds or longer.
17. The method according to claim 11, characterized in that the method further comprises, before the cutting, applying gain control to the input speech signal.
18. The method according to claim 17, characterized in that the gain control further comprises:
amplifying an input speech signal whose decibel value is below a set value up to the set value; and
reducing an input speech signal whose decibel value is above the set value down to the set value.
19. The method according to claim 11, characterized in that the method further comprises, before the cutting, performing noise reduction on the input speech signal.
20. The method according to claim 19, characterized in that the noise reduction further comprises applying Wiener filtering to the input speech signal.
CN2012103909731A 2012-10-15 2012-10-15 Instant call translation system and instant call translation method Pending CN102903361A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012103909731A CN102903361A (en) 2012-10-15 2012-10-15 Instant call translation system and instant call translation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012103909731A CN102903361A (en) 2012-10-15 2012-10-15 Instant call translation system and instant call translation method

Publications (1)

Publication Number Publication Date
CN102903361A true CN102903361A (en) 2013-01-30

Family

ID=47575565

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012103909731A Pending CN102903361A (en) 2012-10-15 2012-10-15 Instant call translation system and instant call translation method

Country Status (1)

Country Link
CN (1) CN102903361A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1334532A (en) * 2000-07-13 2002-02-06 白涛 Automatic simultaneous interpretation system between multiple languages for GSM
CN1870728A (en) * 2005-05-23 2006-11-29 北京大学 Method and system for automatic subtilting
US8059641B1 (en) * 2006-07-20 2011-11-15 Avaya Inc. Encapsulation method discovery protocol for network address translation gateway traversal
CN102227767A (en) * 2008-11-12 2011-10-26 Scti控股公司 System and method for automatic speach to text conversion
CN101867632A (en) * 2009-06-12 2010-10-20 刘越 Mobile phone speech instant translation system and method

Cited By (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167360A (en) * 2013-02-21 2013-06-19 中国对外翻译出版有限公司 Method for achieving multilingual subtitle translation
CN103226947A (en) * 2013-03-27 2013-07-31 广东欧珀移动通信有限公司 Mobile terminal-based audio processing method and device
CN103226947B (en) * 2013-03-27 2016-08-17 广东欧珀移动通信有限公司 A kind of audio-frequency processing method based on mobile terminal and device
CN104427294A (en) * 2013-08-29 2015-03-18 中兴通讯股份有限公司 Method for supporting video conference simultaneous interpretation and cloud-terminal server thereof
CN103647880B (en) * 2013-12-13 2015-11-18 南京丰泰通信技术股份有限公司 A kind of telephone set with telephone text translation function
CN103647566A (en) * 2013-12-13 2014-03-19 南京丰泰通信技术股份有限公司 Radio set with radio receiving and message translating functions
CN103646664A (en) * 2013-12-13 2014-03-19 南京丰泰通信技术股份有限公司 Recording machine with function of translating records into telegraph text
CN103647567A (en) * 2013-12-13 2014-03-19 南京丰泰通信技术股份有限公司 Radio cassette recorder having functions of radio receiving and recording and text translation
CN103646664B (en) * 2013-12-13 2016-01-13 南京丰泰通信技术股份有限公司 A kind of sound-track engraving apparatus with recording text translation function
CN103647880A (en) * 2013-12-13 2014-03-19 南京丰泰通信技术股份有限公司 Telephone set having function of telephone text translation
CN104754536A (en) * 2013-12-27 2015-07-01 中国移动通信集团公司 Method and system for realizing communication between different languages
CN108810291A (en) * 2014-05-23 2018-11-13 三星电子株式会社 The system and method that " voice-message " calling service is provided
CN108810291B (en) * 2014-05-23 2021-04-20 三星电子株式会社 System and method for providing voice-message call service
CN104252861A (en) * 2014-09-11 2014-12-31 百度在线网络技术(北京)有限公司 Video voice conversion method, video voice conversion device and server
CN105139849A (en) * 2015-07-22 2015-12-09 百度在线网络技术(北京)有限公司 Speech recognition method and apparatus
CN105139849B (en) * 2015-07-22 2017-05-10 百度在线网络技术(北京)有限公司 Speech recognition method and apparatus
CN106598982A (en) * 2015-10-15 2017-04-26 比亚迪股份有限公司 Method and device for creating language databases and language translation method and device
CN105828101A (en) * 2016-03-29 2016-08-03 北京小米移动软件有限公司 Method and device for generation of subtitles files
CN105828101B (en) * 2016-03-29 2019-03-08 北京小米移动软件有限公司 Generate the method and device of subtitle file
WO2018195704A1 (en) * 2017-04-24 2018-11-01 Beijing Didi Infinity Technology And Development Co., Ltd. System and method for real-time transcription of an audio signal into texts
CN107316639A (en) * 2017-05-19 2017-11-03 北京新美互通科技有限公司 A kind of data inputting method and device based on speech recognition, electronic equipment
CN107291704A (en) * 2017-05-26 2017-10-24 北京搜狗科技发展有限公司 Treating method and apparatus, the device for processing
CN107291704B (en) * 2017-05-26 2020-12-11 北京搜狗科技发展有限公司 Processing method and device for processing
WO2019019135A1 (en) * 2017-07-28 2019-01-31 深圳市沃特沃德股份有限公司 Voice translation method and device
CN108281145A (en) * 2018-01-29 2018-07-13 南京地平线机器人技术有限公司 Method of speech processing, voice processing apparatus and electronic equipment
CN108281145B (en) * 2018-01-29 2021-07-02 南京地平线机器人技术有限公司 Voice processing method, voice processing device and electronic equipment
CN110473519A (en) * 2018-05-11 2019-11-19 北京国双科技有限公司 A kind of method of speech processing and device
CN110473519B (en) * 2018-05-11 2022-05-27 北京国双科技有限公司 Voice processing method and device
CN109036451A (en) * 2018-07-13 2018-12-18 深圳市小瑞科技股份有限公司 A kind of simultaneous interpretation terminal and its simultaneous interpretation system based on artificial intelligence
CN108847237A (en) * 2018-07-27 2018-11-20 重庆柚瓣家科技有限公司 continuous speech recognition method and system
CN109102804A (en) * 2018-08-17 2018-12-28 飞救医疗科技(赣州)有限公司 A kind of method and its system of the input of voice case history terminal
CN111046680A (en) * 2018-10-15 2020-04-21 华为技术有限公司 Translation method and electronic equipment
CN111046680B (en) * 2018-10-15 2022-05-24 华为技术有限公司 Translation method and electronic equipment
US11570299B2 (en) 2018-10-15 2023-01-31 Huawei Technologies Co., Ltd. Translation method and electronic device
US11843716B2 (en) 2018-10-15 2023-12-12 Huawei Technologies Co., Ltd. Translation method and electronic device
US11893359B2 (en) 2018-10-15 2024-02-06 Huawei Technologies Co., Ltd. Speech translation method and terminal when translated speech of two users are obtained at the same time
CN109754808A (en) * 2018-12-13 2019-05-14 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice conversion text
CN109754808B (en) * 2018-12-13 2024-02-13 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for converting voice into text
CN112037768A (en) * 2019-05-14 2020-12-04 北京三星通信技术研究有限公司 Voice translation method and device, electronic equipment and computer readable storage medium
CN110730360A (en) * 2019-10-25 2020-01-24 北京达佳互联信息技术有限公司 Video uploading and playing methods and devices, client equipment and storage medium
CN112435666A (en) * 2020-09-30 2021-03-02 远传融创(杭州)科技有限公司 Intelligent voice digital communication method based on deep learning model


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1177546

Country of ref document: HK

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130130