CN107992485A - Simultaneous interpretation method and device - Google Patents
Simultaneous interpretation method and device
- Publication number: CN107992485A
- Application number: CN201711207834.XA
- Authority
- CN
- China
- Prior art keywords
- source language
- speech data
- target language speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Abstract
The embodiment of the present invention provides a simultaneous interpretation method and apparatus. The method includes: collecting source-language speech data; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source-language speech data and is expressed in a target language. The speech translation result is obtained by natural-person speech synthesis (synthesis that reproduces a natural human voice), and the target language and the source language are different languages. The embodiment of the present invention can perform speech recognition and translation automatically, reducing labor cost and improving the accuracy and completeness of the translation result; because the speech translation result carries natural human voice characteristics, the listening experience is effectively improved.
Description
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to a simultaneous interpretation method and device.
Background
At present, more and more scenarios require simultaneous interpretation. In the traditional arrangement, speaker A delivers a speech in the source language and a human interpreter B renders it into the target language. This approach requires additional interpreting personnel and offers no automatic speech recognition or translation. Because human interpretation is prone to omissions and mistranslations, the completeness and accuracy of the result are poor. Existing simultaneous interpretation methods therefore suffer from high cost and from poor translation completeness and accuracy.
Summary of the invention
Embodiments of the present invention provide a simultaneous interpretation method and device, intended to solve the technical problems of high cost and poor translation completeness and accuracy found in prior-art simultaneous interpretation methods.
To this end, embodiments of the present invention provide the following technical solutions:
In a first aspect, an embodiment of the present invention provides a simultaneous interpretation method, including: collecting source-language speech data; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source-language speech data and is expressed in a target language; wherein the speech translation result is obtained by natural-person speech synthesis, and the target language and the source language are different languages.
In a second aspect, an embodiment of the present invention provides a simultaneous interpretation device, including: a collecting unit for collecting source-language speech data; and an acquiring unit for obtaining and outputting a speech translation result that corresponds to the recognition result of the source-language speech data and is expressed in a target language; wherein the speech translation result is obtained by natural-person speech synthesis, and the target language and the source language are different languages.
In a third aspect, an embodiment of the present invention provides a device for simultaneous interpretation that includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for: collecting source-language speech data; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source-language speech data and is expressed in a target language; wherein the speech translation result is obtained by natural-person speech synthesis, and the target language and the source language are different languages.
In a fourth aspect, an embodiment of the present invention provides a machine-readable medium with instructions stored thereon that, when executed by one or more processors, cause a device to perform the simultaneous interpretation method of the first aspect.
The simultaneous interpretation method and device provided by embodiments of the present invention can collect source-language speech data and obtain and output a speech translation result that corresponds to the recognition result of the source-language speech data and is expressed in a target language, where the speech translation result is obtained by natural-person speech synthesis and the target language and the source language are different languages. Unlike the prior art, which depends on human interpretation, the method provided by embodiments of the present invention can perform speech recognition and translation automatically, reducing labor cost, improving efficiency, and effectively increasing the completeness and accuracy of the translation result. Furthermore, because the speech translation result is obtained by natural-person speech synthesis, it sounds natural and warm to the audience, significantly improving the quality of the simultaneous interpretation and the listening experience.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the simultaneous interpretation method provided by one embodiment of the invention;
Fig. 2 is a flow diagram of the simultaneous interpretation method provided by another embodiment of the invention;
Fig. 3 is a schematic diagram of the simultaneous interpretation device provided by one embodiment of the invention;
Fig. 4 is a block diagram of a device for simultaneous interpretation according to an exemplary embodiment;
Fig. 5 is a block diagram of the server according to an exemplary embodiment.
Detailed description of the embodiments
Embodiments of the present invention provide a simultaneous interpretation method and device that can perform speech recognition and translation automatically, reduce labor cost, and improve the accuracy and completeness of the translation result; the speech translation result carries natural human voice characteristics, effectively improving the listening experience.
To help those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention; all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The simultaneous interpretation methods of exemplary embodiments of the present invention are introduced below with reference to Figs. 1 and 2.
Referring to Fig. 1, which is a flow diagram of the simultaneous interpretation method provided by one embodiment of the invention, the method may include:
S101: collect source-language speech data.
In a specific implementation, source-language speech data can be collected by an audio collection unit such as a microphone. The source-language data is the data to be translated into the target language. For example, a user speaks Chinese at a meeting and expects simultaneous interpretation with English as the target language; the client then collects the source-language speech data through the user's microphone. Before or after collecting the audio, the user may also select the desired target language through a user interface, and the translation result is expressed in that target language. Usually, the source language and the target language are different languages.
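As a purely illustrative sketch of the collection step, the following Python snippet records a few seconds of mono audio from the default microphone; the sounddevice library, the 16 kHz sample rate, and the buffer handling are assumptions made for illustration and are not prescribed by the patent.

```python
import sounddevice as sd

SAMPLE_RATE = 16000  # assumed rate; common for speech-recognition front-ends

def record_source_speech(seconds: float):
    """Record mono source-language audio from the default microphone."""
    frames = int(seconds * SAMPLE_RATE)
    audio = sd.rec(frames, samplerate=SAMPLE_RATE, channels=1, dtype="int16")
    sd.wait()  # block until the recording buffer is filled
    return audio.reshape(-1)  # 1-D PCM samples, ready for serialization

if __name__ == "__main__":
    pcm = record_source_speech(3.0)
    print(f"captured {len(pcm)} samples")
```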
S102: obtain and output a speech translation result that corresponds to the recognition result of the source-language speech data and is expressed in the target language.
In some embodiments, besides the speech translation result, a text translation result that corresponds to the recognition result of the source-language speech data and is expressed in the target language can also be obtained and output. The recognition result of the source-language speech data itself may, of course, also be obtained and output.
In some embodiments, two or more audio streams can be collected at the same time, and speech translation results are output for each stream separately. In this case, collecting source-language speech data includes collecting at least two streams of source-language speech data that differ in source and in voice characteristics, and obtaining and outputting the speech translation result includes obtaining and outputting, separately, at least two speech translation results that correspond to the recognition results of the at least two streams and are expressed in the target language. The voice characteristics include timbre features and style features. For example, in a meeting or salon where a man and a woman converse, the two voices can be collected separately, and an English synthesized translation in a female voice and one in a male voice can be output respectively.
In some embodiments, the speech translation result is obtained by natural-person speech synthesis, and the target language differs from the source language. For example, the speech translation result may carry the timbre features and/or style features of a target audio. Specifically, in response to a user's selection or switching of a target timbre type and/or target style type, the timbre type and/or style type corresponding to the operation is determined, and the timbre features and/or style features of the target audio corresponding to that timbre and/or style are obtained. Obtaining and outputting the speech translation result then includes obtaining and outputting a synthesized speech translation result that corresponds to the recognition result of the source-language speech data, is expressed in the target language, and matches the target timbre and/or target style: the synthesized result has the timbre features of the target timbre and/or the style features of the target style.
For example, a user interface can be provided through which the user selects the synthesized audio, including its timbre type and/or style type. Timbre types may include, for example, a sweet female voice, a child's voice, a husky male voice, a mature male voice, and so on. Correspondingly, timbre features generally comprise spectral features, energy features, and the like. Style features are generally used to characterize a person's manner of speaking, speech habits, or expressive power; in embodiments of the present invention they refer to at least one of duration prosodic features (which correlate strongly with duration and rhythmic variation), fundamental-frequency features, and energy features. Duration prosodic features generally include the duration of each character or word, pauses, and whether a syllable is stressed.
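To make the feature inventory above concrete, here is a minimal sketch of how the two feature bundles might be represented in code; the field names and array shapes are assumptions for illustration, not definitions taken from the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TimbreFeatures:
    spectrum: np.ndarray  # per-frame spectral envelope, shape (frames, bins)
    f0: np.ndarray        # per-frame fundamental frequency, shape (frames,)
    energy: np.ndarray    # per-frame energy, shape (frames,)

@dataclass
class StyleFeatures:
    durations: np.ndarray  # per character/word duration, in frames
    pauses: np.ndarray     # pause length after each unit, in frames
    stress: np.ndarray     # 1 where the unit is stressed, else 0
    f0: np.ndarray         # style-level F0 contour statistics
    energy: np.ndarray     # style-level energy statistics
```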
When the speech translation result is synthesized, it is generally produced from the text translation result corresponding to the recognition result together with the target audio. Specifically, text feature data can be determined from the text translation result, and speech synthesis is performed with the text feature data and the timbre features and/or style features of the target audio, yielding synthesized speech data as the speech translation result.
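The end-to-end flow just described can be summarized in a short sketch; the stub recognize/translate/synthesize functions below are placeholders standing in for real recognition, translation, and synthesis engines, and none of their names or signatures come from the patent.

```python
def recognize(pcm):                          # stub ASR engine
    return "大家好"

def translate(text, target_lang):            # stub MT engine
    return "hello everyone"

def text_features(text):                     # stub linguistic front-end
    return {"phones": list(text)}

def synthesize(feats, timbre, style=None):   # stub acoustic model + vocoder
    return b"\x00\x00" * 16000               # placeholder PCM

def interpret(pcm, target_lang, timbre, style=None):
    """ASR -> MT -> text features -> synthesis conditioned on timbre/style."""
    text = recognize(pcm)
    translated = translate(text, target_lang)
    feats = text_features(translated)
    return synthesize(feats, timbre, style)
```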
In some embodiments, the speech translation result carries the timbre features of the source-language data. Specifically, the source-language speech data can be recognized and a speech translation result corresponding to the recognition result obtained, where the speech translation result is synthesized from the source-language speech data together with the text translation result corresponding to the recognition result; the text translation result is expressed in the target language, which differs from the source language; and the speech translation result carries at least the timbre features of the source-language speech data.
In one implementation, the timbre features of the speaker of the source-language speech data can be obtained, and the speech translation result is presented in the target language, achieving a 'same-voice' translation. Specifically, synthesizing the speech translation result from the source-language speech data and the text translation result corresponding to the recognition result includes:
(1) Determine text feature data from the text translation result.
It should be noted that after the source-language speech data is collected, speech recognition can be performed on the audio data to obtain a recognition result. The recognition result is then translated to obtain a text translation result expressed in the target language, and text feature data is determined from that text translation result. In a specific implementation, text feature data can be obtained for any given text through text analysis; the present invention does not restrict how the text feature data is obtained, and existing methods may be used.
(2) Obtain the timbre features of the source-language speech data.
The timbre features of the source-language speech data generally comprise the spectral features, fundamental-frequency features, and the like of the audio data.
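One way to obtain such spectral and fundamental-frequency features, offered only as an illustration, is the WORLD-style analysis exposed by the pyworld package; the patent does not prescribe any particular extractor, so treat this as an assumed toolchain.

```python
import numpy as np
import pyworld as pw

def extract_timbre(pcm: np.ndarray, fs: int):
    """Extract per-frame F0 and spectral envelope as a simple timbre proxy."""
    x = pcm.astype(np.float64)       # pyworld expects float64 samples
    f0, t = pw.dio(x, fs)            # coarse F0 estimation
    f0 = pw.stonemask(x, f0, t, fs)  # F0 refinement
    sp = pw.cheaptrick(x, f0, t, fs) # spectral envelope estimation
    return f0, sp
```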
(3) Perform speech synthesis with the timbre features of the source-language speech data and the text feature data to obtain synthesized speech data as the speech translation result.
In one possible implementation, speech synthesis is performed with the spectral features and/or fundamental-frequency features of the source-language speech data together with the text feature data, yielding synthesized speech data that carries the timbre of the source-language speaker but is presented in the target language. For example, if user A says '你好' in Chinese, the resulting synthesized speech is the English 'hello', and that English speech carries user A's timbre.
In another possible implementation, the style features of the source-language data can also be obtained. The style features include at least one of duration prosodic features, fundamental-frequency features, and energy features; as noted above, style features characterize a person's manner of speaking, speech habits, or expressive power, and the duration prosodic features generally include the duration of each character or word, pauses, and whether a syllable is stressed.
Correspondingly, performing speech synthesis with the timbre features of the source-language speech data and the text feature data to obtain the target-language speech translation result includes: performing speech synthesis with the timbre features of the source-language speech data, the style features of the source-language data, and the text feature data, obtaining synthesized speech data as the speech translation result. The speech translation result then carries both the timbre features and the style features of the source-language speech data, so that the translation is rendered not only in the 'same voice' but also in the 'same style' as the original speaker.
In some possible implementations, the style features of a target-style audio can be obtained instead; these style features likewise include at least one of duration prosodic features, fundamental-frequency features, and energy features, and the target-style audio corresponds to the target language. For example, the source language is Chinese and the target-style audio is English with particular style features, such as Donald Trump's manner of speaking.
Correspondingly, performing speech synthesis with the timbre features of the source-language speech data and the text feature data to obtain the target-language speech translation result includes: performing speech synthesis with the timbre features of the source-language speech data, the style features of the target-style audio, and the text feature data, obtaining synthesized speech data as the speech translation result. The speech translation result then carries the timbre features of the source-language speech data and the style features of the target-style audio. That is, in this implementation the speech translation result retains the original speaker's timbre but adopts the speaking style of another speaker of the target language, so that it better matches the habits of the target-language audience: a 'same-voice, different-style' interpretation.
In some embodiments, in response to a user's selection or switching of a target style, the target style corresponding to the operation is determined, and the style features of the target-style audio corresponding to that target style are obtained. For example, several styles can be offered for the user to select or switch among.
In some embodiments, performing speech synthesis with the timbre features of the source-language speech data, the style features of the target-style audio, and the text feature data to obtain synthesized speech data as the speech translation result includes the following steps:
A. Obtain the acoustic feature data of the source-language speech data from the text feature data, the duration prosodic features of the target-style audio, and the timbre features of the source-language speech data.
In some possible embodiments, this step includes: determining a target duration from the duration prosodic features of the target-style audio; and obtaining the acoustic feature data of the source-language speech data from the text feature data, the target duration, and the timbre features of the source-language speech data. In this implementation, the target duration is determined from the duration prosodic features of the target-style audio, replacing the prior-art approach of predicting duration from the source-language speech data.
In other possible embodiments, this step includes: obtaining a predicted duration from the text feature data and the duration features of the source-language speech data; performing linear interpolation between the predicted duration and a target duration to obtain interpolated duration features, the target duration being determined from the duration prosodic features of the target-style audio data; and obtaining the acoustic feature data of the source-language speech data from the text feature data, the interpolated duration features, and the timbre features of the source-language speech data. It should be noted that the synthesis can become unstable when sounds are drawn out; interpolating the source-predicted duration toward the target duration, and obtaining the acoustic feature data from the interpolated duration features and the timbre features of the source-language speech data, improves this situation.
B. Fuse the fundamental-frequency features and/or energy features of the target-style audio with the acoustic feature data of the source-language speech data to obtain fused acoustic feature data.
In a specific implementation, the fundamental-frequency features and/or energy features of the target-style audio are fused, respectively, with the fundamental-frequency features and/or energy features within the acoustic feature data of the source-language speech data, yielding the fused acoustic feature data.
The feature fusion algorithm can be chosen very flexibly; one example is:

S_tr(n) = (T(n) * S_mean / T_mean) * w + S(n) * (1 - w), where 0 ≤ w ≤ 1.0

where S_tr(n) is the fused fundamental-frequency (or energy) feature of frame n; S(n) is the fundamental-frequency (or energy) feature predicted for frame n of the source speaker at synthesis time; T(n) is the extracted fundamental-frequency (or energy) feature of frame n of the target speaker; S_mean is the feature mean over the source speaker's voice corpus; T_mean is the corresponding feature mean of the target speaker's audio; and w is the fusion coefficient.
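The same fusion written as a short sketch; the frame-wise arrays and the mean statistics mirror the symbols of the formula above, and the function name is illustrative only.

```python
import numpy as np

def fuse_feature(source: np.ndarray, target: np.ndarray,
                 source_mean: float, target_mean: float,
                 w: float) -> np.ndarray:
    """Frame-wise fusion of an F0 (or energy) track.

    source:  S(n), predicted for the source speaker at synthesis time
    target:  T(n), extracted from the target-style speaker
    source_mean / target_mean: corpus-level means S_mean and T_mean
    w: fusion coefficient in [0, 1]
    """
    assert 0.0 <= w <= 1.0 and source.shape == target.shape
    rescaled = target * (source_mean / target_mean)  # shift target track into
    return rescaled * w + source * (1.0 - w)         # the source speaker's range
```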
In embodiments where the duration features are obtained by linear interpolation between the predicted duration and the target duration, the method further includes, after step B: performing linear interpolation processing on the fused acoustic feature data so that the duration of the fused acoustic feature data is consistent with the target duration.
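Stretching the fused feature track so that its frame count matches the target duration can be done with ordinary linear resampling; the one-dimensional track and the use of numpy interpolation here are illustrative assumptions.

```python
import numpy as np

def stretch_to_length(track: np.ndarray, target_frames: int) -> np.ndarray:
    """Linearly resample a per-frame feature track so its length matches
    the target duration in frames."""
    src = np.linspace(0.0, 1.0, num=len(track))
    dst = np.linspace(0.0, 1.0, num=target_frames)
    return np.interp(dst, src, track)
```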
C. Convert the acoustic feature data into a speech waveform, obtaining synthesized speech data that has the style features of the target-style audio and the timbre features of the source-language speech data.
Unlike the preceding embodiment, which derives the acoustic features of the source audio using the duration prosodic features of the target-style audio, other embodiments may obtain a predicted duration from the duration features of the source audio, fuse the remaining style features of the target-style audio to obtain the fused acoustic features, and then interpolate those acoustic features to reduce the drawling effect.
Specifically, in this variant, performing speech synthesis with the timbre features of the source-language speech data, the style features of the target-style audio, and the text feature data to obtain synthesized speech data as the speech translation result includes:
A'. Obtain the acoustic feature data of the source-language speech data from the text feature data and the timbre features and duration features of the source-language speech data.
In this implementation, a predicted duration can be obtained from the text feature data and the duration features of the source-language speech data, and the acoustic feature data is obtained from that predicted duration and the timbre features of the source-language speech data.
B'. Fuse the fundamental-frequency features and/or energy features of the target-style audio with the acoustic feature data of the source-language speech data to obtain fused acoustic feature data.
C'. Perform linear interpolation processing on the fused acoustic feature data so that its duration is consistent with the target duration, the target duration being determined from the duration prosodic features of the target-style audio data.
D'. Convert the processed acoustic feature data into a speech waveform, obtaining synthesized speech data that has the style features of the target-style audio and the timbre features of the source-language speech data.
In some embodiments, in order to remove the stylistic imprint of the source speaker from the source-language speech data, the style information of the source-language speech data can be removed. When synthesizing the speech data, speech synthesis is then performed with the text feature data, the style features of the target-style audio, and the timbre features of the style-stripped source-language speech data, yielding the synthesized speech data.
In embodiments of the present invention, the style features of a target-style audio can thus be fused into the source-language speech data, so that the synthesized speech carries richer prosodic features and greater expressive power, effectively improving the quality of the speech synthesis.
It should be noted that embodiments of the present invention do not restrict the executing entity: the above steps may be performed by the client, by the server, or partly by the client and partly by the server.
To help those skilled in the art understand the embodiments more clearly in a concrete scenario, the embodiments are introduced below with a specific example. Note that the specific example only serves to help those skilled in the art understand the present invention more clearly; embodiments of the present invention are not limited to it.
S201: the client collects source-language speech data.
In a specific implementation, no manual operation is required after the client starts: as soon as the speaker says 'start simultaneous interpretation', the recognition and translation function is activated, that is, execution of S202 begins.
S202: the client serializes the source-language speech data, obtaining serialized source-language speech data.
S203: the client sends the serialized source-language speech data to the server.
S204: the server deserializes the received serialized source-language data.
S205: the server performs speech recognition on the source-language speech data, obtaining a speech recognition result.
S206: the server translates the speech recognition result, obtaining a text translation result.
S207: the server obtains a synthesized speech translation result from the text translation result and the source-language speech data. For the specific implementation, refer to the method shown in Fig. 1.
S208: the server sends the synthesized speech translation result and the recognition result to the client.
S209: the client outputs the synthesized speech translation result.
The client can output the synthesized speech translation result in the form of speech.
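A compact sketch of the S201-S209 exchange, assuming a JSON wire format with base64-encoded PCM for serialization; the field names and the stub engines are illustrative, since the patent only requires that the audio be serialized, sent, deserialized, and processed.

```python
import base64
import json

def serialize_audio(pcm_bytes: bytes) -> str:
    """S202: turn raw PCM into a transport-safe string."""
    return base64.b64encode(pcm_bytes).decode("ascii")

def deserialize_audio(payload: str) -> bytes:
    """S204: recover raw PCM on the server."""
    return base64.b64decode(payload)

def recognize(pcm): return "大家好"                 # stub ASR (S205)
def translate(text, lang): return "hello everyone"  # stub MT  (S206)
def synthesize(text, ref_pcm): return b"\x00\x00"   # stub TTS (S207)

def handle_request(body: str) -> str:
    """Server side of S204-S208."""
    pcm = deserialize_audio(json.loads(body)["audio"])
    text = recognize(pcm)
    translation = translate(text, "en")
    voice = synthesize(translation, pcm)
    return json.dumps({"recognition": text,
                       "translation": translation,
                       "audio": serialize_audio(voice)})

if __name__ == "__main__":
    request = json.dumps({"audio": serialize_audio(b"\x01\x02")})
    print(handle_request(request))  # the client then plays the audio (S209)
```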
The device and equipment corresponding to the method provided by the embodiments of the present invention are introduced below.
Referring to Fig. 3, which is a schematic diagram of the simultaneous interpretation device provided by one embodiment of the invention.
A simultaneous interpretation device 300 includes:
a collecting unit 301 for collecting source-language speech data, whose specific implementation may follow step S101 of the embodiment shown in Fig. 1; and
an acquiring unit 302 for obtaining and outputting a speech translation result that corresponds to the recognition result of the source-language speech data and is expressed in the target language, where the speech translation result is obtained by natural-person speech synthesis and the target language differs from the source language; its specific implementation may follow step S102 of the embodiment shown in Fig. 1.
In some embodiments, the device further includes: a text output unit for obtaining and outputting a text translation result that corresponds to the recognition result of the source-language speech data and is expressed in the target language; and/or a recognition result output unit for obtaining and outputting the recognition result of the source-language speech data.
In some embodiments, the collecting unit is specifically configured to collect at least two streams of source-language speech data that differ in source and in voice characteristics, and the acquiring unit is specifically configured to obtain and output, separately, at least two speech translation results that correspond to the recognition results of the at least two streams and are expressed in the target language.
In some embodiments, the device further includes a determination unit that, in response to a user's selection or switching of a target timbre type and/or target style type, determines the timbre type and/or style type corresponding to the operation and obtains the timbre features and/or style features of the target audio corresponding to that timbre and/or style. The acquiring unit is then specifically configured to obtain and output a synthesized speech translation result that corresponds to the recognition result of the source-language speech data, is expressed in the target language, and matches the target timbre and/or target style; the synthesized result has the timbre features of the target timbre and/or the style features of the target style.
In some embodiments, the acquiring unit is specifically configured to obtain and output a speech translation result carrying the timbre features of the source-language speech data, the result being synthesized from the source-language speech data and the text translation result corresponding to the recognition result; the text translation result is expressed in the target language.
In some implementations, the acquiring unit includes: a text feature data determination unit for determining text feature data from the text translation result; a timbre feature determination unit for obtaining the timbre features of the source-language speech data; and a speech synthesis unit for performing speech synthesis with the timbre features of the source-language speech data and the text feature data to obtain synthesized speech data as the speech translation result.
In some embodiments, the device further includes a first style feature determination unit for obtaining the style features of the source-language data, the style features including at least one of duration prosodic features, fundamental-frequency features, and energy features. The speech synthesis unit then includes a first speech synthesis unit for performing speech synthesis with the timbre features of the source-language speech data, the style features of the source-language data, and the text feature data, obtaining synthesized speech data as the speech translation result; the result carries the timbre features and the style features of the source-language speech data.
In some embodiments, the device further includes a second style feature determination unit for obtaining the style features of a target-style audio, the style features including at least one of duration prosodic features, fundamental-frequency features, and energy features, the target-style audio corresponding to the target language. The speech synthesis unit then includes a second speech synthesis unit for performing speech synthesis with the timbre features of the source-language speech data, the style features of the target-style audio, and the text feature data, obtaining synthesized speech data as the speech translation result; the result carries the timbre of the source-language speech data and the style features of the target-style audio.
In some embodiments, the second speech synthesis unit includes: a first fusion unit for obtaining the acoustic feature data of the source-language speech data from the text feature data, the duration prosodic features of the target-style audio, and the timbre features of the source-language speech data; a second fusion unit for fusing the fundamental-frequency features and/or energy features of the target-style audio with the acoustic feature data of the source-language speech data to obtain fused acoustic feature data; and a first conversion unit for converting the acoustic feature data into a speech waveform, obtaining synthesized speech data that has the style features of the target-style audio and the timbre features of the source-language speech data.
In other embodiments, the second speech synthesis unit includes: an acoustic feature prediction unit for obtaining the acoustic feature data of the source-language speech data from the text feature data and the timbre features and duration features of the source-language speech data; a third fusion unit for fusing the fundamental-frequency features and/or energy features of the target-style audio with the acoustic feature data of the source-language speech data to obtain fused acoustic feature data; a feature interpolation unit for performing linear interpolation processing on the fused acoustic feature data so that its duration is consistent with the target duration, the target duration being determined from the duration prosodic features of the target-style audio data; and a second conversion unit for converting the processed acoustic feature data into a speech waveform, obtaining synthesized speech data that has the style features of the target-style audio and the timbre features of the source-language speech data.
The configuration of each unit or module of the device of the present invention may follow the methods shown in Figs. 1 and 2 and is not repeated here.
Referring to Fig. 4, which is a block diagram of a device 400 for simultaneous interpretation according to an exemplary embodiment. For example, the device 400 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to Fig. 4, the device 400 may include one or more of the following components: a processing component 402, a memory 404, a power component 406, a multimedia component 408, an audio component 410, an input/output (I/O) interface 412, a sensor component 414, and a communication component 416.
The processing component 402 generally controls the overall operation of the device 400, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 402 may include one or more processors 420 to execute instructions so as to complete all or part of the steps of the method described above. In addition, the processing component 402 may include one or more modules to facilitate interaction with other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 408 and the processing component 402.
The memory 404 is configured to store various types of data to support operation on the device 400. Examples of such data include instructions for any application or method operated on the device 400, contact data, phone-book data, messages, pictures, video, and so on. The memory 404 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 406 supplies power to the various components of the device 400. The power component 406 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 400.
The multimedia component 408 includes a screen that provides an output interface between the device 400 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it is implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the panel; the touch sensors can sense not only the boundary of a touch or slide action but also the duration and pressure associated with it. In some embodiments, the multimedia component 408 includes a front camera and/or a rear camera. When the device 400 is in an operating mode, such as shooting mode or video mode, the front camera and/or rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focus and optical zoom capability.
The audio component 410 is configured to output and/or input audio signals. For example, the audio component 410 includes a microphone (MIC) configured to receive external audio signals when the device 400 is in an operating mode, such as call mode, recording mode, or speech recognition mode. The received audio signals may be further stored in the memory 404 or sent via the communication component 416. In some embodiments, the audio component 410 also includes a loudspeaker for outputting audio signals.
The I/O interface 412 provides an interface between the processing component 402 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 414 includes one or more sensors for providing state assessments of various aspects of the device 400. For example, the sensor component 414 can detect the open/closed state of the device 400 and the relative positioning of components (for example, the display and keypad of the device 400), a change in the position of the device 400 or of one of its components, the presence or absence of user contact with the device 400, and the orientation, acceleration/deceleration, or temperature change of the device 400. The sensor component 414 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 414 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 416 is configured to facilitate wired or wireless communication between the device 400 and other equipment. The device 400 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 416 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 416 also includes a near-field communication (NFC) module to facilitate short-range communication; the NFC module may be implemented based on, for example, radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 400 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
Specifically, an embodiment of the present invention provides a simultaneous interpretation device 400 that includes a memory 404 and one or more programs, where the one or more programs are stored in the memory 404 and configured to be executed by one or more processors 420, and the one or more programs include instructions for: collecting source-language speech data; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source-language speech data and is expressed in a target language, where the speech translation result is obtained by natural-person speech synthesis and the target language and the source language are different languages.
Further, the processor 420 is also configured to execute instructions in the one or more programs for: obtaining and outputting a text translation result that corresponds to the recognition result of the source-language speech data and is expressed in the target language; and/or obtaining and outputting the recognition result of the source-language speech data.
Further, the processor 420 is also configured to execute instructions for: collecting at least two streams of source-language speech data that differ in source and in voice characteristics; and obtaining and outputting, separately, at least two speech translation results that correspond to the recognition results of the at least two streams and are expressed in the target language.
Further, the processor 420 is also configured to execute instructions for: in response to a user's selection or switching of a target timbre type and/or target style type, determining the timbre type and/or style type corresponding to the operation, and obtaining the timbre features and/or style features of the target audio corresponding to that timbre and/or style; and obtaining and outputting a synthesized speech translation result that corresponds to the recognition result of the source-language speech data, is expressed in the target language, and matches the target timbre and/or target style; the synthesized result has the timbre features of the target timbre and/or the style features of the target style.
Further, the processor 420 is also configured to execute instructions for: obtaining and outputting a speech translation result carrying the timbre features of the source-language speech data, the result being synthesized from the source-language speech data and the text translation result corresponding to the recognition result; the text translation result is expressed in the target language.
Further, the processor 420 is also configured to execute instructions for: determining text feature data from the text translation result; obtaining the timbre features of the source-language speech data; and performing speech synthesis with the timbre features of the source-language speech data and the text feature data to obtain synthesized speech data as the speech translation result.
Further, the processor 420 is also configured to execute instructions for: obtaining the style features of the source-language data, the style features including at least one of duration prosodic features, fundamental-frequency features, and energy features; and performing speech synthesis with the timbre features of the source-language speech data, the style features of the source-language data, and the text feature data to obtain synthesized speech data as the speech translation result, which carries the timbre features and style features of the source-language speech data.
Further, the processor 420 is also configured to execute instructions for: obtaining the style features of a target-style audio, the style features including at least one of duration prosodic features, fundamental-frequency features, and energy features, the target-style audio corresponding to the target language; and performing speech synthesis with the timbre features of the source-language speech data, the style features of the target-style audio, and the text feature data to obtain synthesized speech data as the speech translation result, which carries the timbre features of the source-language speech data and the style features of the target-style audio.
Further, the processor 420 is also configured to execute instructions for: obtaining the acoustic feature data of the source-language speech data from the text feature data, the duration prosodic features of the target-style audio, and the timbre features of the source-language speech data; fusing the fundamental-frequency features and/or energy features of the target-style audio with the acoustic feature data of the source-language speech data to obtain fused acoustic feature data; and converting the acoustic feature data into a speech waveform to obtain synthesized speech data that has the style features of the target-style audio and the timbre features of the source-language speech data.
Further, the processor 420 is also configured to execute instructions for: obtaining the acoustic feature data of the source-language speech data from the text feature data and the timbre features and duration features of the source-language speech data; fusing the fundamental-frequency features and/or energy features of the target-style audio with the acoustic feature data of the source-language speech data to obtain fused acoustic feature data; performing linear interpolation processing on the fused acoustic feature data so that its duration is consistent with the target duration, the target duration being determined from the duration prosodic features of the target-style audio data; and converting the processed acoustic feature data into a speech waveform to obtain synthesized speech data that has the style features of the target-style audio and the timbre features of the source-language speech data.
Further, the processor 420 is also configured to execute instructions for: in response to a user's selection or switching of a target style, determining the target style corresponding to the operation, and obtaining the style features of the target-style audio corresponding to that target style.
In the exemplary embodiment, a kind of non-transitorycomputer readable storage medium including instructing, example are additionally provided
Such as include the memory 404 of instruction, above-metioned instruction can be performed to complete the above method by the processor 420 of device 400.For example,
The non-transitorycomputer readable storage medium can be ROM, random access memory (RAM), CD-ROM, tape, floppy disk
With optical data storage devices etc..
A machine-readable medium is also provided; for example, the machine-readable medium may be a non-transitory computer-readable storage medium. When the instructions in the medium are executed by the processor of a device (terminal or server), the device is enabled to perform a simultaneous interpretation method, the method including: collecting source language speech data; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source language speech data and is expressed in a target language; wherein the speech translation result is obtained by natural-person speech synthesis, and the target speech is in a language different from the source language.
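End to end, the stored method follows a recognize-translate-synthesize loop. A minimal sketch with stub engines standing in for real ASR, machine translation, and speech synthesis components (every class and method name here is a placeholder, not an API defined by the patent):

```python
class Recognizer:
    def recognize(self, audio):    # stub ASR: audio -> source-language text
        return "你好，世界"

class Translator:
    def translate(self, text):     # stub MT: source text -> target-language text
        return "Hello, world"

class Synthesizer:
    def synthesize(self, text):    # stub TTS: text -> waveform bytes
        return text.encode("utf-8")

def interpret(audio_chunk, asr=Recognizer(), mt=Translator(), tts=Synthesizer()):
    """Collect source speech, recognize it, translate the recognition result,
    and synthesize target-language speech as the speech translation result."""
    source_text = asr.recognize(audio_chunk)
    target_text = mt.translate(source_text)
    return tts.synthesize(target_text)

print(interpret(b"\x00\x01"))  # b'Hello, world'
```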
Fig. 5 is a structural diagram of the server in an embodiment of the present invention. The server 500 may vary considerably in configuration or performance, and may include one or more central processing units (CPUs) 522 (for example, one or more processors), memory 532, and one or more storage media 530 (for example, one or more mass storage devices) that store application programs 542 or data 544. The memory 532 and the storage medium 530 may provide transient or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown), and each module may include a series of instruction operations on the server. Further, the central processing unit 522 may be configured to communicate with the storage medium 530 and to execute, on the server 500, the series of instruction operations in the storage medium 530.
The server 500 may also include one or more power supplies 526, one or more wired or wireless network interfaces 550, one or more input/output interfaces 558, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
Those skilled in the art will readily conceive of other embodiments of the present invention after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope; the scope of the present invention is limited only by the appended claims. The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include", and any other variants thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes that element. The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communication network; in a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the device embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant parts may refer to the description of the method embodiments. The device embodiments described above are merely schematic: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's solution, which those of ordinary skill in the art can understand and implement without creative effort. The above are merely embodiments of the present invention; it should be pointed out that those skilled in the art may make further improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
- 1. A simultaneous interpretation method, characterized by comprising: collecting source language speech data; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source language speech data and is expressed in a target language; wherein the speech translation result is obtained by natural-person speech synthesis, and the target speech is in a language different from the source language.
- 2. The method according to claim 1, characterized in that the method further comprises: obtaining and outputting a text translation result that corresponds to the recognition result of the source language speech data and is expressed in the target language; and/or obtaining and outputting the recognition result of the source language speech data.
- 3. The method according to claim 1, characterized in that collecting source language speech data comprises: collecting at least two source language speech data that differ in source and in speech features; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source language speech data and is expressed in the target language comprises: separately obtaining and outputting at least two speech translation results that correspond to the recognition results of the at least two source language speech data and are expressed in the target language.
- 4. The method according to claim 1, characterized in that the method further comprises: in response to a user's selection or switching operation for a target timbre type and/or a target style type, determining the target timbre type and/or target style type corresponding to the selection or switching operation, and obtaining the timbre characteristics and/or style features of the target audio corresponding to the target timbre and/or target style; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source language speech data and is expressed in the target language comprises: obtaining and outputting a synthesized speech translation result that corresponds to the recognition result of the source language speech data, is expressed in the target language, and corresponds to the target timbre and/or target style; the synthesized speech translation result has timbre characteristics corresponding to the target timbre, and/or the synthesized speech translation result has style features corresponding to the target style.
- 5. The method according to claim 1, characterized in that the speech translation result has the timbre characteristics of the source language speech data, and the speech translation result is synthesized from the source language speech data and a text translation result corresponding to the recognition result; the text translation result is expressed in the target language.
- 6. The method according to claim 1, characterized in that synthesizing the speech translation result from the source language speech data and the text translation result corresponding to the recognition result comprises: determining text feature data according to the text translation result; obtaining the timbre characteristics of the source language speech data; and performing speech synthesis according to the timbre characteristics of the source language speech data and the text feature data to obtain synthesized speech data as the speech translation result.
- 7. The method according to claim 6, characterized in that the method further comprises: obtaining the style features of the source language speech data, the style features including at least one of duration prosodic features, fundamental frequency features, and energy features; and performing speech synthesis according to the timbre characteristics of the source language speech data and the text feature data to obtain the speech translation result in the target language comprises: performing speech synthesis according to the timbre characteristics of the source language speech data, the style features of the source language speech data, and the text feature data to obtain synthesized speech data as the speech translation result; the speech translation result has the timbre characteristics and style features of the source language speech data.
- 8. A simultaneous interpretation apparatus, characterized by comprising: a collecting unit for collecting source language speech data; and an acquiring unit for obtaining and outputting a speech translation result that corresponds to the recognition result of the source language speech data and is expressed in a target language; wherein the speech translation result is obtained by natural-person speech synthesis, and the target speech is in a language different from the source language.
- 9. A device for simultaneous interpretation, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for performing the following operations: collecting source language speech data; and obtaining and outputting a speech translation result that corresponds to the recognition result of the source language speech data and is expressed in a target language; wherein the speech translation result is obtained by natural-person speech synthesis, and the target speech is in a language different from the source language.
- 10. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause a device to perform the simultaneous interpretation method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711207834.XA CN107992485A (en) | 2017-11-27 | 2017-11-27 | A kind of simultaneous interpretation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711207834.XA CN107992485A (en) | 2017-11-27 | 2017-11-27 | A kind of simultaneous interpretation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107992485A true CN107992485A (en) | 2018-05-04 |
Family
ID=62032096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711207834.XA Pending CN107992485A (en) | 2017-11-27 | 2017-11-27 | A kind of simultaneous interpretation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107992485A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108447486A (en) * | 2018-02-28 | 2018-08-24 | 科大讯飞股份有限公司 | A kind of voice translation method and device |
CN109036416A (en) * | 2018-07-02 | 2018-12-18 | 腾讯科技(深圳)有限公司 | simultaneous interpretation method and system, storage medium and electronic device |
CN109448698A (en) * | 2018-10-17 | 2019-03-08 | 深圳壹账通智能科技有限公司 | Simultaneous interpretation method, apparatus, computer equipment and storage medium |
CN110059313A (en) * | 2019-04-03 | 2019-07-26 | 百度在线网络技术(北京)有限公司 | Translation processing method and device |
CN110415680A (en) * | 2018-09-05 | 2019-11-05 | 满金坝(深圳)科技有限公司 | A kind of simultaneous interpretation method, synchronous translation apparatus and a kind of electronic equipment |
CN110473516A (en) * | 2019-09-19 | 2019-11-19 | 百度在线网络技术(北京)有限公司 | Phoneme synthesizing method, device and electronic equipment |
CN110610720A (en) * | 2019-09-19 | 2019-12-24 | 北京搜狗科技发展有限公司 | Data processing method and device and data processing device |
CN112365880A (en) * | 2020-11-05 | 2021-02-12 | 北京百度网讯科技有限公司 | Speech synthesis method, speech synthesis device, electronic equipment and storage medium |
CN112382269A (en) * | 2020-11-13 | 2021-02-19 | 北京有竹居网络技术有限公司 | Audio synthesis method, device, equipment and storage medium |
WO2021102647A1 (en) * | 2019-11-25 | 2021-06-03 | 深圳市欢太科技有限公司 | Data processing method and apparatus, and storage medium |
WO2021134592A1 (en) * | 2019-12-31 | 2021-07-08 | 深圳市欢太科技有限公司 | Speech processing method, apparatus and device, and storage medium |
CN113192510A (en) * | 2020-12-29 | 2021-07-30 | 云从科技集团股份有限公司 | Method, system and medium for implementing voice age and/or gender identification service |
CN114781407A (en) * | 2022-04-21 | 2022-07-22 | 语联网(武汉)信息技术有限公司 | Voice real-time translation method and system and visual terminal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW201009613A (en) * | 2008-08-28 | 2010-03-01 | Inventec Corp | System and method for speech translation between classical Chinese and vernacular Chinese |
CN101727904A (en) * | 2008-10-31 | 2010-06-09 | 国际商业机器公司 | Voice translation method and device |
US20100198580A1 (en) * | 2000-10-25 | 2010-08-05 | Robert Glen Klinefelter | System, method, and apparatus for providing interpretive communication on a network |
CN106528547A (en) * | 2016-11-09 | 2017-03-22 | 王东宇 | Translation method for translation machine |
CN106791913A (en) * | 2016-12-30 | 2017-05-31 | 深圳市九洲电器有限公司 | Digital television program simultaneous interpretation output intent and system |
- 2017-11-27: CN CN201711207834.XA patent/CN107992485A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100198580A1 (en) * | 2000-10-25 | 2010-08-05 | Robert Glen Klinefelter | System, method, and apparatus for providing interpretive communication on a network |
TW201009613A (en) * | 2008-08-28 | 2010-03-01 | Inventec Corp | System and method for speech translation between classical Chinese and vernacular Chinese |
CN101727904A (en) * | 2008-10-31 | 2010-06-09 | 国际商业机器公司 | Voice translation method and device |
CN106528547A (en) * | 2016-11-09 | 2017-03-22 | 王东宇 | Translation method for translation machine |
CN106791913A (en) * | 2016-12-30 | 2017-05-31 | 深圳市九洲电器有限公司 | Digital television program simultaneous interpretation output intent and system |
Non-Patent Citations (1)
Title |
---|
张鹏 (Zhang Peng): "Research and Implementation of an Embedded Speech Synthesis System" (嵌入式语音合成系统的研究与实现), China Master's Theses Full-text Database, Information Science and Technology Series (Monthly) *
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108447486A (en) * | 2018-02-28 | 2018-08-24 | 科大讯飞股份有限公司 | A kind of voice translation method and device |
CN109036416A (en) * | 2018-07-02 | 2018-12-18 | 腾讯科技(深圳)有限公司 | simultaneous interpretation method and system, storage medium and electronic device |
CN109036416B (en) * | 2018-07-02 | 2022-12-20 | 腾讯科技(深圳)有限公司 | Simultaneous interpretation method and system, storage medium and electronic device |
CN110415680A (en) * | 2018-09-05 | 2019-11-05 | 满金坝(深圳)科技有限公司 | A kind of simultaneous interpretation method, synchronous translation apparatus and a kind of electronic equipment |
CN110415680B (en) * | 2018-09-05 | 2022-10-04 | 梁志军 | Simultaneous interpretation method, simultaneous interpretation device and electronic equipment |
EP3620939A1 (en) * | 2018-09-05 | 2020-03-11 | Manjinba (Shenzhen) Technology Co., Ltd. | Method and device for simultaneous interpretation based on machine learning |
WO2020048143A1 (en) * | 2018-09-05 | 2020-03-12 | 满金坝(深圳)科技有限公司 | Machine learning-based simultaneous interpretation method and device |
CN109448698A (en) * | 2018-10-17 | 2019-03-08 | 深圳壹账通智能科技有限公司 | Simultaneous interpretation method, apparatus, computer equipment and storage medium |
WO2020077868A1 (en) * | 2018-10-17 | 2020-04-23 | 深圳壹账通智能科技有限公司 | Simultaneous interpretation method and apparatus, computer device and storage medium |
CN110059313B (en) * | 2019-04-03 | 2021-02-12 | 百度在线网络技术(北京)有限公司 | Translation processing method and device |
CN110059313A (en) * | 2019-04-03 | 2019-07-26 | 百度在线网络技术(北京)有限公司 | Translation processing method and device |
CN110610720B (en) * | 2019-09-19 | 2022-02-25 | 北京搜狗科技发展有限公司 | Data processing method and device and data processing device |
CN110610720A (en) * | 2019-09-19 | 2019-12-24 | 北京搜狗科技发展有限公司 | Data processing method and device and data processing device |
CN110473516A (en) * | 2019-09-19 | 2019-11-19 | 百度在线网络技术(北京)有限公司 | Phoneme synthesizing method, device and electronic equipment |
WO2021051588A1 (en) * | 2019-09-19 | 2021-03-25 | 北京搜狗科技发展有限公司 | Data processing method and apparatus, and apparatus used for data processing |
US11417314B2 (en) | 2019-09-19 | 2022-08-16 | Baidu Online Network Technology (Beijing) Co., Ltd. | Speech synthesis method, speech synthesis device, and electronic apparatus |
CN110473516B (en) * | 2019-09-19 | 2020-11-27 | 百度在线网络技术(北京)有限公司 | Voice synthesis method and device and electronic equipment |
WO2021102647A1 (en) * | 2019-11-25 | 2021-06-03 | 深圳市欢太科技有限公司 | Data processing method and apparatus, and storage medium |
WO2021134592A1 (en) * | 2019-12-31 | 2021-07-08 | 深圳市欢太科技有限公司 | Speech processing method, apparatus and device, and storage medium |
CN112365880A (en) * | 2020-11-05 | 2021-02-12 | 北京百度网讯科技有限公司 | Speech synthesis method, speech synthesis device, electronic equipment and storage medium |
CN112365880B (en) * | 2020-11-05 | 2024-03-26 | 北京百度网讯科技有限公司 | Speech synthesis method, device, electronic equipment and storage medium |
CN112382269A (en) * | 2020-11-13 | 2021-02-19 | 北京有竹居网络技术有限公司 | Audio synthesis method, device, equipment and storage medium |
CN113192510A (en) * | 2020-12-29 | 2021-07-30 | 云从科技集团股份有限公司 | Method, system and medium for implementing voice age and/or gender identification service |
CN113192510B (en) * | 2020-12-29 | 2024-04-30 | 云从科技集团股份有限公司 | Method, system and medium for realizing voice age and/or sex identification service |
CN114781407A (en) * | 2022-04-21 | 2022-07-22 | 语联网(武汉)信息技术有限公司 | Voice real-time translation method and system and visual terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107992485A (en) | A kind of simultaneous interpretation method and device | |
CN107705783A (en) | A kind of phoneme synthesizing method and device | |
CN109801644B (en) | Separation method, separation device, electronic equipment and readable medium for mixed sound signal | |
CN108346433A (en) | A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing | |
CN109637518A (en) | Virtual newscaster's implementation method and device | |
CN107992812A (en) | A kind of lip reading recognition methods and device | |
CN110634483A (en) | Man-machine interaction method and device, electronic equipment and storage medium | |
CN111508511A (en) | Real-time sound changing method and device | |
CN110188177A (en) | Talk with generation method and device | |
CN110097890A (en) | A kind of method of speech processing, device and the device for speech processes | |
CN111583944A (en) | Sound changing method and device | |
CN107291690A (en) | Punctuate adding method and device, the device added for punctuate | |
CN108198569A (en) | A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing | |
CN107644646A (en) | Method of speech processing, device and the device for speech processes | |
CN107211198A (en) | Apparatus and method for content of edit | |
CN104461348B (en) | Information choosing method and device | |
CN106484138B (en) | A kind of input method and device | |
CN107944447A (en) | Image classification method and device | |
CN111199160A (en) | Instant call voice translation method and device and terminal | |
CN108509412A (en) | A kind of data processing method, device, electronic equipment and storage medium | |
CN108073572A (en) | Information processing method and its device, simultaneous interpretation system | |
CN109002184A (en) | A kind of association method and device of input method candidate word | |
CN108648754A (en) | Sound control method and device | |
CN107291704A (en) | Treating method and apparatus, the device for processing | |
CN110730360A (en) | Video uploading and playing methods and devices, client equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||