CN109448698A - Simultaneous interpretation method, apparatus, computer equipment and storage medium - Google Patents

Simultaneous interpretation method, apparatus, computer equipment and storage medium

Info

Publication number
CN109448698A
CN109448698A
Authority
CN
China
Prior art keywords
simultaneous interpretation
voice
model
voice data
demand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811211414.3A
Other languages
Chinese (zh)
Inventor
李晨光 (Li Chenguang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd filed Critical OneConnect Smart Technology Co Ltd
Priority to CN201811211414.3A priority Critical patent/CN109448698A/en
Priority to PCT/CN2018/124800 priority patent/WO2020077868A1/en
Publication of CN109448698A publication Critical patent/CN109448698A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/005 - Language recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/04 - Segmentation; Word boundary detection
    • G10L15/05 - Word boundary detection
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 - Phonemes, fenemes or fenones being the recognition units

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

This application relates to a simultaneous interpretation method, apparatus, computer equipment, and storage medium. The method involves speech recognition technology based on artificial intelligence and comprises: receiving voice data to be interpreted, and determining the source language category corresponding to the voice data; obtaining a simultaneous interpretation requirement, the requirement including a target language and a voice output requirement; querying a preset voice interpretation model corresponding to the source language category and the target language, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language; importing the voice data to be interpreted into the voice interpretation model to obtain model voice data; and performing voice feature processing on the model voice data according to the voice output requirement, and outputting interpreted voice data. With this method, no professional simultaneous interpreters are needed for manual translation, the influence of human factors is avoided, and the efficiency and sound quality of simultaneous interpretation are effectively improved.

Description

Simultaneous interpretation method, apparatus, computer equipment and storage medium
Technical field
This application relates to the field of computer technology, and in particular to a simultaneous interpretation method, apparatus, computer equipment, and storage medium.
Background technique
Simultaneous interpretation, abbreviated "SI", refers to a mode of interpretation in which the interpreter translates content to the audience continuously, without interrupting the speaker. Simultaneous interpretation is highly academic and professional; besides its wide use at international conferences, it is also widely used in diplomacy and foreign affairs, meetings and negotiations, business activities, news media, training and teaching, television broadcasting, international arbitration, and many other fields.
However, the current simultaneous interpretation process relies on manual interpretation by professional interpreters. It is strongly affected by the interpreters' personal factors, and the efficiency and sound quality of interpretation are therefore limited.
Summary of the invention
In view of the above technical problems, it is necessary to provide a simultaneous interpretation method, apparatus, computer equipment, and storage medium capable of improving the efficiency and sound quality of simultaneous interpretation.
A simultaneous interpretation method, the method comprising:
receiving voice data to be interpreted, and determining the source language category corresponding to the voice data to be interpreted;
obtaining a simultaneous interpretation requirement, the requirement including a target language and a voice output requirement;
querying a preset voice interpretation model corresponding to the source language category and the target language, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language;
importing the voice data to be interpreted into the voice interpretation model to obtain model voice data;
performing voice feature processing on the model voice data according to the voice output requirement, and outputting interpreted voice data.
In one embodiment, the step of determining the source language category corresponding to the voice data to be interpreted comprises:
extracting voice feature phonemes from the voice data to be interpreted;
querying a preset language phoneme classification model, the language phoneme classification model being obtained by training on the voice feature phonemes corresponding to various language categories;
inputting the voice feature phonemes into the language phoneme classification model to obtain the source language category corresponding to the voice data to be interpreted.
In one embodiment, the step of extracting voice feature phonemes from the voice data to be interpreted comprises:
digitizing the voice data to be interpreted to obtain digitized data;
performing endpoint detection on the digitized data, and framing the endpoint-detected digitized data into voice frames to obtain voice frame data;
extracting the voice feature phonemes from the voice frame data.
In one embodiment, the step of querying the preset voice interpretation model corresponding to the source language category and the target language comprises:
querying a preset voice interpretation model library;
querying, from the voice interpretation model library, a multilingual interpretation model corresponding to the source language category;
configuring the output language of the multilingual interpretation model according to the target language to obtain the voice interpretation model.
In one embodiment, before the step of querying the preset voice interpretation model library, the method further comprises:
obtaining a preset speech recognition model corresponding to the source language category, the speech recognition model being used to output source-language text corresponding to the source language category according to the voice data to be interpreted;
constructing a text translation model according to historical translation data between the source-language text and the target-language text corresponding to the target language, the text translation model being used to output target-language text according to the source-language text;
constructing a target-language speech model according to the target-language text and the voice data corresponding to the target-language text in the target language;
combining the speech recognition model, the text translation model, and the target-language speech model in sequence to obtain a multilingual interpretation model;
obtaining the voice interpretation model library from the multilingual interpretation models.
In one embodiment, the voice output requirement includes a scene requirement and a user requirement; the step of performing voice feature processing on the model voice data according to the voice output requirement and outputting interpreted voice data comprises:
querying a preset scene speech database corresponding to the scene requirement, the scene speech database storing scene speech expression data satisfying the scene requirement;
updating the model voice data with the scene speech expression data to obtain scene voice data;
configuring the scene voice data according to the user requirement, and outputting the interpreted voice data.
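The scene-database update in this embodiment can be sketched as a lookup that merges stored scene expression parameters into the model voice data. This is only an illustrative sketch; the database contents, parameter names, and data shapes are hypothetical, not taken from the patent.

```python
# Hypothetical scene speech database mapping a scene requirement to
# expression parameters that are merged into the model voice data.
SCENE_DATABASE = {
    "conference": {"pace": 1.0, "formality": "high"},
    "broadcast":  {"pace": 1.1, "formality": "medium"},
}

def apply_scene(model_voice: dict, scene: str) -> dict:
    """Update model voice data with the stored scene expression data."""
    expression = SCENE_DATABASE[scene]
    return {**model_voice, **expression}

scene_voice = apply_scene({"audio": "<pcm>", "pace": 0.9}, "conference")
print(scene_voice["pace"], scene_voice["formality"])  # 1.0 high
```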
In one embodiment, the user requirement includes a voice timbre requirement and a voice style requirement; the step of configuring the scene voice data according to the user requirement and outputting interpreted voice data comprises:
switching the timbre of the scene voice data according to the voice timbre requirement to obtain timbre voice data satisfying the voice timbre requirement;
switching the style of the timbre voice data according to the voice style requirement, and outputting the interpreted voice data.
A simultaneous interpretation apparatus, the apparatus comprising:
a data receiving module, configured to receive voice data to be interpreted and determine the source language category corresponding to the voice data to be interpreted;
a requirement obtaining module, configured to obtain a simultaneous interpretation requirement, the requirement including a target language and a voice output requirement;
a model query module, configured to query a preset voice interpretation model corresponding to the source language category and the target language, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language;
a model voice data obtaining module, configured to import the voice data to be interpreted into the voice interpretation model to obtain model voice data;
an interpreted voice data obtaining module, configured to perform voice feature processing on the model voice data according to the voice output requirement, and output interpreted voice data.
A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:
receiving voice data to be interpreted, and determining the source language category corresponding to the voice data to be interpreted;
obtaining a simultaneous interpretation requirement, the requirement including a target language and a voice output requirement;
querying a preset voice interpretation model corresponding to the source language category and the target language, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language;
importing the voice data to be interpreted into the voice interpretation model to obtain model voice data;
performing voice feature processing on the model voice data according to the voice output requirement, and outputting interpreted voice data.
A computer-readable storage medium, storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
receiving voice data to be interpreted, and determining the source language category corresponding to the voice data to be interpreted;
obtaining a simultaneous interpretation requirement, the requirement including a target language and a voice output requirement;
querying a preset voice interpretation model corresponding to the source language category and the target language, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language;
importing the voice data to be interpreted into the voice interpretation model to obtain model voice data;
performing voice feature processing on the model voice data according to the voice output requirement, and outputting interpreted voice data.
With the above simultaneous interpretation method, apparatus, computer equipment, and storage medium, the source language category corresponding to the received voice data to be interpreted is determined, and a preset voice interpretation model corresponding to the source language category and the target language is queried, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language. The voice data to be interpreted is imported into the voice interpretation model to obtain model voice data, voice feature processing is then performed on the model voice data according to the voice output requirement, and interpreted voice data is output, thereby realizing simultaneous interpretation. During the interpretation process, no professional simultaneous interpreters are needed for manual translation, the influence of human factors is avoided, and the efficiency and sound quality of simultaneous interpretation are effectively improved.
Detailed description of the invention
Fig. 1 is an application scenario diagram of the simultaneous interpretation method in one embodiment;
Fig. 2 is a flow diagram of the simultaneous interpretation method in one embodiment;
Fig. 3 is a flow diagram of the construction of the voice interpretation model library in one embodiment;
Fig. 4 is a flow diagram of the simultaneous interpretation method in another embodiment;
Fig. 5 is a structural block diagram of the simultaneous interpretation apparatus in one embodiment;
Fig. 6 is an internal structure diagram of the computer device in one embodiment.
Specific embodiment
In order to make the objects, technical solutions, and advantages of the present application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the application, not to limit it.
The simultaneous interpretation method provided by the present application can be applied in the environment shown in Fig. 1, in which a first terminal 102 and a second terminal 106 each communicate with a server 104 through a network. The first terminal 102 sends voice data to be interpreted to the server 104. The server 104 determines the source language category corresponding to the received voice data, and queries a preset voice interpretation model corresponding to the source language category and the target language, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language. The server imports the voice data to be interpreted into the voice interpretation model to obtain model voice data, performs voice feature processing on the model voice data according to the voice output requirement, obtains the interpreted voice data, and sends it to the second terminal 106, thereby realizing simultaneous interpretation. The first terminal 102 and the second terminal 106 may be, but are not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices; the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in Fig. 2, a simultaneous interpretation method is provided. The method is described by way of example as applied to the server 104 in Fig. 1, and comprises the following steps:
Step S201: receive voice data to be interpreted, and determine the source language category corresponding to the voice data to be interpreted.
The voice data to be interpreted is the source voice data to be translated; it can be received by the voice signal collector of the first terminal 102 from a speech source, for example the speech signal of a speaker at a conference. The source language category is the language to which the source voice data belongs, such as Chinese, English, French, or German. In specific applications, the source language category can be further refined; for example, Chinese can be subdivided into dialect sub-categories such as Mandarin, Cantonese, Wu, Sichuanese, and Min Nan. Specifically, after the server 104 receives the voice data uploaded by the first terminal 102, it can determine the corresponding source language category according to features of the voice data, such as phoneme features.
Step S203: obtain a simultaneous interpretation requirement, the requirement including a target language and a voice output requirement.
After the server 104 receives the voice data to be interpreted sent by the first terminal 102 and determines its source language category, it also needs to determine the target language required for translation. The target language is the language category into which the voice data is to be translated and output; for example, in English-to-Chinese interpretation, English is the source language category and Chinese is the target language. The voice output requirement specifies the voice features of the output voice data; it may include a timbre requirement, such as a male, female, or child's voice, and a voice style requirement, such as cheerful, gloomy, or excited. By adjusting the voice features of the output voice data according to the voice output requirement, the actual needs of various scenes and users can be satisfied. Specifically, the simultaneous interpretation requirement can be sent to the server 104 by the second terminal 106 that receives the interpretation output.
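The requirement described above (a target language plus a voice output requirement, the latter splitting into scene, timbre, and style in later embodiments) can be sketched as a plain data structure. This is an illustrative sketch under assumed field names, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass
class VoiceOutputRequirement:
    """Voice feature requirements for the interpreted output."""
    scene: str = "conference"   # scene requirement, e.g. conference, broadcast
    timbre: str = "female"      # timbre requirement: male / female / child
    style: str = "neutral"      # style requirement: cheerful, gloomy, excited...

@dataclass
class InterpretationRequirement:
    """Requirement sent by the receiving terminal (second terminal 106)."""
    target_language: str        # e.g. "zh" for Chinese output
    output: VoiceOutputRequirement

req = InterpretationRequirement("zh", VoiceOutputRequirement(timbre="male", style="cheerful"))
print(req.target_language, req.output.timbre)  # zh male
```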
Step S205: query a preset voice interpretation model corresponding to the source language category and the target language, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language.
The voice interpretation model is used to translate the input voice data into voice data corresponding to the target language. The voice interpretation model is set according to its input language and output language, and is constructed based on the translation correspondence between the source language category and the target language. For example, when the source language category is English, it needs to be combined with the target language, such as Chinese, German, or French, to determine the corresponding English-to-Chinese, English-to-German, or English-to-French voice interpretation model. Specifically, after the source language category and the target language are determined, the corresponding preset voice interpretation model is queried according to that pair.
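The lookup described above, keyed by the (source language, target language) pair, can be sketched as a simple registry. The library contents and names here are hypothetical placeholders, not the patent's actual model library.

```python
# Hypothetical registry of preset voice interpretation models, keyed by
# (source language category, target language) as the text describes.
MODEL_LIBRARY = {
    ("en", "zh"): "english-to-chinese-model",
    ("en", "de"): "english-to-german-model",
    ("en", "fr"): "english-to-french-model",
}

def query_interpretation_model(source_lang: str, target_lang: str) -> str:
    """Look up the preset model for a (source, target) language pair."""
    try:
        return MODEL_LIBRARY[(source_lang, target_lang)]
    except KeyError:
        raise LookupError(f"no preset model for {source_lang}->{target_lang}")

print(query_interpretation_model("en", "zh"))  # english-to-chinese-model
```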
Step S207: import the voice data to be interpreted into the voice interpretation model to obtain model voice data.
After the voice interpretation model is obtained, the received voice data to be interpreted is input into the voice interpretation model for translation processing, and the corresponding model voice data is output. In a specific implementation, the voice interpretation model can be obtained by combining a speech recognition model, a text translation model, and a target-language speech model. The speech recognition model may be, but is not limited to, a hidden Markov model or a machine learning model based on artificial neural network algorithms, such as an LSTM recurrent neural network model; it performs speech recognition on the voice data to be interpreted to obtain the source-language text corresponding to the voice data under the source language category. The text translation model can be constructed based on a string matching algorithm, such as the KMP algorithm, and is used to translate the source-language text output by the speech recognition model into target-language text corresponding to the target language. The target-language speech model extracts the corresponding voice data from a preset target speech database according to the target-language text output by the text translation model, then synthesizes and outputs the final model voice data, thereby realizing the interpretation processing.
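The sequential combination of the three stages described above (speech recognition, then text translation, then target-language speech synthesis) can be sketched as function composition. The stage functions below are toy placeholders standing in for real models (the patent suggests e.g. an LSTM recognizer), used only to show the chaining.

```python
def combine_stages(asr, translate, tts):
    """Chain speech recognition, text translation, and target-language
    speech synthesis into a single voice interpretation model, mirroring
    the sequential combination described above."""
    def interpretation_model(voice_data):
        source_text = asr(voice_data)        # speech -> source-language text
        target_text = translate(source_text) # source text -> target text
        return tts(target_text)              # target text -> model voice data
    return interpretation_model

# Toy placeholder stages; real systems would use trained models:
asr = lambda audio: "hello world"
translate = lambda text: {"hello world": "你好，世界"}.get(text, text)
tts = lambda text: f"<audio:{text}>"

model = combine_stages(asr, translate, tts)
print(model(b"raw-pcm-bytes"))  # <audio:你好，世界>
```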
Step S209: perform voice feature processing on the model voice data according to the voice output requirement, and output interpreted voice data.
After the voice interpretation model outputs the translated model voice data, voice feature processing is performed on the model voice data in accordance with the voice output requirement in the simultaneous interpretation requirement, and the interpreted voice data is obtained and output. Voice feature processing may include, but is not limited to, timbre processing, such as switching between male and female voices, and voice style processing, such as switching between cheerful, excited, and sad moods. By performing voice feature processing on the model voice data output by the voice interpretation model, the interpreted voice data can take on different sound characteristics rather than being limited to an interpreter's own voice; it can thus be adapted to various interpretation scenarios and all kinds of users, improving the sound quality of the interpretation.
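As one minimal illustration of post-processing model voice data, the sketch below shifts pitch by crude resampling of a sample sequence. Real timbre or style conversion would use proper signal processing (vocoders, voice conversion models); this only illustrates the idea of transforming the synthesized audio after translation, and every detail here is an assumption.

```python
def shift_pitch(samples, factor):
    """Crude pitch shift by resampling: factor > 1 raises the pitch
    (and shortens the clip), factor < 1 lowers it and lengthens it."""
    n = int(len(samples) / factor)
    return [samples[min(int(i * factor), len(samples) - 1)] for i in range(n)]

clip = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(shift_pitch(clip, 2.0))  # [0.0, 0.4, 0.8] -- pitched up, half as long
```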
In the above simultaneous interpretation method, the source language category corresponding to the received voice data to be interpreted is determined, and a preset voice interpretation model corresponding to the source language category and the target language is queried, the voice interpretation model being constructed based on the translation correspondence between the source language category and the target language. The voice data to be interpreted is imported into the voice interpretation model to obtain model voice data, voice feature processing is then performed on the model voice data according to the voice output requirement, and interpreted voice data is output, thereby realizing simultaneous interpretation. During the interpretation process, no professional simultaneous interpreters are needed for manual translation, the influence of human factors is avoided, and the efficiency and sound quality of simultaneous interpretation are effectively improved.
In one embodiment, the step of determining the source language category corresponding to the voice data to be interpreted comprises: extracting voice feature phonemes from the voice data to be interpreted; querying a preset language phoneme classification model, the language phoneme classification model being obtained by training on the voice feature phonemes corresponding to various language categories; and inputting the voice feature phonemes into the language phoneme classification model to obtain the source language category corresponding to the voice data to be interpreted.
Different languages have different pronunciation rules. The smallest unit of speech divided according to the natural attributes of sound is the phoneme, and phoneme features differ across languages. For example, the Chinese word "putonghua" (Mandarin) consists of 3 syllables and can be split into the 8 phonemes "p, u, t, o, ng, h, u, a". English includes 48 phonemes, of which 20 are vowel phonemes and 28 are consonant phonemes, while among the 26 English letters there are 5 vowels, 19 consonants, and 2 semivowel letters. Languages can therefore be distinguished by their phoneme features.
In this embodiment, when determining the source language category corresponding to the voice data to be interpreted, voice feature phonemes are extracted from the voice data to be interpreted; these phonemes are used to judge the source language category. A preset language phoneme classification model is then queried. This model is obtained by training on the voice feature phonemes corresponding to various language categories and is used to classify languages according to the input phoneme features, thereby determining the source language category corresponding to the voice feature phonemes. The language phoneme classification model may be a neural network model trained on the voice feature phonemes of each language based on an artificial neural network algorithm. The voice feature phonemes are input into the language phoneme classification model, which outputs the source language category corresponding to the voice data to be interpreted.
In a specific application, when the voice feature phonemes are input into the language phoneme classification model, the phonemes extracted from the voice data to be interpreted may first be screened according to the input requirements of the model, and only those meeting the input requirements are selected and input into the model for the processing of determining the source language category.
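As an illustrative sketch only: the patent describes a trained neural-network phoneme classifier, but the idea of distinguishing languages by phoneme statistics can be shown with a much simpler nearest-centroid stand-in. All names, the phoneme inventory and the training samples below are hypothetical.

```python
from collections import Counter

def phoneme_histogram(phonemes, inventory):
    """Normalized frequency of each inventory phoneme in an utterance."""
    counts = Counter(phonemes)
    total = sum(counts.values()) or 1
    return [counts[p] / total for p in inventory]

class NearestCentroidLanguageID:
    """Toy stand-in for the language phoneme classification model."""
    def __init__(self, inventory):
        self.inventory = inventory
        self.centroids = {}

    def fit(self, samples):
        # samples: {language category: list of phoneme sequences}
        for lang, utterances in samples.items():
            vecs = [phoneme_histogram(u, self.inventory) for u in utterances]
            n = len(vecs)
            self.centroids[lang] = [sum(col) / n for col in zip(*vecs)]

    def predict(self, phonemes):
        v = phoneme_histogram(phonemes, self.inventory)
        def dist(c):
            return sum((a - b) ** 2 for a, b in zip(v, c))
        return min(self.centroids, key=lambda lang: dist(self.centroids[lang]))

inventory = ["p", "u", "t", "o", "ng", "h", "a", "th", "r"]
model = NearestCentroidLanguageID(inventory)
model.fit({
    "zh": [["p", "u", "t", "o", "ng", "h", "u", "a"]],  # "putonghua" example
    "en": [["th", "r", "o", "t", "a"]],
})
print(model.predict(["p", "u", "h", "u", "a"]))  # → zh
```

A production system would replace the centroid comparison with the trained neural network described in the embodiment; the screening of phonemes to match the model's input requirements would happen before `predict` is called.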
In one embodiment, the step of extracting voice feature phonemes from the voice data to be interpreted includes: digitizing the voice data to be interpreted to obtain digitized interpretation data; performing endpoint detection on the digitized interpretation data, and performing voice framing on the digitized data after endpoint detection to obtain voice frame data to be interpreted; and extracting the voice feature phonemes from the voice frame data to be interpreted.
Generally, the voice data to be interpreted collected by the first terminal 102 through a voice signal collection device, such as a microphone, is an analog signal containing redundant information such as ambient noise and channel distortion. The analog signal therefore needs to be preprocessed: digitization through processes such as anti-aliasing filtering, sampling and A/D conversion, followed by pre-emphasis, windowing and framing, and endpoint detection, so as to filter out unimportant information and ambient noise. This can effectively improve the processing efficiency and effect of simultaneous interpretation.
In this embodiment, when extracting voice feature phonemes from the voice data to be interpreted, the voice data is first digitized, including anti-aliasing filtering, sampling and A/D conversion, to obtain the digitized interpretation data. Endpoint detection is then performed on the digitized data to determine its start and end points, and the data after endpoint detection is divided into short frame segments by voice framing, yielding the voice frame data to be interpreted, from which the voice feature phonemes can be extracted.
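The endpoint detection and framing described above can be sketched with a minimal energy-based trim and overlapping-frame split. This is a toy illustration, not the embodiment's actual front-end; the threshold, frame length and hop values are hypothetical.

```python
def endpoint_detect(samples, threshold=0.01):
    """Trim leading/trailing low-energy samples (simple energy-based VAD)."""
    energies = [s * s for s in samples]
    try:
        start = next(i for i, e in enumerate(energies) if e > threshold)
        end = len(energies) - next(
            i for i, e in enumerate(reversed(energies)) if e > threshold)
    except StopIteration:
        return []  # no speech detected
    return samples[start:end]

def frame(samples, frame_len=3, hop=2):
    """Split into overlapping frames, as in standard speech front-ends."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

signal = [0.0, 0.0, 0.3, -0.5, 0.4, -0.2, 0.25, 0.0, 0.0]
speech = endpoint_detect(signal)
frames = frame(speech)
print(len(speech), len(frames))  # → 5 2
```

In a real pipeline the frames would then be windowed and passed to acoustic feature extraction (e.g. the phoneme features discussed above).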
In one embodiment, the step of querying the preset speech interpretation model corresponding to the source language category and the interpretation target language includes: querying a preset speech interpretation model library; querying, from the speech interpretation model library, the multilingual interpretation model corresponding to the source language category; and performing output language configuration on the multilingual interpretation model according to the interpretation target language to obtain the speech interpretation model.
In this embodiment, the speech interpretation model library stores multilingual interpretation models corresponding to various source language categories. A multilingual interpretation model is an interpretation model whose input is fixed to a given source language category; by configuring its output language according to the actual interpretation target language, a speech interpretation model meeting the target language can be obtained. Specifically, when querying the preset speech interpretation model corresponding to the source language category and the interpretation target language, the speech interpretation model library is queried; the multilingual interpretation model corresponding to the source language category is retrieved from the library; and output language configuration is then performed on the multilingual interpretation model according to the interpretation target language to obtain the speech interpretation model meeting the target language. This speech interpretation model can receive the voice data to be interpreted in the corresponding source language category and, after translation processing, output model voice data corresponding to the interpretation target language, thereby realizing simultaneous interpretation of the voice data.
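A minimal sketch of this lookup, under the assumption that the library is keyed by source language and each multilingual model is specialized ("output language configuration") by selecting a target language at query time. The class and language codes are hypothetical.

```python
class MultilingualModel:
    """Stand-in for a multilingual interpretation model with a fixed input
    language and several configurable output languages."""
    def __init__(self, source_lang, targets):
        self.source_lang = source_lang
        self.targets = set(targets)

    def configure(self, target_lang):
        if target_lang not in self.targets:
            raise ValueError(f"no output configuration for {target_lang}")
        # The returned callable stands in for the configured speech model.
        return lambda audio: f"[{self.source_lang}->{target_lang}] {audio}"

# Speech interpretation model library: one multilingual model per source language.
model_library = {
    "zh": MultilingualModel("zh", ["en", "fr"]),
    "en": MultilingualModel("en", ["zh"]),
}

speech_model = model_library["zh"].configure("en")
print(speech_model("ni hao"))  # → [zh->en] ni hao
```

The point of the design is that only one model per source language needs to be stored; the target language is a cheap configuration step rather than a separate stored model.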
In one embodiment, as shown in FIG. 3, before the step of querying the preset speech interpretation model library, the construction of the speech interpretation model library includes the following steps:

Step S301: obtaining a preset speech recognition model corresponding to the source language category, where the speech recognition model is used to output, according to the voice data to be interpreted, the source language text corresponding to the source language category.
In this embodiment, the speech interpretation model may be obtained by configuring the output language of a multilingual interpretation model according to the interpretation target language, where the multilingual interpretation model is obtained by combining a speech recognition model, a text translation model and a target language speech model, and the various multilingual interpretation models are collected and stored uniformly in the speech interpretation model library. Specifically, when creating the speech interpretation model library, on the one hand, a preset speech recognition model corresponding to the source language category is obtained; this model outputs the source language text corresponding to the source language category according to the voice data to be interpreted. The speech recognition model may be, but is not limited to, a hidden Markov model or a machine learning model based on an artificial neural network algorithm, and is used to perform speech recognition on the voice data to be interpreted to obtain the source language text corresponding to that data under its language category. For example, a speech recognition model for Chinese can convert received Chinese voice data into Chinese characters for output.
Step S303: constructing a text translation model according to historical translation data between the source language text and the target language text corresponding to the interpretation target language, where the text translation model is used to output the target language text corresponding to the interpretation target language according to the source language text.

On the other hand, based on big-data analysis of the historical translation data between the source language text and the target language text, mapping relationships between the two are established, including but not limited to character mappings, word mappings, phrase mappings and common-expression mappings, where common expressions may include famous sayings, colloquialisms, proverbs, maxims and slang. In a specific application, for example, the Chinese saying "Do not impose on others what you yourself do not desire" can be mapped to its internationally recognized official translation, establishing a mapping between the corresponding target-language expression and the Chinese text expression. The text translation model can be constructed according to these mapping relationships, so that it outputs the target language text corresponding to the interpretation target language according to the source language text.
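The layered mappings (common expressions, phrases, words, characters) can be sketched as a longest-match-first substitution table. This is a deliberately naive stand-in for the mapping-based text translation model; the table contents are hypothetical examples.

```python
# Hypothetical mapping table; a real model would be built from historical
# translation data as described in step S303.
mappings = {
    "zh->en": {
        "同声传译": "simultaneous interpretation",  # phrase / common expression
        "你好": "hello",                            # word mapping
        "世界": "world",
    },
}

def translate(text, direction, table=mappings):
    # Apply the longest source strings first so that phrase and
    # common-expression mappings take priority over word mappings.
    rules = sorted(table[direction].items(), key=lambda kv: -len(kv[0]))
    for src, dst in rules:
        text = text.replace(src, dst)
    return text

print(translate("你好 世界", "zh->en"))  # → hello world
```

Ordering by source length is one simple way to ensure that a fixed expression is translated as a unit rather than word by word, which matches the motivation for the common-expression mapping above.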
Step S305: constructing a target language speech model according to the target language text and the voice data corresponding to the target language text in the interpretation target language.

In addition, a target language speech model is constructed for extracting, from a preset target speech database, the voice data corresponding to the target language text, and for synthesizing and outputting the final model voice data. The target language speech model may be built based on a character matching algorithm: by matching the target language text against the text corresponding to the voice data in the preset target speech database, the corresponding model voice data is queried and output.
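A minimal sketch of the character-matching idea, assuming each text unit in the target speech database maps to a stored waveform identifier and synthesis concatenates the matched units. The database contents and filenames are hypothetical.

```python
# Hypothetical target speech database: text unit -> stored voice data.
target_speech_db = {
    "hello": "wav:hello.pcm",
    "world": "wav:world.pcm",
}

def synthesize(target_text):
    """Character-match each text unit and concatenate the matched audio."""
    units = []
    for token in target_text.split():
        units.append(target_speech_db.get(token, f"wav:unknown({token})"))
    return "+".join(units)  # the joined string stands in for concatenated audio

print(synthesize("hello world"))  # → wav:hello.pcm+wav:world.pcm
```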
Step S307: combining the speech recognition model, the text translation model and the target language speech model in sequence to obtain the multilingual interpretation model.

After the speech recognition model, the text translation model and the target language speech model are obtained, they are combined in sequence to obtain the multilingual interpretation model. In a specific application, a one-to-many mapping relationship may be established between the speech recognition model corresponding to the source language category and the text translation models and target language speech models corresponding to the various interpretation target languages, so as to realize output language configuration of the multilingual interpretation model and meet the output demands of the various interpretation target languages.
Step S309: obtaining the speech interpretation model library according to the multilingual interpretation models.

After the multilingual interpretation models are obtained, the models corresponding to the various source language categories are collected to obtain the speech interpretation model library. During simultaneous interpretation, the output language of a multilingual interpretation model is configured according to the interpretation target language to obtain the speech interpretation model; the received voice data to be interpreted is input into the speech interpretation model for translation processing, and the corresponding model voice data is output, realizing the interpretation processing.
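The sequential combination of step S307 can be sketched as plain function composition: recognize, then translate, then synthesize. The three component functions here are hypothetical stand-ins for the trained models described above.

```python
def make_multilingual_model(recognize, translate, synthesize):
    """Compose ASR -> text translation -> target language speech (step S307)."""
    def interpret(audio, target_lang):
        source_text = recognize(audio)
        target_text = translate(source_text, target_lang)
        return synthesize(target_text)
    return interpret

recognize = lambda audio: "ni hao"                 # ASR stand-in
translate = lambda text, tgt: {"en": "hello"}[tgt] # text translation stand-in
synthesize = lambda text: f"audio<{text}>"         # target speech stand-in

model = make_multilingual_model(recognize, translate, synthesize)
print(model(b"...pcm...", "en"))  # → audio<hello>
```

Because the translation and synthesis stages are parameters, swapping in the components for a different target language is exactly the "output language configuration" described in step S408.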
In one embodiment, the interpretation voice output demand includes a scene demand and a user demand. The step of performing voice feature processing on the model voice data according to the interpretation voice output demand and outputting the interpreted voice data includes: querying a preset scene speech database corresponding to the scene demand, where the scene speech database stores scene voice expression data meeting the scene demand; updating the model voice data with the scene voice expression data to obtain scene voice data; and configuring the scene voice data according to the user demand to output the interpreted voice data.
Based on the different application scenarios of simultaneous interpretation and the users it serves, the finally output interpreted voice data can be flexibly configured to adapt to various actual demands. In this embodiment, the interpretation voice output demand includes a scene demand and a user demand, where the scene demand corresponds to the application scenario of the interpretation, such as international conferences, diplomacy and foreign affairs, business negotiations, commercial activities and news media; the user demand corresponds to the target audience of the output, such as gender, timbre and style.
When performing voice feature processing on the model voice data, the preset scene speech database corresponding to the scene demand is queried; this database stores the scene voice expression data meeting the scene demand. In different interpretation scenarios the expression of the output voice data differs, for example spoken versus written wording and specialized vocabulary; the scene voice expression data corresponding to each scene demand can be stored in advance in the scene speech database, and querying the database extracts the expression data meeting the current scene demand. After the scene voice expression data is obtained, the model voice data is updated according to it, for example by replacing the corresponding original expression data with the scene voice expression data and synthesizing the result to obtain the scene voice data. The scene voice data is then configured according to the user demand to obtain and output the final interpreted voice data, thereby meeting the various scene and user demands at the output end, extending the applicable environments of simultaneous interpretation and improving the interpretation effect.
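The per-scene expression update can be sketched as a substitution table keyed by scene, applied to the (textual stand-in for the) model output before user-demand configuration. Scene names and replacements below are hypothetical examples.

```python
# Hypothetical scene speech database: scene demand -> expression substitutions
# (e.g. formal wording for conferences, broadcast wording for media).
scene_db = {
    "conference": {"hi": "good morning", "ok": "agreed"},
    "media": {"hi": "hello everyone"},
}

def apply_scene(text, scene):
    """Replace original expressions with scene voice expression data."""
    for original, replacement in scene_db.get(scene, {}).items():
        text = text.replace(original, replacement)
    return text

print(apply_scene("hi, ok", "conference"))  # → good morning, agreed
```

An unknown scene falls through unchanged, which mirrors the default behavior of outputting the model voice data as-is when no scene expression data applies.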
In one embodiment, the user demand includes a voice timbre demand and a voice style demand. The step of configuring the scene voice data according to the user demand and outputting the interpreted voice data includes: performing timbre switching on the scene voice data according to the voice timbre demand to obtain timbre voice data meeting the voice timbre demand; and performing style switching on the timbre voice data according to the voice style demand to output the interpreted voice data.
In this embodiment, the user demand includes a voice timbre demand and a voice style demand, where the voice timbre demand may include, but is not limited to, timbres such as a male voice, a female voice and a child's voice, and the voice style demand may include styles such as cheerful, somber, excited, or the same style as the source voice signal. Generally, a default output timbre and style can be set, such as a male voice in the source style; the user can then personalize the default output by switching the voice timbre and style, and the corresponding interpreted voice data is output. Specifically, when configuring the scene voice data according to the user demand, timbre switching is first performed on the scene voice data according to the voice timbre demand, for example switching the default male voice to a female voice, to obtain timbre voice data meeting the demand; style switching is then performed on the timbre voice data according to the voice style demand, for example switching the source style to a somber style, to obtain the interpreted voice data. By switching the timbre and style of the model voice data output by the speech interpretation model according to the user demand, the demands of users at various output ends can be accommodated, extending the applicable environments of simultaneous interpretation and improving the interpretation effect.
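The two-stage user-demand configuration (timbre first, then style) can be sketched as follows; tags appended to a string stand in for real voice conversion applied to the audio, and the demand keys and defaults are hypothetical.

```python
def switch_timbre(audio, timbre="male"):
    """Stand-in for timbre conversion (male/female/child voice)."""
    return f"{audio}|timbre={timbre}"

def switch_style(audio, style="source"):
    """Stand-in for style conversion (cheerful/somber/source style...)."""
    return f"{audio}|style={style}"

def configure(audio, user_demand):
    # Timbre switching first, then style switching, as in the embodiment;
    # missing keys fall back to the default output (male voice, source style).
    audio = switch_timbre(audio, user_demand.get("timbre", "male"))
    return switch_style(audio, user_demand.get("style", "source"))

print(configure("audio<hello>", {"timbre": "female", "style": "somber"}))
# → audio<hello>|timbre=female|style=somber
```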
In one embodiment, as shown in FIG. 4, a simultaneous interpretation method is provided, including the following steps:

Step S401: receiving the voice data to be interpreted;

Step S402: extracting voice feature phonemes from the voice data to be interpreted;

Step S403: querying a preset language phoneme classification model;

Step S404: inputting the voice feature phonemes into the language phoneme classification model to obtain the source language category corresponding to the voice data to be interpreted.
In this embodiment, the first terminal 102 receives, through a voice signal collection device, the voice data to be interpreted issued by the speech source, and the server 104 receives this voice data sent by the first terminal 102 and extracts voice feature phonemes from it. This may specifically include: digitizing the voice data to be interpreted to obtain digitized interpretation data; performing endpoint detection on the digitized data, and performing voice framing on the digitized data after endpoint detection to obtain voice frame data to be interpreted; and extracting the voice feature phonemes from the voice frame data. After the voice feature phonemes used to judge the source language category are extracted, they are input into the language phoneme classification model, which is obtained by training on the voice feature phonemes corresponding to various language categories, and the model outputs the source language category corresponding to the voice data to be interpreted.
Step S405: obtaining the interpretation demand, where the interpretation demand includes the interpretation target language and the interpretation voice output demand.

The interpretation demand is sent to the server 104 by the second terminal 106 that receives the interpretation output. The interpretation target language is the target language category into which the voice data to be interpreted needs to be translated, and the interpretation voice output demand may be the voice feature requirement of the voice data to be output. Adjusting the voice features of the output voice data through the interpretation voice output demand can satisfy the actual demands of various scenes and various users.
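A sketch of the interpretation demand carried from the second terminal to the server, modeled as a small data structure: the target language plus the voice output demand (scene demand and user demand). The field names and defaults are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VoiceOutputDemand:
    """Interpretation voice output demand: scene demand + user demand."""
    scene: str = "conference"
    timbre: str = "male"    # voice timbre demand
    style: str = "source"   # voice style demand

@dataclass
class InterpretationDemand:
    """Demand sent by the second terminal to the server (step S405)."""
    target_lang: str
    output: VoiceOutputDemand = field(default_factory=VoiceOutputDemand)

demand = InterpretationDemand("en", VoiceOutputDemand(timbre="female"))
print(demand.target_lang, demand.output.timbre)  # → en female
```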
Step S406: querying the preset speech interpretation model library;

Step S407: querying, from the speech interpretation model library, the multilingual interpretation model corresponding to the source language category;

Step S408: performing output language configuration on the multilingual interpretation model according to the interpretation target language to obtain the speech interpretation model;

Step S409: importing the voice data to be interpreted into the speech interpretation model to obtain the model voice data.

The speech interpretation model library stores the multilingual interpretation models corresponding to the various source language categories; a multilingual interpretation model is an interpretation model whose input is fixed to a given source language category, and by configuring its output language according to the actual interpretation target language, the speech interpretation model meeting the target language can be obtained. After the speech interpretation model is obtained, the received voice data to be interpreted is input into it for translation processing, and the corresponding model voice data is output.
Step S410: the interpretation voice output demand includes a scene demand and a user demand; querying the preset scene speech database corresponding to the scene demand, where the scene speech database stores the scene voice expression data meeting the scene demand;

Step S411: updating the model voice data with the scene voice expression data to obtain the scene voice data;

Step S412: configuring the scene voice data according to the user demand to output the interpreted voice data.
In this embodiment, based on the different application scenarios of simultaneous interpretation and the users it serves, the finally output interpreted voice data can be flexibly configured to adapt to various actual demands. Specifically, the user demand includes a voice timbre demand and a voice style demand, and the step of configuring the scene voice data according to the user demand and outputting the interpreted voice data includes: performing timbre switching on the scene voice data according to the voice timbre demand to obtain timbre voice data meeting the voice timbre demand; and performing style switching on the timbre voice data according to the voice style demand to obtain the interpreted voice data.
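Tying steps S401 through S412 together, the whole flow can be sketched end to end. Every stage below is a hypothetical stand-in (string tags in place of real audio processing); the sketch only shows the ordering of the embodiment's steps.

```python
def simultaneous_interpretation(audio, demand):
    # S402-S404: language identification (stand-in result)
    source_lang = "zh"
    # S406-S409: model library lookup, output language configuration, translation
    model_audio = f"[{source_lang}->{demand['target_lang']}] {audio}"
    # S410-S411: scene voice expression update (stand-in)
    scene_audio = model_audio + f" ({demand['scene']} register)"
    # S412: user demand configuration - timbre then style
    return scene_audio + f" |timbre={demand['timbre']}|style={demand['style']}"

demand = {"target_lang": "en", "scene": "conference",
          "timbre": "female", "style": "cheerful"}
print(simultaneous_interpretation("ni hao", demand))
# → [zh->en] ni hao (conference register) |timbre=female|style=cheerful
```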
It should be understood that although the steps in the flowcharts of FIGS. 2-4 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with at least part of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 5, a simultaneous interpretation apparatus is provided, including: a to-be-interpreted data receiving module 501, an interpretation demand obtaining module 503, an interpretation model query module 505, a model voice data obtaining module 507 and an interpreted voice data obtaining module 509, in which:

the to-be-interpreted data receiving module 501 is configured to receive the voice data to be interpreted and determine the source language category corresponding to it;

the interpretation demand obtaining module 503 is configured to obtain the interpretation demand, which includes the interpretation target language and the interpretation voice output demand;

the interpretation model query module 505 is configured to query the preset speech interpretation model corresponding to the source language category and the interpretation target language, the speech interpretation model being constructed based on the translation correspondence between the source language category and the interpretation target language;

the model voice data obtaining module 507 is configured to import the voice data to be interpreted into the speech interpretation model to obtain the model voice data;

the interpreted voice data obtaining module 509 is configured to perform voice feature processing on the model voice data according to the interpretation voice output demand and output the interpreted voice data.
In the above simultaneous interpretation apparatus, the to-be-interpreted data receiving module determines the source language category corresponding to the received voice data to be interpreted; the interpretation model query module queries, according to the source language category and the interpretation target language, the corresponding preset speech interpretation model, which is constructed based on the translation correspondence between the source language category and the interpretation target language; the model voice data obtaining module imports the voice data to be interpreted into the speech interpretation model to obtain the model voice data; and the interpreted voice data obtaining module performs voice feature processing on the model voice data according to the interpretation voice output demand and outputs the interpreted voice data, thereby realizing simultaneous interpretation. During the interpretation process, no professional interpreters are required for manual translation, which avoids the influence of human factors and effectively improves the efficiency and sound effect of simultaneous interpretation.
In one embodiment, the to-be-interpreted data receiving module 501 includes a feature phoneme extraction unit, a phoneme classification model query unit and a source language determination unit, in which: the feature phoneme extraction unit is configured to extract voice feature phonemes from the voice data to be interpreted; the phoneme classification model query unit is configured to query the preset language phoneme classification model, which is obtained by training on the voice feature phonemes corresponding to various language categories; and the source language determination unit is configured to input the voice feature phonemes into the language phoneme classification model to obtain the source language category corresponding to the voice data to be interpreted.

In one embodiment, the feature phoneme extraction unit includes a digitization subunit, a framing subunit and a feature phoneme extraction subunit, in which: the digitization subunit is configured to digitize the voice data to be interpreted to obtain the digitized interpretation data; the framing subunit is configured to perform endpoint detection on the digitized data and perform voice framing on the digitized data after endpoint detection to obtain the voice frame data to be interpreted; and the feature phoneme extraction subunit is configured to extract the voice feature phonemes from the voice frame data to be interpreted.
In one embodiment, the interpretation model query module 505 includes a model library query unit, a multilingual interpretation model query unit and a speech interpretation model obtaining unit, in which: the model library query unit is configured to query the preset speech interpretation model library; the multilingual interpretation model query unit is configured to query, from the speech interpretation model library, the multilingual interpretation model corresponding to the source language category; and the speech interpretation model obtaining unit is configured to perform output language configuration on the multilingual interpretation model according to the interpretation target language to obtain the speech interpretation model.

In one embodiment, the apparatus further includes a speech recognition model module, a text translation model module, a target language speech model module, a multilingual interpretation model module and a model library construction module, in which: the speech recognition model module is configured to obtain the preset speech recognition model corresponding to the source language category, the speech recognition model being used to output, according to the voice data to be interpreted, the source language text corresponding to the source language category; the text translation model module is configured to construct the text translation model according to the historical translation data between the source language text and the target language text corresponding to the interpretation target language, the text translation model being used to output the target language text corresponding to the interpretation target language according to the source language text; the target language speech model module is configured to construct the target language speech model according to the target language text and the voice data corresponding to the target language text in the interpretation target language; the multilingual interpretation model module is configured to combine the speech recognition model, the text translation model and the target language speech model in sequence to obtain the multilingual interpretation model; and the model library construction module is configured to obtain the speech interpretation model library according to the multilingual interpretation models.
In one embodiment, the simultaneous interpretation voice output demand includes a simultaneous interpretation scene demand and a simultaneous interpretation user demand; the simultaneous interpretation voice data obtaining module 509 includes a scene speech database query unit, a scene voice data acquisition unit, and a user demand configuration unit, wherein: the scene speech database query unit is configured to query a preset scene speech database corresponding to the simultaneous interpretation scene demand, the scene speech database storing scene phonetic expression data that meets the simultaneous interpretation scene demand; the scene voice data acquisition unit is configured to update the model voice data with the scene phonetic expression data to obtain scene voice data; and the user demand configuration unit is configured to configure the scene voice data according to the simultaneous interpretation user demand and output the simultaneous interpretation voice data.
In one embodiment, the simultaneous interpretation user demand includes a voice timbre demand and a voice style demand; the user demand configuration unit includes a timbre switching subunit and a style switching subunit, wherein: the timbre switching subunit is configured to perform timbre switching on the scene voice data according to the voice timbre demand, obtaining timbre voice data that meets the voice timbre demand; and the style switching subunit is configured to perform style switching on the timbre voice data according to the voice style demand and output the simultaneous interpretation voice data.
For specific limitations of the simultaneous interpretation apparatus, refer to the limitations of the simultaneous interpretation method above, which are not repeated here. Each module in the above simultaneous interpretation apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them to perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device communicates with an external terminal over a network connection. The computer program, when executed by the processor, implements a simultaneous interpretation method.
Those skilled in the art will understand that the structure shown in Fig. 6 is merely a block diagram of the part of the structure relevant to the present solution and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the following steps:
receiving to-be-interpreted voice data, and determining the to-be-interpreted language category corresponding to the to-be-interpreted voice data;
obtaining a simultaneous interpretation demand, the simultaneous interpretation demand including a simultaneous interpretation target language and a simultaneous interpretation voice output demand;
querying a preset voice simultaneous interpretation model corresponding to the to-be-interpreted language category and the simultaneous interpretation target language, the voice simultaneous interpretation model being constructed based on the translation correspondence between the to-be-interpreted language category and the simultaneous interpretation target language;
importing the to-be-interpreted voice data into the voice simultaneous interpretation model to obtain model voice data; and
performing phonetic feature processing on the model voice data according to the simultaneous interpretation voice output demand, and outputting simultaneous interpretation voice data.
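The five steps above can be sketched as a minimal pipeline. All names here (detect_language, MODEL_LIBRARY, interpret) are illustrative assumptions, not APIs defined by the patent, and the "model" is a stand-in callable rather than a trained network.

```python
# Minimal sketch of the five processor steps; every name is a hypothetical
# placeholder, not an API from the patent.

def detect_language(audio: bytes) -> str:
    """Stand-in language detector: pretend all input is Mandarin ("zh")."""
    return "zh"

# Toy "voice simultaneous interpretation model library":
# (source language, target language) -> callable model.
MODEL_LIBRARY = {
    ("zh", "en"): lambda audio: b"model-voice:" + audio,
}

def interpret(audio: bytes, target_lang: str, output_demand: str) -> bytes:
    # Step 1: determine the to-be-interpreted language category.
    src_lang = detect_language(audio)
    # Steps 2-3: the demand carries the target language; query the preset model.
    model = MODEL_LIBRARY[(src_lang, target_lang)]
    # Step 4: import the voice data into the model to obtain model voice data.
    model_voice = model(audio)
    # Step 5: phonetic feature processing according to the voice output demand
    # (here just tagged onto the payload).
    return model_voice + b"|" + output_demand.encode()

result = interpret(b"hello", "en", "formal")
```

A real implementation would replace each placeholder with a trained component; the point of the sketch is only the control flow: detect, query, run, post-process.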
In one embodiment, the processor, when executing the computer program, further implements the following steps: extracting phonetic feature phonemes from the to-be-interpreted voice data; querying a preset language phoneme classification model, the language phoneme classification model being obtained by training on the phonetic feature phonemes corresponding to various language categories; and inputting the phonetic feature phonemes into the language phoneme classification model to obtain the to-be-interpreted language category corresponding to the to-be-interpreted voice data.
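The classification step above can be illustrated with a toy nearest-centroid classifier over phonetic feature vectors. The feature values and the two language centroids are fabricated for illustration; a real system would train the model on labelled phoneme data.

```python
# Toy language-category classifier over phonetic feature vectors.
# The centroids are invented stand-ins for a trained phoneme model.

LANGUAGE_CENTROIDS = {
    "zh": (0.9, 0.1),   # hypothetical mean feature vector for Mandarin
    "en": (0.2, 0.8),   # hypothetical mean feature vector for English
}

def classify_language(phoneme_features):
    """Return the language whose trained centroid is nearest (squared L2)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(phoneme_features, centroid))
    return min(LANGUAGE_CENTROIDS, key=lambda lang: dist(LANGUAGE_CENTROIDS[lang]))

category = classify_language((0.85, 0.2))  # features close to the "zh" centroid
```

The patent does not prescribe the classifier family; nearest-centroid is used here only because it makes the "input phonemes, output language category" contract visible in a few lines.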
In one embodiment, the processor, when executing the computer program, further implements the following steps: digitizing the to-be-interpreted voice data to obtain digitized to-be-interpreted data; performing endpoint detection on the digitized to-be-interpreted data, and performing voice framing on the endpoint-detected digitized data to obtain to-be-interpreted voice frame data; and extracting the phonetic feature phonemes from the to-be-interpreted voice frame data.
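The digitize / endpoint-detect / frame chain can be sketched as below. The 16-bit quantization, the energy threshold, and the frame length are illustrative assumptions; production systems typically use windowed, overlapping frames and a trained voice activity detector.

```python
# Sketch of the preprocessing chain: digitization, simple energy-based
# endpoint detection, and fixed-length framing.

def digitize(samples, scale=32767):
    """Quantize float samples in [-1, 1] to 16-bit integer range."""
    return [int(max(-1.0, min(1.0, s)) * scale) for s in samples]

def endpoint_detect(samples, threshold=1000):
    """Trim leading/trailing samples whose magnitude is below a threshold."""
    voiced = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not voiced:
        return []
    return samples[voiced[0]:voiced[-1] + 1]

def frame(samples, frame_len=4):
    """Split into non-overlapping frames, dropping any short tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

speech = digitize([0.0, 0.0, 0.5, -0.6, 0.7, 0.4, 0.0])  # silence around speech
frames = frame(endpoint_detect(speech))
```

Feature phonemes would then be extracted per frame (e.g. spectral features), which is the step the patent hands to the classification model.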
In one embodiment, the processor, when executing the computer program, further implements the following steps: querying a preset voice simultaneous interpretation model library; querying the voice simultaneous interpretation model library for a multilingual simultaneous interpretation model corresponding to the to-be-interpreted language category; and configuring an output language of the multilingual simultaneous interpretation model according to the simultaneous interpretation target language to obtain the voice simultaneous interpretation model.
In one embodiment, the processor, when executing the computer program, further implements the following steps: obtaining a preset speech recognition model corresponding to the to-be-interpreted language category, the speech recognition model being configured to output, from the to-be-interpreted voice data, to-be-interpreted language text corresponding to the to-be-interpreted language category; constructing a text translation model from historical translation data between the to-be-interpreted language text and the target language text corresponding to the simultaneous interpretation target language, the text translation model being configured to output, from the to-be-interpreted language text, the target language text corresponding to the simultaneous interpretation target language; constructing a target language speech model from the target language text and the voice data corresponding to the target language text in the simultaneous interpretation target language; combining the speech recognition model, the text translation model, and the target language speech model in sequence to obtain the multilingual simultaneous interpretation model; and obtaining the voice simultaneous interpretation model library from the multilingual simultaneous interpretation model.
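The "combine in sequence" step is a composition of three sub-models: recognition, translation, and synthesis. In the sketch below the three "models" are plain dictionaries standing in for trained components, and the library keying is an assumption about how source/target pairs might be indexed.

```python
# Sketch of chaining ASR -> text translation -> target-language speech into one
# multilingual model and registering it in a model library. All mappings are
# toy stand-ins for trained models.

asr = {"ni hao": "你好"}                 # speech token -> source-language text
translate = {"你好": "hello"}            # source text -> target-language text
tts = {"hello": b"<en audio: hello>"}   # target text -> target-language speech

def compose(asr_model, translation_model, speech_model):
    """Combine recognition, translation, and synthesis in sequence."""
    def multilingual_model(audio_token):
        text = asr_model[audio_token]            # speech recognition model
        target_text = translation_model[text]    # text translation model
        return speech_model[target_text]         # target language speech model
    return multilingual_model

# Hypothetical library keyed by (source language, target language).
MODEL_LIBRARY = {("zh", "en"): compose(asr, translate, tts)}
out = MODEL_LIBRARY[("zh", "en")]("ni hao")
```

Because the composition is purely sequential, any sub-model can be swapped (for example, retargeting the output language) without touching the other two, which is what the output-language configuration step relies on.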
In one embodiment, the simultaneous interpretation voice output demand includes a simultaneous interpretation scene demand and a simultaneous interpretation user demand. The processor, when executing the computer program, further implements the following steps: querying a preset scene speech database corresponding to the simultaneous interpretation scene demand, the scene speech database storing scene phonetic expression data that meets the simultaneous interpretation scene demand; updating the model voice data with the scene phonetic expression data to obtain scene voice data; and configuring the scene voice data according to the simultaneous interpretation user demand to output the simultaneous interpretation voice data.
In one embodiment, the simultaneous interpretation user demand includes a voice timbre demand and a voice style demand. The processor, when executing the computer program, further implements the following steps: performing timbre switching on the scene voice data according to the voice timbre demand to obtain timbre voice data that meets the voice timbre demand; and performing style switching on the timbre voice data according to the voice style demand to output the simultaneous interpretation voice data.
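The two post-processing passes can be sketched as below. The "timbre switch" here is a crude gain on sample values and the "style switch" a speaking-rate change by sample repetition; both are simplified placeholders for real voice-conversion techniques, which the patent does not specify.

```python
# Sketch of the two-stage post-processing: timbre switching first, then style
# switching, matching the order in the embodiment. Both transforms are toy
# placeholders.

def switch_timbre(samples, gain):
    """Pretend timbre change: rescale samples toward the requested voice."""
    return [int(s * gain) for s in samples]

def switch_style(samples, rate):
    """Pretend style change: rate 2 slows delivery by doubling each sample."""
    out = []
    for s in samples:
        out.extend([s] * rate)
    return out

scene_voice = [100, -200, 300]                # scene voice data (toy samples)
timbre_voice = switch_timbre(scene_voice, 2)  # meets the voice timbre demand
interpreted = switch_style(timbre_voice, 2)   # meets the voice style demand
```

The design point preserved from the text is the ordering: timbre is fixed first, then style, and the style pass receives the timbre-adjusted data rather than the raw scene data.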
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. The computer program, when executed by a processor, implements the following steps:
receiving to-be-interpreted voice data, and determining the to-be-interpreted language category corresponding to the to-be-interpreted voice data;
obtaining a simultaneous interpretation demand, the simultaneous interpretation demand including a simultaneous interpretation target language and a simultaneous interpretation voice output demand;
querying a preset voice simultaneous interpretation model corresponding to the to-be-interpreted language category and the simultaneous interpretation target language, the voice simultaneous interpretation model being constructed based on the translation correspondence between the to-be-interpreted language category and the simultaneous interpretation target language;
importing the to-be-interpreted voice data into the voice simultaneous interpretation model to obtain model voice data; and
performing phonetic feature processing on the model voice data according to the simultaneous interpretation voice output demand, and outputting simultaneous interpretation voice data.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: extracting phonetic feature phonemes from the to-be-interpreted voice data; querying a preset language phoneme classification model, the language phoneme classification model being obtained by training on the phonetic feature phonemes corresponding to various language categories; and inputting the phonetic feature phonemes into the language phoneme classification model to obtain the to-be-interpreted language category corresponding to the to-be-interpreted voice data.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: digitizing the to-be-interpreted voice data to obtain digitized to-be-interpreted data; performing endpoint detection on the digitized to-be-interpreted data, and performing voice framing on the endpoint-detected digitized data to obtain to-be-interpreted voice frame data; and extracting the phonetic feature phonemes from the to-be-interpreted voice frame data.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: querying a preset voice simultaneous interpretation model library; querying the voice simultaneous interpretation model library for a multilingual simultaneous interpretation model corresponding to the to-be-interpreted language category; and configuring an output language of the multilingual simultaneous interpretation model according to the simultaneous interpretation target language to obtain the voice simultaneous interpretation model.
In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining a preset speech recognition model corresponding to the to-be-interpreted language category, the speech recognition model being configured to output, from the to-be-interpreted voice data, to-be-interpreted language text corresponding to the to-be-interpreted language category; constructing a text translation model from historical translation data between the to-be-interpreted language text and the target language text corresponding to the simultaneous interpretation target language, the text translation model being configured to output, from the to-be-interpreted language text, the target language text corresponding to the simultaneous interpretation target language; constructing a target language speech model from the target language text and the voice data corresponding to the target language text in the simultaneous interpretation target language; combining the speech recognition model, the text translation model, and the target language speech model in sequence to obtain the multilingual simultaneous interpretation model; and obtaining the voice simultaneous interpretation model library from the multilingual simultaneous interpretation model.
In one embodiment, the simultaneous interpretation voice output demand includes a simultaneous interpretation scene demand and a simultaneous interpretation user demand. The computer program, when executed by the processor, further implements the following steps: querying a preset scene speech database corresponding to the simultaneous interpretation scene demand, the scene speech database storing scene phonetic expression data that meets the simultaneous interpretation scene demand; updating the model voice data with the scene phonetic expression data to obtain scene voice data; and configuring the scene voice data according to the simultaneous interpretation user demand to output the simultaneous interpretation voice data.
In one embodiment, the simultaneous interpretation user demand includes a voice timbre demand and a voice style demand. The computer program, when executed by the processor, further implements the following steps: performing timbre switching on the scene voice data according to the voice timbre demand to obtain timbre voice data that meets the voice timbre demand; and performing style switching on the timbre voice data according to the voice style demand to output the simultaneous interpretation voice data.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A simultaneous interpretation method, the method comprising:
receiving to-be-interpreted voice data, and determining a to-be-interpreted language category corresponding to the to-be-interpreted voice data;
obtaining a simultaneous interpretation demand, the simultaneous interpretation demand comprising a simultaneous interpretation target language and a simultaneous interpretation voice output demand;
querying a preset voice simultaneous interpretation model corresponding to the to-be-interpreted language category and the simultaneous interpretation target language, the voice simultaneous interpretation model being constructed based on a translation correspondence between the to-be-interpreted language category and the simultaneous interpretation target language;
importing the to-be-interpreted voice data into the voice simultaneous interpretation model to obtain model voice data; and
performing phonetic feature processing on the model voice data according to the simultaneous interpretation voice output demand, and outputting simultaneous interpretation voice data.
2. The method according to claim 1, wherein the step of determining the to-be-interpreted language category corresponding to the to-be-interpreted voice data comprises:
extracting phonetic feature phonemes from the to-be-interpreted voice data;
querying a preset language phoneme classification model, the language phoneme classification model being obtained by training on the phonetic feature phonemes corresponding to various language categories; and
inputting the phonetic feature phonemes into the language phoneme classification model to obtain the to-be-interpreted language category corresponding to the to-be-interpreted voice data.
3. The method according to claim 2, wherein the step of extracting phonetic feature phonemes from the to-be-interpreted voice data comprises:
digitizing the to-be-interpreted voice data to obtain digitized to-be-interpreted data;
performing endpoint detection on the digitized to-be-interpreted data, and performing voice framing on the endpoint-detected digitized data to obtain to-be-interpreted voice frame data; and
extracting the phonetic feature phonemes from the to-be-interpreted voice frame data.
4. The method according to claim 1, wherein the step of querying the preset voice simultaneous interpretation model corresponding to the to-be-interpreted language category and the simultaneous interpretation target language comprises:
querying a preset voice simultaneous interpretation model library;
querying the voice simultaneous interpretation model library for a multilingual simultaneous interpretation model corresponding to the to-be-interpreted language category; and
configuring an output language of the multilingual simultaneous interpretation model according to the simultaneous interpretation target language to obtain the voice simultaneous interpretation model.
5. The method according to claim 4, further comprising, before the step of querying the preset voice simultaneous interpretation model library:
obtaining a preset speech recognition model corresponding to the to-be-interpreted language category, the speech recognition model being configured to output, from the to-be-interpreted voice data, to-be-interpreted language text corresponding to the to-be-interpreted language category;
constructing a text translation model from historical translation data between the to-be-interpreted language text and target language text corresponding to the simultaneous interpretation target language, the text translation model being configured to output, from the to-be-interpreted language text, the target language text corresponding to the simultaneous interpretation target language;
constructing a target language speech model from the target language text and voice data corresponding to the target language text in the simultaneous interpretation target language;
combining the speech recognition model, the text translation model, and the target language speech model in sequence to obtain the multilingual simultaneous interpretation model; and
obtaining the voice simultaneous interpretation model library from the multilingual simultaneous interpretation model.
6. The method according to any one of claims 1 to 5, wherein the simultaneous interpretation voice output demand comprises a simultaneous interpretation scene demand and a simultaneous interpretation user demand, and the step of performing phonetic feature processing on the model voice data according to the simultaneous interpretation voice output demand and outputting the simultaneous interpretation voice data comprises:
querying a preset scene speech database corresponding to the simultaneous interpretation scene demand, the scene speech database storing scene phonetic expression data that meets the simultaneous interpretation scene demand;
updating the model voice data with the scene phonetic expression data to obtain scene voice data; and
configuring the scene voice data according to the simultaneous interpretation user demand, and outputting the simultaneous interpretation voice data.
7. The method according to claim 6, wherein the simultaneous interpretation user demand comprises a voice timbre demand and a voice style demand, and the step of configuring the scene voice data according to the simultaneous interpretation user demand and outputting the simultaneous interpretation voice data comprises:
performing timbre switching on the scene voice data according to the voice timbre demand to obtain timbre voice data that meets the voice timbre demand; and
performing style switching on the timbre voice data according to the voice style demand, and outputting the simultaneous interpretation voice data.
8. A simultaneous interpretation apparatus, the apparatus comprising:
a to-be-interpreted data receiving module, configured to receive to-be-interpreted voice data and determine a to-be-interpreted language category corresponding to the to-be-interpreted voice data;
a simultaneous interpretation demand obtaining module, configured to obtain a simultaneous interpretation demand, the simultaneous interpretation demand comprising a simultaneous interpretation target language and a simultaneous interpretation voice output demand;
a simultaneous interpretation model query module, configured to query a preset voice simultaneous interpretation model corresponding to the to-be-interpreted language category and the simultaneous interpretation target language, the voice simultaneous interpretation model being constructed based on a translation correspondence between the to-be-interpreted language category and the simultaneous interpretation target language;
a model voice data acquisition module, configured to import the to-be-interpreted voice data into the voice simultaneous interpretation model and output model voice data; and
a simultaneous interpretation voice data obtaining module, configured to perform phonetic feature processing on the model voice data according to the simultaneous interpretation voice output demand and output simultaneous interpretation voice data.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201811211414.3A 2018-10-17 2018-10-17 Simultaneous interpretation method, apparatus, computer equipment and storage medium Pending CN109448698A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811211414.3A CN109448698A (en) 2018-10-17 2018-10-17 Simultaneous interpretation method, apparatus, computer equipment and storage medium
PCT/CN2018/124800 WO2020077868A1 (en) 2018-10-17 2018-12-28 Simultaneous interpretation method and apparatus, computer device and storage medium


Publications (1)

Publication Number Publication Date
CN109448698A true CN109448698A (en) 2019-03-08

Family

ID=65547183

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811211414.3A Pending CN109448698A (en) 2018-10-17 2018-10-17 Simultaneous interpretation method, apparatus, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN109448698A (en)
WO (1) WO2020077868A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008481A (en) * 2019-04-10 2019-07-12 南京魔盒信息科技有限公司 Translated speech generation method, device, computer equipment and storage medium
CN111144138A (en) * 2019-12-17 2020-05-12 Oppo广东移动通信有限公司 Simultaneous interpretation method and device and storage medium
WO2021077333A1 (en) * 2019-10-23 2021-04-29 深圳市欢太科技有限公司 Simultaneous interpretation method and device, and storage medium
CN112818703A (en) * 2021-01-19 2021-05-18 传神语联网网络科技股份有限公司 Multi-language consensus translation system and method based on multi-thread communication
CN112818705A (en) * 2021-01-19 2021-05-18 传神语联网网络科技股份有限公司 Multilingual speech translation system and method based on inter-group consensus

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040092293A1 (en) * 2002-11-06 2004-05-13 Samsung Electronics Co., Ltd. Third-party call control type simultaneous interpretation system and method thereof
CN101008942A (en) * 2006-01-25 2007-08-01 北京金远见电脑技术有限公司 Machine translation device and method thereof
US20090204401A1 (en) * 2008-02-07 2009-08-13 Hitachi, Ltd. Speech processing system, speech processing method, and speech processing program
CN103559879A (en) * 2013-11-08 2014-02-05 安徽科大讯飞信息科技股份有限公司 Method and device for extracting acoustic features in language identification system
CN106486125A (en) * 2016-09-29 2017-03-08 安徽声讯信息技术有限公司 A kind of simultaneous interpretation system based on speech recognition technology
CN107992485A (en) * 2017-11-27 2018-05-04 北京搜狗科技发展有限公司 A kind of simultaneous interpretation method and device
CN108009159A (en) * 2017-11-30 2018-05-08 上海与德科技有限公司 A kind of simultaneous interpretation method and mobile terminal
CN108595443A (en) * 2018-03-30 2018-09-28 浙江吉利控股集团有限公司 Simultaneous interpreting method, device, intelligent vehicle mounted terminal and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101340676B (en) * 2008-08-21 2012-04-18 华为终端有限公司 Method, apparatus and mobile terminal implementing simultaneous interpretation
KR101589433B1 (en) * 2009-03-11 2016-01-28 삼성전자주식회사 Simultaneous Interpretation System
CN102693729B (en) * 2012-05-15 2014-09-03 北京奥信通科技发展有限公司 Customized voice reading method, system, and terminal possessing the system
CN108447486B (en) * 2018-02-28 2021-12-03 科大讯飞股份有限公司 Voice translation method and device


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008481A (en) * 2019-04-10 2019-07-12 南京魔盒信息科技有限公司 Translated speech generation method, device, computer equipment and storage medium
CN110008481B (en) * 2019-04-10 2023-04-28 南京魔盒信息科技有限公司 Translated voice generating method, device, computer equipment and storage medium
WO2021077333A1 (en) * 2019-10-23 2021-04-29 深圳市欢太科技有限公司 Simultaneous interpretation method and device, and storage medium
CN111144138A (en) * 2019-12-17 2020-05-12 Oppo广东移动通信有限公司 Simultaneous interpretation method and device and storage medium
CN112818703A (en) * 2021-01-19 2021-05-18 传神语联网网络科技股份有限公司 Multi-language consensus translation system and method based on multi-thread communication
CN112818705A (en) * 2021-01-19 2021-05-18 传神语联网网络科技股份有限公司 Multilingual speech translation system and method based on inter-group consensus
CN112818703B (en) * 2021-01-19 2024-02-27 传神语联网网络科技股份有限公司 Multilingual consensus translation system and method based on multithread communication
CN112818705B (en) * 2021-01-19 2024-02-27 传神语联网网络科技股份有限公司 Multilingual speech translation system and method based on group consensus

Also Published As

Publication number Publication date
WO2020077868A1 (en) 2020-04-23

Similar Documents

Publication Publication Date Title
CN109448698A (en) Simultaneous interpretation method, apparatus, computer equipment and storage medium
CN110491382B (en) Speech recognition method and device based on artificial intelligence and speech interaction equipment
CN108447486B (en) Voice translation method and device
CN114401438B (en) Video generation method and device for virtual digital person, storage medium and terminal
CN108231062B (en) Voice translation method and device
CN113707125B (en) Training method and device for multi-language speech synthesis model
CN110136687B (en) Voice training based cloned accent and rhyme method
CN109523989A (en) Phoneme synthesizing method, speech synthetic device, storage medium and electronic equipment
CN109545183A (en) Text handling method, device, electronic equipment and storage medium
US11587561B2 (en) Communication system and method of extracting emotion data during translations
CN112562681B (en) Speech recognition method and apparatus, and storage medium
CN111739509B (en) Electronic book audio generation method, electronic device and storage medium
CN110931018A (en) Intelligent voice interaction method and device and computer readable storage medium
CN114387945A (en) Voice generation method and device, electronic equipment and storage medium
CN114171002A (en) Voice recognition method and device, electronic equipment and storage medium
KR20110087742A (en) System and apparatus into talking with the hands for handicapped person, and method therefor
CN117351929A (en) Translation method, translation device, electronic equipment and storage medium
CN117219046A (en) Interactive voice emotion control method and system
CN112667787A (en) Intelligent response method, system and storage medium based on phonetics label
CN116088688A (en) Virtual human interaction method and device
CN116129868A (en) Method and system for generating structured photo
CN115985320A (en) Intelligent device control method and device, electronic device and storage medium
CN115359778A (en) Confrontation and meta-learning method based on speaker emotion voice synthesis model
CN114595314A (en) Emotion-fused conversation response method, emotion-fused conversation response device, terminal and storage device
KR102426020B1 (en) Method and apparatus for Speech Synthesis Containing Emotional Rhymes with Scarce Speech Data of a Single Speaker

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 1261782
Country of ref document: HK

SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190308