CN107315742A - Spoken-language translation method and system with anthropomorphic interaction capability - Google Patents

Spoken-language translation method and system with anthropomorphic interaction capability

Info

Publication number
CN107315742A
CN107315742A (application CN201710535661.8A)
Authority
CN
China
Prior art keywords
anthropomorphic
translation
text
language
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710535661.8A
Other languages
Chinese (zh)
Inventor
陈炜
王峰
徐爽
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zidong Cognitive Technology Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN201710535661.8A
Publication of CN107315742A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/55 Rule-based translation
    • G06F 40/56 Natural language generation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1807 Speech classification or search using natural language modelling using prosody or stress
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Abstract

The invention provides a spoken-language translation method with anthropomorphic interaction capability, comprising the following steps: performing intelligent speech recognition on source-language speech to obtain source-language text; processing the source-language text and the dialogue scene, and conducting anthropomorphic human-machine dialogue; and performing machine translation to obtain the translation result. The invention also provides a spoken-language translation system with anthropomorphic interaction capability. According to the needs of the translation task, the invention conducts human-machine dialogue with the user when necessary, which markedly improves the user's translation experience in complex application scenarios and improves the semantic accuracy of the translation.

Description

Spoken-language translation method and system with anthropomorphic interaction capability
Technical field
The present invention relates to the fields of computing and artificial intelligence, and in particular to a spoken-language translation method that adds an anthropomorphic human-machine dialogue mechanism to the translation process, and to a corresponding system.
Background art
With the popularization of the Internet and the rapid advance of globalization, spoken-language translation, as an effective solution to the high cost, high barriers, and supply-demand imbalance of human translation, enjoys strong market demand in scenarios such as daily life, business meetings, and international exchange.
As shown in Fig. 1, bilingual spoken-language translation technology comprises speech recognition and speech synthesis for the source and target languages, together with bidirectional translation. Bidirectional speech recognition and bidirectional translation are mandatory components, while speech synthesis is optional depending on the application scenario and equipment.
In a traditional automatic spoken-language translation method, the user inputs the source-language speech to be translated, which is automatically recognized, translated, and presented directly to the other party as natural speech in the target language. From the user's perspective, spoken-language recognition and translation are simply an end-to-end piece of software (as shown in Fig. 2).
Constrained by the complexity and variability of human language, even a human interpreter communicates with the speaker in various ways in order to grasp the exact meaning of the speech to be translated. Current machine interpretation methods, by contrast, are end-to-end approaches that do not handle the complexity of the actual scene or of the semantics, and clearly struggle to meet accuracy requirements. Moreover, because translation delivered as a software service lacks human-machine communication with the user, it is difficult to meet scene-friendliness requirements in practical applications. How to improve spoken-translation accuracy and user experience in real, complex scenes is a problem that currently needs to be solved.
Summary of the invention
(1) Technical problem to be solved
In view of the above technical problems, the invention provides a spoken-language translation method and system with anthropomorphic interaction capability. The core idea of the invention is to add a human-machine dialogue module on top of conventional speech recognition and translation. This module captures, processes, and recognizes the current acoustic scene, speaker scene, prosodic scene, and linguistic context, and, when the translation task requires it, conducts human-machine dialogue with the user. This markedly improves the user's translation experience in complex application scenarios and improves the semantic accuracy of the translation.
(2) Technical solution
According to one aspect of the invention, there is provided a spoken-language translation method with anthropomorphic interaction capability, comprising the following steps: performing intelligent speech recognition on source-language speech to obtain source-language text; processing the source-language text and the dialogue scene, and conducting anthropomorphic human-machine dialogue; and performing machine translation to obtain the translation result.
According to another aspect of the invention, there is also provided a spoken-language translation system with anthropomorphic interaction capability, comprising: a speech recognition module, a human-machine dialogue management module, and a machine translation module. The speech recognition module performs intelligent speech recognition on the source-language speech to obtain source-language text; the human-machine dialogue management module processes the source-language text and the dialogue scene and conducts anthropomorphic human-machine dialogue; the machine translation module performs machine translation to obtain the translation result.
(3) Beneficial effects
It can be seen from the above technical solution that the spoken-language translation method and system with anthropomorphic interaction capability of the present invention have at least one of the following advantages:
(1) the invention markedly improves translation accuracy in complex application scenarios;
(2) the invention makes usage more convenient: the user needs no extra redundant operations during conversation;
(3) the invention makes the user's translation and interaction experience more intelligent and more humane.
Brief description of the drawings
Fig. 1 is a schematic diagram of prior-art bilingual spoken-language translation technology.
Fig. 2 is a schematic diagram of a prior-art automatic spoken-language translation system.
Fig. 3 is a schematic diagram of a spoken-language translation system with anthropomorphic interaction capability according to the present invention.
Fig. 4 is a schematic structural diagram of the speech recognition module of a spoken-language translation system with anthropomorphic interaction capability according to the present invention.
Fig. 5 is a detailed schematic diagram of a spoken-language translation system with anthropomorphic interaction capability according to the present invention.
Fig. 6 is a schematic diagram of the method of acquiring the speaker's source-language speech input in the first embodiment of the invention.
Fig. 7 is a schematic diagram of the method of conducting human-machine dialogue with the speaker in the first embodiment of the invention.
Fig. 8 is a schematic diagram of the method of visually presenting the current system state to the speaker in the first embodiment of the invention.
Fig. 9 is a schematic diagram of the method of intelligently outputting the translation result to the other party in the first embodiment of the invention.
Fig. 10 is a schematic diagram of the method of acquiring conference information and creating a conference in the second embodiment of the invention.
Fig. 11 is a schematic diagram of the method of intelligently chairing the conference in the second embodiment of the invention.
Fig. 12 is a schematic diagram of the method of visually presenting the current conference state to participants in the second embodiment of the invention.
Fig. 13 is a schematic diagram of a translation method based on a screenless anthropomorphic spoken-language translation system according to the third embodiment of the invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the invention without creative effort fall within the scope of protection of the invention.
The invention provides a spoken-language translation method with anthropomorphic interaction capability. As shown in Fig. 3, it comprises the following steps: acquiring source-language speech (i.e. the language of user A); performing intelligent speech recognition on the source-language speech to obtain source-language text; processing the source-language text and the dialogue scene, and conducting anthropomorphic human-machine dialogue; performing machine translation to obtain target-language text; performing speech synthesis to obtain target-language speech (i.e. the language of user B); and outputting the target-language speech.
Note that in Fig. 3 the source-language speech is the language before translation and the target-language speech is the language after translation; the two are relative terms. For a given user, the source-language speech they produce and the target-language speech they receive are the same language. For example, if user A's language is Chinese and user B's language is English, then when user A speaks Chinese (source-language speech), user B receives English (target-language speech) after translation; when user B speaks English (source-language speech), user A receives Chinese (target-language speech) after translation.
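As a rough illustration only, the end-to-end flow just described (recognition, dialogue management, translation, synthesis) can be sketched as follows. Every stage here is a hypothetical stand-in (string normalization, a toy dictionary, a tagged string), not the patent's actual models:

```python
# Sketch of the four-stage pipeline: ASR -> dialogue management -> MT -> TTS.
# All stage implementations are illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class Turn:
    source_audio: str               # placeholder for raw audio
    source_text: str = ""
    target_text: str = ""
    target_audio: str = ""
    clarifications: list = field(default_factory=list)

def recognize(audio: str) -> str:
    # stand-in for "intelligent speech recognition"
    return audio.strip().lower()

def dialogue_manage(text: str):
    # stand-in dialogue manager: flags empty input for clarification
    notes = []
    if not text:
        notes.append("ask user to repeat the utterance")
    return text, notes

def translate(text: str) -> str:
    # stand-in MT: toy word lookup instead of a real translation model
    toy_dict = {"hello": "你好", "thanks": "谢谢"}
    return " ".join(toy_dict.get(w, w) for w in text.split())

def synthesize(text: str) -> str:
    # stand-in TTS: tag the text as synthesized audio
    return f"<audio:{text}>"

def run_pipeline(audio: str) -> Turn:
    turn = Turn(source_audio=audio)
    turn.source_text = recognize(audio)
    turn.source_text, turn.clarifications = dialogue_manage(turn.source_text)
    turn.target_text = translate(turn.source_text)
    turn.target_audio = synthesize(turn.target_text)
    return turn
```

The point of the sketch is the ordering: the dialogue-management stage sits between recognition and translation, so clarifications can be raised before any translation is produced.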
When performing intelligent speech recognition on the source-language speech, the translation interface uses intelligent speech detection and language identification to automatically distinguish the speakers' voices and languages, so that users of the translation system need not operate per-language input buttons to converse. Specifically, the system computes for each frame a speech/non-speech probability score and a source-language/target-language probability score, performs synchronized bilingual decoding of the speech on this basis, and fuses the information to output a meaningful recognition result. As shown in Fig. 4, the source-language speech and target-language speech are two different languages, both referring to speech before translation.
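The per-frame speech/non-speech and source/target-language scores can be fused in many ways; the following is one minimal, assumed scheme (a speech-weighted vote used to pick the decoding language), not the patent's actual decoder:

```python
# Hedged sketch: each frame carries a speech probability and a
# source-vs-target-language probability; speech frames vote, weighted by
# their speech probability, for which language to decode.
def pick_language(frames, speech_threshold=0.5):
    """frames: list of (p_speech, p_source_language) pairs, one per frame.
    Returns 'source', 'target', or None if no speech was detected."""
    src_score = tgt_score = 0.0
    for p_speech, p_source in frames:
        if p_speech < speech_threshold:      # skip non-speech frames
            continue
        src_score += p_speech * p_source
        tgt_score += p_speech * (1.0 - p_source)
    if src_score == 0.0 and tgt_score == 0.0:
        return None                          # nothing but silence/noise
    return "source" if src_score >= tgt_score else "target"
```

In a real system the two probability streams would come from a voice-activity model and a language-identification model; here they are simply inputs.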
As shown in Fig. 5, processing the source-language text and dialogue scene includes processing the acoustic scene, speaker scene, prosodic scene, and linguistic context, and obtaining, through anthropomorphic human-machine dialogue, information that markedly improves the user's translation experience in complex application scenarios. Specifically: (1) acoustic scene awareness of the environment in which the translation system operates, where the perceived information includes but is not limited to dynamic background-noise information (e.g. signal-to-noise ratio and noise class), with the acoustic background perception information and its intelligent processing result handled jointly; (2) intelligent perception of the speaker scene, where the perceived information includes but is not limited to speaker identity, language, whether several people are speaking, multi-speaker speech separation information, and other speaker scene information, with these results (speaker scene perception information and intelligent processing result) handled jointly; (3) intelligent perception of the prosodic scene, where the perceived information includes but is not limited to pauses within the input speech, punctuation boundaries, speaking rate, suprasegmental prosodic features such as fundamental frequency and formants, prosodic-analysis confidence, and the intelligent processing of translation turn boundaries, with these results (prosodic scene perception information and intelligent processing result) handled jointly; (4) intelligent perception of the linguistic context, i.e. contextual intelligent processing of the source-language text, including but not limited to: person, place, and organization names extracted from the text to be translated; possibly misrecognized words contained in the text to be translated; colloquial fragments and repetitions contained in the text to be translated; obvious missing constituents contained in the text to be translated; temporal disorder contained in the text to be translated; time and number phrases contained in the text to be translated; and industry jargon, everyday abbreviations, internet neologisms, classical poetry, idioms, proverbs, two-part allegorical sayings, and the like contained in the text to be translated; with these results (contextual intelligent processing results) handled jointly.
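To illustrate how the four perceived scenes might feed the dialogue decision, here is a hedged sketch; the feature names, thresholds, and prompt texts are all invented for the example, since the patent only specifies the four categories:

```python
# Illustrative aggregation of the four scene signals (acoustic, speaker,
# prosodic, linguistic context) into anthropomorphic dialogue prompts.
from dataclasses import dataclass

@dataclass
class SceneFeatures:
    snr_db: float            # acoustic scene: signal-to-noise ratio
    n_speakers: int          # speaker scene: concurrent speakers detected
    pause_confidence: float  # prosodic scene: confidence of utterance boundary
    ambiguous_entities: int  # linguistic context: unresolved names/terms

def dialogue_prompts(f: SceneFeatures):
    """Decide which anthropomorphic prompts, if any, to raise for this input."""
    prompts = []
    if f.snr_db < 10.0:
        prompts.append("environment is noisy; suggest re-recording or text input")
    if f.n_speakers > 1:
        prompts.append("multiple speakers detected; suggest speaking in turn")
    if f.pause_confidence < 0.5:
        prompts.append("utterance boundary unclear; confirm end of input")
    if f.ambiguous_entities > 0:
        prompts.append("confirm ambiguous names or terms before translating")
    return prompts
```

An empty prompt list corresponds to the transparent case in which translation proceeds without any human-machine dialogue.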
The content of the human-machine dialogue includes but is not limited to: giving the user friendly prompts when the feature extraction above calls for them, including prompts to improve the use environment or speaker environment and prompts on correct usage; or, when the recognition result of the natural-language speech requires further semantic clarification, starting a human-machine dialogue to obtain the user's semantic clarification so that the translation can be performed correctly. During such dialogue, the anthropomorphic translation interface gives a clear prompt, which includes but is not limited to sound, graphics, and icons.
Furthermore, the different states of the translation system during use can be presented to the user in visual or non-visual form, including but not limited to: presenting the different states of the translation system through a visualized anthropomorphic avatar; or presenting them to the user through a non-visual voice intermediary.
In the present invention, the user's feedback on the dialogue content is obtained through contact or contactless means, including but not limited to: feedback on or confirmation of the dialogue content by clicking or touching the hardware device involved in the embodiment; or feedback or confirmation through voice interaction.
The invention also provides a spoken-language translation system with anthropomorphic interaction capability, comprising: an input module for acquiring source-language speech (i.e. the language of user A); a speech recognition module for performing intelligent speech recognition on the source-language speech to obtain source-language text; a human-machine dialogue management module for processing the source-language text and the dialogue scene and conducting anthropomorphic human-machine dialogue; a machine translation module for performing machine translation to obtain target-language text; a speech synthesis module for performing speech synthesis to obtain target-language speech (i.e. the language of user B); and an output module for outputting the target-language speech.
First embodiment: two-party spoken-translation dialogue system on a mobile phone
This embodiment provides a two-party spoken-translation dialogue system on a mobile phone. The system provides the two parties with an end-to-end spoken-translation dialogue function and, when necessary, initiates human-machine dialogue with the user to improve the user's translation experience.
(1) Acquire the speaker's source-language speech input; the input mode is selectable according to the speaker's use environment, habits, and so on (as shown in Fig. 6).
If the speaker's current environment is unfavorable for direct speech input, the system offers the alternative of typing the source-language text directly;
If the speaker is used to manually specifying the languages of the two parties, the system provides buttons for manual language selection and allows manual switching when the speaker changes;
If the speaker prefers automatic language identification, the system automatically switches languages during the two-party conversation, so that the speaker need not manually specify the language of the current input;
If the speaker is used to manually marking the boundaries of speech input by clicking, the system provides a speech-input button and uses the state of that button to obtain the boundaries of the speaker's input. The number of buttons depends on whether the speaker selects automatic language identification: with manual language selection there are two speech-input buttons, one operated by each party; with automatic language identification the two parties share a single button;
If the speaker prefers automatic detection of input boundaries, the system provides automatic breakpoint detection of speech input, so that when the speaker pauses or stops, the boundary of the input is identified automatically and the captured input is handed to the subsequent processing flow;
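The automatic breakpoint detection in the last option can be approximated with a simple energy-based rule; the thresholds below are illustrative assumptions, not values from the patent:

```python
# Sketch of automatic input-boundary ("breakpoint") detection: an utterance
# is considered finished once frame energy stays below a threshold for a
# given number of consecutive frames.
def find_endpoint(energies, silence_threshold=0.1, min_silence_frames=3):
    """Return the index of the first frame of the terminating pause,
    or None if no endpoint has been detected yet."""
    silent_run = 0
    for i, e in enumerate(energies):
        if e < silence_threshold:
            silent_run += 1
            if silent_run >= min_silence_frames:
                return i - min_silence_frames + 1   # start of the pause
        else:
            silent_run = 0                          # speech resumed
    return None
```

A production system would use a trained voice-activity detector rather than raw energy, but the contract is the same: either report a boundary or keep listening.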
(2) Conduct human-machine dialogue with the speaker when necessary; the content of the dialogue is selected according to the speaker's acoustic scene, speaker scene, linguistic context, and so on (as shown in Fig. 7).
If, while capturing the speaker's speech input, the anthropomorphic mediator dynamically measures a background-noise level above a set threshold, the system advises the speaker to re-record or change the input mode;
If the anthropomorphic mediator, by processing the input speech, detects speech in languages beyond those configured for the two parties, the system advises the speaker to reset the language options;
If the anthropomorphic mediator, by processing the input speech, determines that several speakers are talking at once, the system advises the speakers to take turns in order to obtain a better translation experience;
If the anthropomorphic mediator processes the text to be translated that is obtained from automatic speech recognition, and simultaneously analyzes the linguistic scene and semantic ambiguity of that text, then when the complexity of the linguistic scene or the semantic ambiguity exceeds a preset threshold, the system starts a human-machine dialogue to obtain the speaker's further clarification of the complex or ambiguous parts. Such complex scenes and semantically ambiguous parts include but are not limited to:
If the text to be translated contains person, place, or organization names that are ambiguous in themselves or in context, the system asks the speaker to confirm their word boundaries and structure;
If the text to be translated may contain misrecognized words, the system asks the speaker to confirm whether those words match the actual input;
If the text to be translated contains many colloquial fragments and repetitions, the system automatically parses and restructures the text and submits the restructured wording for the speaker's confirmation; if confirmed, the restructured wording is handed to the subsequent flow, and if the speaker rejects it, the system advises the speaker to reorganize and re-express the meaning more fluently;
If the text to be translated has obvious missing constituents, the system automatically completes them and submits the completed wording for the speaker's confirmation; if confirmed, the completed wording is handed to the subsequent flow, and if the speaker rejects it, the system advises the speaker to re-input with a more complete construction;
If the text to be translated contains temporal disorder, the system automatically reorders the text and submits the adjusted wording for the speaker's confirmation; if confirmed, the adjusted wording is handed to the subsequent flow, and if the speaker rejects it, the system advises the speaker to re-input in normal order;
If the text to be translated contains time or number phrases that are ambiguous in themselves or in context, the system asks the speaker to confirm the phrase boundaries and structure; such time and number phrases include but are not limited to: cardinals, ordinals, decimals, fractions, probability words, multiplicatives, approximate numbers, individual measure words, measurement words, compound classifiers, indefinite measure words, verbal measure words, temporal measure words, nominal measure words, times, durations, seasons, months, weekdays, solar terms, festivals, and era names;
If the text to be translated contains specialized phrases that are ambiguous in themselves or in context, the system asks the speaker to confirm the phrase boundaries and structure; such proper terms include but are not limited to: industry jargon, everyday abbreviations, internet neologisms, classical poetry, idioms, proverbs, and two-part allegorical sayings.
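A toy version of the threshold rule that triggers these clarification dialogues might look like this; the scoring heuristics, the example ambiguous terms, and the threshold value are all assumptions made for the sketch:

```python
# Toy ambiguity scorer: counts adjacent repeats and known-ambiguous terms,
# normalized by utterance length; a clarification dialogue starts only when
# the score exceeds a preset threshold.
def ambiguity_score(text, ambiguous_terms=("bank", "spring")):
    words = text.lower().split()
    if not words:
        return 0.0
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    ambiguous = sum(1 for w in words if w in ambiguous_terms)
    return (repeats + ambiguous) / len(words)

def needs_clarification(text, threshold=0.15):
    """True when the system should ask the speaker before translating."""
    return ambiguity_score(text) > threshold
```

The real system would score each phenomenon (names, misrecognitions, ellipsis, temporal disorder, and so on) with dedicated models; the shared structure is simply "score, compare to threshold, then ask or proceed".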
Meanwhile, during the above human-machine dialogue, the interaction modes include but are not limited to: question-and-answer by voice; displaying prompts and requests to the speaker as text; and obtaining the speaker's confirmation and replies by screen touch, clicks, and the like.
If the anthropomorphic mediator obtains prosodic scene information of the input speech by processing and recognizing it, the system uses this prosodic scene information to improve the speaker's translation service experience. The prosodic scene here includes but is not limited to:
Pauses within the input speech: if the anthropomorphic mediator, by processing and recognizing the input speech, collects information on pauses within it, the system intelligently judges the semantic association and semantic focus among the speaker's speech fragments according to this information, and uses the result to optimize subsequent processing;
Punctuation boundaries of the input speech: if the anthropomorphic mediator, by processing and recognizing the input speech, collects punctuation-boundary information, the system segments the text to be translated accordingly and supplies the segmentation to subsequent processing;
Intonation and emotion of the input speech: if the anthropomorphic mediator, by processing and recognizing the input speech, collects intonation and emotion information, the system intelligently judges the semantic focus of the text to be translated and the sentence-internal and sentence-final punctuation according to this information, and supplies the semantic and emotional information to subsequent processing;
Translation turn boundaries of the input speech: if the anthropomorphic mediator, by processing and recognizing the input speech, collects turn-boundary information, the system intelligently resets its memory module accordingly and treats this as the basis for opening a new round of translation dialogue.
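One plausible way to turn pause durations into the punctuation boundaries described above (the duration thresholds are assumptions for the sketch, not the patent's method):

```python
# Prosody-driven segmentation sketch: pauses between recognized words are
# mapped to clause boundaries (short pause -> comma) or sentence boundaries
# (long pause -> period).
def segment_by_pauses(words, pauses, clause_pause=0.3, sentence_pause=0.8):
    """words: n recognized words; pauses: n-1 pause durations (seconds)
    between consecutive words. Returns punctuated text."""
    assert len(pauses) == len(words) - 1
    out = []
    for w, p in zip(words, pauses):
        out.append(w)
        if p >= sentence_pause:
            out.append(".")
        elif p >= clause_pause:
            out.append(",")
    out.append(words[-1])
    out.append(".")
    return " ".join(out).replace(" ,", ",").replace(" .", ".")
```

A deployed system would combine pause length with intonation and language-model evidence; pauses alone are a rough but serviceable first cue.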
(3) Visually present the current system state to the speaker (as shown in Fig. 8). A dedicated anthropomorphic avatar in the visualization interface shows the speaker the current system state. The dedicated avatar includes but is not limited to: a Casper-style ghost, a star, an animal, a robot, and the like. At the different stages of the translation service, the avatar's states include but are not limited to:
When one of the two parties speaks, the avatar turns toward that speaker in a listening pose, using the language of the input speech or the speaker's identity as its basis;
When initiating human-machine dialogue with the speaker, the avatar turns toward the speaker in states such as questioning, friendly prompting, or intelligent judgment, according to the actual dialogue scene;
When receiving the speaker's answer to the dialogue content, the avatar turns toward the speaker in states such as listening, understanding, or thanking, according to the content of the answer;
When outputting the translation result to the other party, the avatar turns toward that party in a speaking, communicating pose.
(4) Intelligently output the translation result to the other party (as shown in Fig. 9). Through human-machine dialogue and intelligent processing, the anthropomorphic mediator obtains the speaker's acoustic, semantic, and prosodic information described above and, when necessary, attaches it to the translation result output synchronously to the other party. The output modes include but are not limited to: marking key parts of the output translation in red or bold text; conveying the speaker's emotion and semantic focus through stress and repetition in the output speech; and explaining rare words and technical concepts in the output text with automatic supplementary notes.
Second embodiment: multi-party spoken-translation conference system on a mobile phone
This embodiment provides a multi-party spoken-translation conference system on a mobile phone. The system provides participants with an end-to-end multi-party spoken conference translation function and an intelligent conference chairing function, and initiates human-machine dialogue with participants when necessary to improve the conference translation experience.
(1) Acquire conference information and create the conference (as shown in Fig. 10). The conference creator specifies a conference identification code, which is the unique identifier of the conference; other participants join the specified conference by entering the code. The conference creator specifies a conference title, which summarizes the content of the conference or reflects its participants. The conference creator specifies all conference languages, and participants can only choose their own language from those selected by the creator. Each participant enters their own name, which serves as their identifier within the conference and appears on the conference interface and in the anthropomorphic mediator's indication of who is speaking.
(2) The anthropomorphic Mediator starts the multi-party interpretation conference and the intelligent chairing process (as shown in Figure 11). If the conference creator chooses to have the conference chaired by the participants themselves, the anthropomorphic Mediator, after starting the multi-party interpretation conference, hands the chairing functions over to the conference creator and the participants. These functions include but are not limited to:
If the conference creator selects the microphone-contention mode as the participants' speaking mode, the speaking order is determined by the participants themselves. While a participant is speaking, the system refuses requests to speak from other participants until the current speech ends. After the speech ends, other participants may request to speak; if several participants apply at the same time, the system determines the next speaker by the order in which the requests arrive;
If the conference creator selects the microphone-designation mode, the speaking order is specified by the conference creator. When a participant requests to speak, the conference creator may grant the floor to that participant; while that participant is speaking, the other participants cannot obtain the right to speak;
If the conference creator manually specifies a speaking-time limit, the system prompts a participant whose speech reaches that limit;
If the conference creator does not specify a speaking-time limit, the system imposes no limit on speaking time, which is controlled by the participants themselves;
If the conference creator does not enable automatic monitoring of the acoustic scene, speaker scene, language scene, and prosodic scene of the participants' speech, the anthropomorphic Mediator performs no intelligent processing of the participants' speech or its translation results and directly distributes the translation results to the other participants;
If the conference creator enables automatic monitoring of the acoustic scene, speaker scene, language scene, and prosodic scene of the participants' speech, the anthropomorphic Mediator intelligently processes the participants' speech and its translation results and returns the processing results to the conference creator, who decides whether to further communicate with the speaker or seek confirmation. The automatic monitoring of the acoustic scene, speaker scene, language scene, and prosodic scene includes but is not limited to all of the recognition and processing of input speech, text to be translated, and translation results described in the first embodiment.
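The two speaking modes described above can be sketched as a small floor-control class; the class design is an illustrative assumption (a first-come-first-served queue for contention mode, a creator-granted floor for designated mode), not the patented implementation.

```python
from collections import deque

class FloorControl:
    """Sketch of the two speaking modes: 'contention' queues requests
    first-come-first-served; 'designated' lets only the creator grant the floor."""

    def __init__(self, mode="contention"):
        self.mode = mode
        self.current = None    # participant currently holding the floor
        self.queue = deque()   # pending requests (contention mode only)

    def request(self, name):
        """Returns True if the floor is granted immediately, else False."""
        if self.current is None:
            self.current = name
            return True
        if self.mode == "contention":
            self.queue.append(name)  # wait until the current speech ends
        return False                 # refused while someone else speaks

    def grant(self, name):
        """Designated mode only: the conference creator assigns the floor."""
        if self.mode == "designated" and self.current is None:
            self.current = name

    def release(self):
        """Current speech ends; in contention mode the earliest request wins."""
        self.current = self.queue.popleft() if self.queue else None
        return self.current
```

In contention mode, simultaneous applicants are served in arrival order, matching the rule that the system "determines the next speaker by the order in which the requests arrive".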
If the conference creator chooses to have the conference intelligently chaired by the anthropomorphic Mediator, the Mediator, after starting the multi-party interpretation conference, enables its chairing functions, which include but are not limited to:
Through intelligent recognition and processing of the conference progress and minutes, together with the order in which participants request to speak, the anthropomorphic Mediator intelligently determines the speaking order;
Through intelligent judgment of the current speaker's speech content, speech rate, and the conference progress, the anthropomorphic Mediator gives intelligent reminders about the current speaker's speaking time, content length, speech rate, and so on. These reminders include:
If the speaker has spoken for too long, the Mediator automatically reminds the speaker to mind the speaking time;
If the speech content is too long, the Mediator automatically reminds the speaker to split the content so as to achieve better translation performance;
If the speech rate is too fast, the Mediator automatically reminds the speaker to slow down and speak at a gentler rhythm.
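The three intelligent reminders can be sketched as a simple rule check. All threshold values here are illustrative placeholders, since the specification does not fix concrete limits.

```python
def speaking_reminders(duration_s, n_chars, rate_cps,
                       max_duration_s=120, max_chars=400, max_rate_cps=6.0):
    """Return the reminders the Mediator would issue for the current
    speaker. Thresholds are assumed values for illustration only."""
    reminders = []
    if duration_s > max_duration_s:
        reminders.append("please mind your speaking time")
    if n_chars > max_chars:
        reminders.append("please split long content for better translation")
    if rate_cps > max_rate_cps:
        reminders.append("please slow down for a gentler rhythm")
    return reminders
```

A speaker who has talked for 200 seconds at a normal rate would, under these placeholder thresholds, receive only the speaking-time reminder.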
Through intelligent monitoring of the acoustic scene, speaker scene, language scene, and prosodic scene of the current speaker's input speech, and through intelligent recognition and processing of the conference progress and minutes, the anthropomorphic Mediator dynamically decides whether human-computer dialogue with the speaker is necessary. The factors the Mediator weighs in this dynamic decision include but are not limited to:
If the Mediator intelligently senses that the participants are highly familiar with the conference topic, the system raises the threshold for initiating human-computer dialogue; if it senses that the participants are less familiar with the topic, the system lowers that threshold;
If the Mediator intelligently senses that the conference progress is urgent, the system raises the threshold for initiating human-computer dialogue; if the urgency is low, the system lowers that threshold.
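The raising and lowering of the dialogue threshold can be sketched as follows. The linear weighting and the 0-to-1 scales for familiarity and urgency are assumptions made for illustration; the specification only states the direction in which each factor moves the threshold.

```python
def dialogue_threshold(base=0.5, familiarity=0.5, urgency=0.5):
    """Dynamic threshold for opening a human-computer dialogue: higher
    topic familiarity or higher meeting urgency raises the threshold
    (fewer interruptions); lower values lower it. Weights are assumed."""
    threshold = base + 0.3 * (familiarity - 0.5) + 0.3 * (urgency - 0.5)
    return min(max(threshold, 0.0), 1.0)

def should_open_dialogue(uncertainty, familiarity, urgency):
    """Open a dialogue only when the Mediator's uncertainty about the
    utterance exceeds the dynamically adjusted threshold."""
    return uncertainty > dialogue_threshold(familiarity=familiarity, urgency=urgency)
```

With this sketch, a familiar topic in an urgent meeting yields a high threshold, so only highly uncertain utterances trigger a dialogue; an unfamiliar topic in a relaxed meeting yields a low threshold, so the Mediator interacts more readily.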
The Mediator's intelligent monitoring of the acoustic scene, speaker scene, language scene, and prosodic scene of the current speaker's input speech includes but is not limited to all of the recognition and processing of input speech, text to be translated, and translation results described in the first embodiment.
(3) Visually display the current conference state to the participants (as shown in Figure 12). A dedicated anthropomorphic avatar is placed in the visualization interface to show the current conference state to the participants; the avatar includes but is not limited to: a cartoon ghost, a star, an animal, a robot, and so on. As the conference state changes, the avatar's state changes accordingly, including but not limited to:
When a participant requests to speak, the avatar faces the applicant in a waiting-to-listen posture, and at the same time notifies the other participants of the identity of the participant requesting to speak;
When a participant is speaking, the avatar faces the speaker in a listening posture while facing the other participants in a speaking posture;
When the anthropomorphic Mediator engages in human-computer dialogue with the speaker, the avatar faces the speaker in a state appropriate to the actual dialogue scene, such as asking for an answer, friendly prompting, or intelligent judgment, while facing the other participants in a waiting posture;
When the conference creator modifies conference settings, the avatar notifies the other participants of the modifications, which include but are not limited to: conference topic, conference identification code, conference languages, chairing mode, speaking-time limit, speaking mode, and conference attendees.
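The avatar's behavior above can be sketched as a small event-to-posture table. The event names and posture labels are invented for illustration; the specification only describes the behaviors, not their identifiers.

```python
AVATAR_STATES = {
    # conference event -> (posture toward the active party, posture toward others)
    "request_floor": ("waiting_to_listen", "notified"),
    "speaking":      ("listening",         "speaking"),
    "dialogue":      ("asking",            "waiting"),
    "settings":      ("announcing",        "announcing"),
}

def avatar_state(event):
    """Map a conference event to the anthropomorphic avatar's postures."""
    return AVATAR_STATES[event]
```

For example, during a speech the avatar listens toward the speaker and presents a speaking posture toward everyone else.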
(4) The anthropomorphic Mediator intelligently decides to end the conference.
If the conference creator chooses to end the conference manually, he or she terminates it by clicking the end-conference button, and the other participants are forced to leave the conference at the same time;
If the conference creator chooses to let the anthropomorphic Mediator decide intelligently, the Mediator, through recognition and processing of the conference progress and minutes, intelligently determines the boundary of the conference process and terminates the conference after the process ends.
In addition, after the conference ends, the system provides participants with conference-related information, including but not limited to: the minutes, conference statistics, the attendee list, and the conference summary.
Third embodiment: an anthropomorphic oral translation system without screen display
In this embodiment, an anthropomorphic oral translation system without screen display is provided. The system offers users end-to-end anthropomorphic translation services in the absence of a screen, using the following technical scheme (as shown in Figure 13):
(1) Obtain relevant information about the speaker
Without a screen display, the system obtains relevant information about the speaker through intelligent processing of the speaker's input speech. The relevant information includes but is not limited to:
Optionally, on startup the system asks each speaker in turn to say a common phrase, so as to obtain the language information of the parties to the dialogue;
The system automatically performs speaker recognition on each speaker; the recognition result serves as important evidence for language identification, human-computer dialogue, and distinguishing the speech records of different speakers.
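The enrollment-then-attribution flow above can be sketched with a toy nearest-neighbour registry. A real system would derive acoustic embeddings (e.g. i-vectors or x-vectors) from the startup common phrase; the short "voiceprint" vectors here are stand-ins for such features, and the whole class is an illustrative assumption.

```python
import math

class SpeakerRegistry:
    """Toy speaker-recognition registry: each speaker is enrolled with a
    fixed-length voiceprint vector (a stand-in for real acoustic features),
    and later utterances are attributed to the nearest enrolled speaker."""

    def __init__(self):
        self.profiles = {}  # speaker name -> (voiceprint, language)

    def enroll(self, name, voiceprint, language):
        """Called once per speaker, e.g. after the startup common phrase."""
        self.profiles[name] = (voiceprint, language)

    def identify(self, voiceprint):
        """Return (name, language) of the closest enrolled voiceprint."""
        def dist(a, b):
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        name = min(self.profiles, key=lambda n: dist(self.profiles[n][0], voiceprint))
        return name, self.profiles[name][1]
```

The returned language doubles as the language-identification evidence mentioned above, and the returned name keys each utterance into the correct speaker's session record.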
(2) Obtain the source-language speech input by the speaker
After the dialogue begins, the system obtains complete source-language speech through intelligent processing of the speaker's input speech. The intelligent processing of the input speech includes but is not limited to:
The system automatically performs breakpoint detection on the speaker's input speech, intelligently recognizing the boundaries of the input speech and obtaining complete speech segments.
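The breakpoint detection can be sketched as a naive frame-energy detector; the energy threshold, the silence-run length, and the per-frame energy representation are all assumptions for illustration, not the patented method.

```python
def detect_segments(frames, energy_threshold=0.1, min_silence_frames=3):
    """Naive energy-based breakpoint detection: a speech segment is closed
    after `min_silence_frames` consecutive low-energy frames. Returns
    half-open (start, end) frame-index pairs for each detected segment."""
    segments, start, silence = [], None, 0
    for i, energy in enumerate(frames):
        if energy >= energy_threshold:
            if start is None:
                start = i        # segment opens on the first loud frame
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                segments.append((start, i - silence + 1))
                start, silence = None, 0
    if start is not None:        # speech ran to the end of the input
        segments.append((start, len(frames)))
    return segments
```

Each returned pair delimits one complete speech fragment, which downstream recognition and translation then consume as a unit.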
(3) Conduct human-computer dialogue with the speaker when necessary; the content of the dialogue varies with the speaker's acoustic scene, speaker scene, language scene, and so on.
Through intelligent monitoring of the acoustic scene, speaker scene, language scene, and prosodic scene of the current input speech, the anthropomorphic Mediator opens a human-computer dialogue when necessary. The intelligent detection and the human-computer dialogue include but are not limited to the content described in the first embodiment.
(4) Without a screen, the anthropomorphic Mediator shows the user the current dialogue state through different states of sound.
Using sound as the medium, the Mediator presents the state of the current dialogue to the user. The ways of using sound as the medium include but are not limited to:
The Mediator can distinguish different states, such as human-computer dialogue versus translation-result output, by the gender of the voice;
The Mediator can distinguish these states by the tone of the voice: when conducting a human-computer dialogue, the Mediator uses a soft, discussing, or inquiring tone, and when outputting a translation result, it uses an objective, serious, declarative tone;
The Mediator can distinguish these states by a prefixed background cue: before a human-computer dialogue, the Mediator inserts a short, brisk piece of music suggesting inquiry, and before outputting a translation result, it inserts a short, solemn piece of music suggesting notification.
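The sound-only state signalling above can be sketched as a lookup from dialogue state to voice, tone, and prefix cue. The concrete style names are invented for illustration; only the distinctions themselves come from the specification.

```python
OUTPUT_STYLES = {
    # state -> (voice gender, tone, prefix cue)
    "dialogue":    ("female", "soft_inquiring",  "brisk_chime"),
    "translation": ("male",   "objective_plain", "solemn_chime"),
}

def render_output(state, text):
    """Choose voice, tone, and prefix cue for a screen-free output, so the
    listener can tell a Mediator question from a translation by ear alone."""
    voice, tone, cue = OUTPUT_STYLES[state]
    return f"[{cue}] ({voice}, {tone}) {text}"
```

A real system would drive a speech synthesizer with these parameters; the string rendering here merely makes the selection visible.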
(5) Intelligently output the translation-result speech to the other party of the dialogue.
Through human-computer dialogue and intelligent processing, the anthropomorphic Mediator obtains the speaker's acoustic, semantic, and prosodic information described above and, when necessary, attaches it to the translation result for synchronized output to the other party. The output manner includes but is not limited to: conveying the speaker's emotion and semantic emphasis through stress and repetition in the output speech.
It should be noted that implementations not shown or described in the accompanying drawings or in the text of the specification are in forms known to those of ordinary skill in the relevant art and are not described in detail. In addition, the above definitions of the elements and methods are not limited to the specific structures, shapes, or modes mentioned in the embodiments, which those of ordinary skill in the art may simply modify or replace.
In summary, the present invention provides an anthropomorphic oral interpretation method and system with good human-computer interaction. The core of the invention is to add a human-computer dialogue module on top of conventional speech recognition and translation. This module captures, processes, and recognizes the acoustic scene, speaker scene, prosodic scene, language scene, and so on at the time of speaking, and conducts human-computer dialogue with the user when the translation task requires it. This significantly improves the user's translation experience in complex application scenarios and improves the semantic accuracy of the translation.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. An anthropomorphic oral interpretation method with good human-computer interaction, comprising the following steps:
performing intelligent speech recognition on source-language speech to obtain source-language text;
processing the source-language text and the dialogue scene, and conducting anthropomorphic human-computer dialogue;
performing machine translation to obtain a translation result.
2. The anthropomorphic oral interpretation method according to claim 1, wherein performing intelligent speech recognition on the source-language speech specifically comprises the following steps:
providing, for each frame, a speech/non-speech probability score and a source-language/target-language probability score;
performing synchronized bilingual decoding of the speech;
outputting a meaningful recognition result through information fusion.
3. The anthropomorphic oral interpretation method according to claim 1, wherein processing the source-language text and the dialogue scene comprises perceiving the acoustic scene, the speaker scene, the prosodic scene, and the language scene.
4. The anthropomorphic oral interpretation method according to claim 3, wherein, when perceiving the acoustic scene, the information to be perceived comprises: dynamic background-noise information.
5. The anthropomorphic oral interpretation method according to claim 3, wherein, when perceiving the speaker scene, the information to be perceived comprises: speaker information, language information, whether multiple people are speaking, separation information for multi-speaker speech, and other speaker-scene information.
6. The anthropomorphic oral interpretation method according to claim 3, wherein, when perceiving the prosodic scene, the information to be perceived comprises: pauses within the input speech, sentence-break boundaries, speech rate, fundamental frequency, formants, prosodic-analysis confidence, and translation-turn boundaries.
7. The anthropomorphic oral interpretation method according to claim 3, wherein, when perceiving the language scene, the information to be perceived comprises:
person names, place names, and organization names extracted from the text to be translated;
possibly misrecognized words contained in the text to be translated;
colloquial fragments and repetitions contained in the text to be translated;
obvious missing sentence constituents in the text to be translated;
inverted word order contained in the text to be translated;
time and numeral phrases contained in the text to be translated;
industry jargon, everyday abbreviations, internet neologisms, classical poetry, idioms, proverbs, and two-part allegorical sayings contained in the text to be translated.
8. The anthropomorphic oral interpretation method according to claim 1, wherein the anthropomorphic human-computer dialogue comprises:
prompts on correct usage and prompts to improve the usage environment or the speaker's environment;
obtaining semantic description information when further semantic description is needed.
9. An anthropomorphic oral translation system with good human-computer interaction, comprising:
a speech recognition module, configured to perform intelligent speech recognition on source-language speech to obtain source-language text;
a human-computer dialogue management module, configured to process the source-language text and the dialogue scene and to conduct anthropomorphic human-computer dialogue;
a machine translation module, configured to perform machine translation to obtain a translation result.
10. The anthropomorphic oral translation system according to claim 9, wherein the different states of the translation system are displayed through a visual anthropomorphic avatar, or through a non-visual voice intermediary.
CN201710535661.8A 2017-07-03 2017-07-03 Anthropomorphic oral interpretation method and system with good human-computer interaction Pending CN107315742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710535661.8A CN107315742A (en) 2017-07-03 2017-07-03 Anthropomorphic oral interpretation method and system with good human-computer interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710535661.8A CN107315742A (en) 2017-07-03 2017-07-03 Anthropomorphic oral interpretation method and system with good human-computer interaction

Publications (1)

Publication Number Publication Date
CN107315742A true CN107315742A (en) 2017-11-03

Family

ID=60180482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710535661.8A Pending CN107315742A (en) Anthropomorphic oral interpretation method and system with good human-computer interaction

Country Status (1)

Country Link
CN (1) CN107315742A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101071565A (en) * 2006-05-12 2007-11-14 摩托罗拉公司 Method for correcting voice identification system
CN101727904A (en) * 2008-10-31 2010-06-09 国际商业机器公司 Voice translation method and device
CN102779508A (en) * 2012-03-31 2012-11-14 安徽科大讯飞信息科技股份有限公司 Speech corpus generating device and method, speech synthesizing system and method
CN103744843A (en) * 2013-12-25 2014-04-23 北京百度网讯科技有限公司 Online voice translation method and device
CN105786880A (en) * 2014-12-24 2016-07-20 中兴通讯股份有限公司 Voice recognition method, client and terminal device
CN106098060A (en) * 2016-05-19 2016-11-09 北京搜狗科技发展有限公司 The correction processing method of voice and device, the device of correction process for voice

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109933807A (en) * 2017-12-19 2019-06-25 葫芦科技有限公司 A method of it is translated by smartwatch physical button and carries out voice literal translation
CN108536655A (en) * 2017-12-21 2018-09-14 广州市讯飞樽鸿信息技术有限公司 Audio production method and system are read aloud in a kind of displaying based on hand-held intelligent terminal
CN112055876A (en) * 2018-04-27 2020-12-08 语享路有限责任公司 Multi-party dialogue recording/outputting method using voice recognition technology and apparatus therefor
CN108831436A (en) * 2018-06-12 2018-11-16 深圳市合言信息科技有限公司 A method of text speech synthesis after simulation speaker's mood optimization translation
CN108806688A (en) * 2018-07-16 2018-11-13 深圳Tcl数字技术有限公司 Sound control method, smart television, system and the storage medium of smart television
WO2020019610A1 (en) * 2018-07-24 2020-01-30 北京搜狗科技发展有限公司 Data processing method, apparatus, and apparatus used for data processing
CN109166594A (en) * 2018-07-24 2019-01-08 北京搜狗科技发展有限公司 A kind of data processing method, device and the device for data processing
CN109359305A (en) * 2018-09-05 2019-02-19 盛云未来(北京)科技有限公司 A kind of method and apparatus of multilingual intertranslation in unison
US11893359B2 (en) 2018-10-15 2024-02-06 Huawei Technologies Co., Ltd. Speech translation method and terminal when translated speech of two users are obtained at the same time
CN109286725B (en) * 2018-10-15 2021-10-19 华为技术有限公司 Translation method and terminal
CN109286725A (en) * 2018-10-15 2019-01-29 华为技术有限公司 Interpretation method and terminal
CN109710949A (en) * 2018-12-04 2019-05-03 深圳市酷达通讯有限公司 A kind of interpretation method and translator
CN109710949B (en) * 2018-12-04 2023-06-23 深圳市酷达通讯有限公司 Translation method and translator
CN110008481B (en) * 2019-04-10 2023-04-28 南京魔盒信息科技有限公司 Translated voice generating method, device, computer equipment and storage medium
CN110008481A (en) * 2019-04-10 2019-07-12 南京魔盒信息科技有限公司 Translated speech generation method, device, computer equipment and storage medium
CN112435690A (en) * 2019-08-08 2021-03-02 百度在线网络技术(北京)有限公司 Duplex Bluetooth translation processing method and device, computer equipment and storage medium
CN111709431B (en) * 2020-06-15 2023-02-10 厦门大学 Instant translation method and device, computer equipment and storage medium
CN111709431A (en) * 2020-06-15 2020-09-25 厦门大学 Instant translation method and device, computer equipment and storage medium
CN117275455A (en) * 2023-11-22 2023-12-22 深圳市阳日电子有限公司 Sound cloning method for translation earphone
CN117275455B (en) * 2023-11-22 2024-02-13 深圳市阳日电子有限公司 Sound cloning method for translation earphone

Similar Documents

Publication Publication Date Title
CN107315742A (en) Anthropomorphic oral interpretation method and system with good human-computer interaction
US9614969B2 (en) In-call translation
US20200125920A1 (en) Interaction method and apparatus of virtual robot, storage medium and electronic device
US20150347399A1 (en) In-Call Translation
WO2015062284A1 (en) Natural expression processing method, processing and response method, device, and system
CN110299152A (en) Interactive output control method, device, electronic equipment and storage medium
US20070285505A1 (en) Method and apparatus for video conferencing having dynamic layout based on keyword detection
MXPA04005121A (en) Semantic object synchronous understanding for highly interactive interface.
KR102212298B1 (en) Platform system for providing video communication between non disabled and hearing impaired based on artificial intelligence
CN105798918A (en) Interactive method and device for intelligent robot
CN109256133A (en) A kind of voice interactive method, device, equipment and storage medium
CN109005190B (en) Method for realizing full duplex voice conversation and page control on webpage
WO2023226914A1 (en) Virtual character driving method and system based on multimodal data, and device
WO2018230345A1 (en) Dialogue robot, dialogue system, and dialogue program
CN116524791A (en) Lip language learning auxiliary training system based on meta universe and application thereof
US20210312143A1 (en) Real-time call translation system and method
Inupakutika et al. Integration of NLP and Speech-to-text Applications with Chatbots
CN107783650A (en) A kind of man-machine interaction method and device based on virtual robot
JP2021051693A (en) Utterance system, utterance recommendation device, utterance recommendation program, and utterance recommendation method
CN116009692A (en) Virtual character interaction strategy determination method and device
EP4006903A1 (en) System with post-conversation representation, electronic device, and related methods
US20070005812A1 (en) Asynchronous communicative exchange
US20240154833A1 (en) Meeting inputs
KR20230102753A (en) Method, computer device, and computer program to translate audio of video into sign language through avatar
JP6610965B2 (en) Dialogue method, dialogue system, dialogue apparatus, and program

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190729

Address after: 100190 52513, 5th floor, 6 Nansanjie, Zhongguancun, Haidian District, Beijing

Applicant after: Beijing Zidong Cognitive Technology Co., Ltd.

Address before: 100190 Zhongguancun East Road, Beijing, No. 95, No.

Applicant before: Institute of Automation, Chinese Academy of Sciences

TA01 Transfer of patent application right
RJ01 Rejection of invention patent application after publication

Application publication date: 20171103

RJ01 Rejection of invention patent application after publication