CN109033423A - Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system


Info

Publication number
CN109033423A
Authority
CN
China
Prior art keywords
expression
showed
version
module
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810906577.7A
Other languages
Chinese (zh)
Inventor
王宁 (Wang Ning)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Sogou Hangzhou Intelligent Technology Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Sogou Hangzhou Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd and Sogou Hangzhou Intelligent Technology Co Ltd
Priority to CN201810906577.7A
Publication of CN109033423A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state

Abstract

The invention discloses a simultaneous interpretation caption presentation method and device. The method comprises: obtaining a translated text of a current sentence; determining expression content to be presented; obtaining an expression file corresponding to the expression content; and displaying the translated text on a screen while presenting the expression file in synchronization with the translated text. The invention further discloses an intelligent meeting method, apparatus and system. With the present invention, the presented content can be greatly enriched, the visual impact on users caused by unsatisfactory translation results can be compensated for, and the user experience can be improved.

Description

Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system
Technical field
The present invention relates to the field of automatic translation, and in particular to a simultaneous interpretation caption presentation method and device; it further relates to an intelligent meeting method, apparatus and system.
Background art
As an interpretation mode, simultaneous interpretation is highly efficient. Its greatest feature is that it allows the talker to speak coherently without affecting or interrupting the talker's train of thought, which helps the audience understand the speech as a whole. Simultaneous interpretation has therefore become the prevailing interpretation mode in today's world; large international conferences in particular almost always use simultaneous interpretation. At present, simultaneous interpretation is still mainly performed by human interpreters, whose workload and pressure are very heavy.
With the development of speech recognition and machine translation technology, products that replace human interpreters with machine-based simultaneous interpretation have appeared in the industry; they can display the translation result on a screen in written form in real time. However, because the performance of current products is mostly not ideal, inaccurate or failed translations are common. The displayed subtitles can not only confuse users but even cause them to misunderstand the speech content, which harms the user experience.
Summary of the invention
On the one hand, embodiments of the present invention provide a simultaneous interpretation caption presentation method and device, so as to compensate for the visual impact on users caused by unsatisfactory translation results and improve the user experience.
On the other hand, embodiments of the present invention provide an intelligent meeting method, apparatus and system, which automatically provide simultaneous interpretation of speeches at a meeting venue, reduce labor costs, and improve the user's visual experience.
To this end, the present invention provides the following technical solutions:
A simultaneous interpretation caption presentation method, the method comprising:
obtaining a translated text of a current sentence;
determining expression content to be presented;
obtaining an expression file corresponding to the expression content;
displaying the translated text on a screen, and presenting the expression file in synchronization with the translated text.
Optionally, the method further comprises: obtaining a confidence level of the translated text;
and determining the expression content to be presented comprises: determining first expression content to be presented according to the confidence level.
Optionally, the method further comprises: performing tone recognition on the current sentence, and determining the speaker's mood according to the recognition result;
and determining the expression content to be presented comprises: determining second expression content to be presented according to the speaker's mood.
Optionally, the method further comprises: obtaining a source text of the current sentence, and detecting whether the source text contains a specific word, to obtain a detection result;
and determining the expression content to be presented comprises:
determining the expression content to be presented according to the detection result; or determining the first expression content to be presented according to the confidence level and the detection result; or determining the second expression content to be presented according to the detection result and the speaker's mood.
Optionally, the method further comprises: displaying the source text in synchronization with the translated text.
Optionally, the expression file is a file in any one or more of the following forms: picture, animation.
A simultaneous interpretation caption presentation device, the device comprising:
a translation obtaining module, configured to obtain a translated text of a current sentence;
an expression determining module, configured to determine expression content to be presented;
a file obtaining module, configured to obtain an expression file corresponding to the expression content;
a display control module, configured to display the translated text on a screen and to present the expression file in synchronization with the translated text.
Optionally, the translation obtaining module is further configured to obtain a confidence level of the translated text; and the expression determining module is specifically configured to determine the expression content to be presented according to the confidence level.
Optionally, the device further comprises: a mood analysis module, configured to perform tone recognition on the current sentence and determine the speaker's mood according to the recognition result;
and the expression determining module is configured to determine second expression content to be presented according to the speaker's mood.
Optionally, the device further comprises: a source-text obtaining module, configured to obtain a source text of the current sentence; and a word detection module, configured to detect whether the source text contains a specific word, to obtain a detection result;
and the expression determining module is specifically configured to determine the expression content to be presented according to the detection result, or to determine the animation state according to the confidence level and the detection result output by the word detection module.
Optionally, the display control module is further configured to display the source text in synchronization with the translated text.
Optionally, the expression file is a file in any one or more of the following forms: picture, animation.
An intelligent meeting method, the method comprising:
receiving voice data of a speaker in real time;
performing speech recognition on the voice data to obtain a source text;
translating the source text to obtain a translated text;
determining expression content to be presented;
obtaining an expression file corresponding to the expression content;
displaying the translated text on a screen, and presenting the expression file in synchronization with the translated text.
Optionally, the method further comprises: obtaining a confidence level of the translated text;
and determining the expression content to be presented comprises: determining first expression content to be presented according to the confidence level.
Optionally, the method further comprises: capturing a face image of the speaker, and determining the emotional state and/or mood of the speaker according to the face image;
and determining the expression content to be presented comprises: determining second expression content to be presented according to the emotional state and/or mood of the speaker.
Optionally, the method further comprises: performing tone recognition on the voice data, and determining the speaker's mood according to the recognition result;
and determining the expression content to be presented comprises: determining second expression content to be presented according to the speaker's mood.
Optionally, the method further comprises: detecting whether the source text contains a specific word, to obtain a detection result;
and determining the expression content to be presented comprises: determining the expression content to be presented according to the detection result; or determining the first expression content to be presented according to the confidence level and the detection result; or determining the second expression content to be presented according to the speaker's mood and the detection result.
Optionally, the method further comprises: displaying an opening animation effect on the screen when the meeting starts; and/or displaying a closing animation effect on the screen when the meeting ends.
Optionally, the method further comprises: detecting whether the source text contains a word indicating that the meeting starts and/or a word indicating that the meeting ends; and if so, determining that the meeting starts and/or ends.
An intelligent meeting device, the device comprising:
a voice obtaining module, configured to receive voice data of a speaker in real time;
a speech recognition module, configured to perform speech recognition on the voice data to obtain a source text;
a translation module, configured to translate the source text to obtain a translated text;
an expression determining module, configured to determine expression content to be presented;
a file obtaining module, configured to obtain an expression file corresponding to the expression content;
a display control module, configured to display the translated text on a screen and to present the expression file in synchronization with the translated text.
Optionally, the translation module is further configured to obtain a confidence level of the translated text; and the expression determining module is specifically configured to determine first expression content to be presented according to the confidence level.
Optionally, the device further comprises: an image capture module, configured to capture a face image of the speaker; and an image analysis module, configured to determine the emotional state and/or mood of the speaker according to the face image captured by the image capture module;
and the expression determining module is specifically configured to determine second expression content to be presented according to the emotional state and/or mood of the speaker.
Optionally, the device further comprises: a mood analysis module, configured to perform tone recognition on the voice data and determine the speaker's mood according to the recognition result;
and the expression determining module is specifically configured to determine second expression content to be presented according to the speaker's mood.
Optionally, the device further comprises: a word detection module, configured to detect whether the source text contains a specific word and to output a detection result;
and the expression determining module is specifically configured to determine the expression content to be presented according to the detection result output by the word detection module; or to determine the first expression content to be presented according to the confidence level and the detection result output by the word detection module; or to determine the second expression content to be presented according to the speaker's mood and the detection result output by the word detection module.
Optionally, the display control module is further configured to display an opening animation effect on the screen when the meeting starts, and/or to display a closing animation effect on the screen when the meeting ends.
Optionally, the word detection module is further configured to detect whether the source text contains a word indicating that the meeting starts and/or a word indicating that the meeting ends, and if so, to send a meeting-start and/or meeting-end trigger signal to the display control module.
An intelligent meeting system, the system comprising a user terminal and a server, where the user terminal comprises: a voice obtaining module, a first communication module, an expression determining module, a file obtaining module and a display control module; and the server comprises: a speech recognition module, a translation module and a second communication module;
the voice obtaining module is configured to receive voice data of a speaker in real time;
the first communication module is configured to send the voice data to the server;
the speech recognition module is configured to recognize the voice data to obtain a source text;
the translation module is configured to translate the source text to obtain a translated text;
the second communication module is configured to send the source text, the translated text and its confidence level to the user terminal;
the expression determining module is configured to determine expression content to be presented;
the file obtaining module is configured to obtain an expression file corresponding to the expression content;
the display control module is configured to display the translated text on a screen and to present the expression file in synchronization with the translated text.
Optionally, the translation module is further configured to obtain a confidence level of the translated text;
and the expression determining module is specifically configured to determine first expression content to be presented according to the confidence level.
Optionally, the user terminal further comprises: an image capture module, configured to capture a face image of the speaker; and an image analysis module, configured to determine the emotional state and/or mood of the speaker according to the face image captured by the image capture module;
and the expression determining module is specifically configured to determine second expression content to be presented according to the emotional state and/or mood of the speaker.
Optionally, the user terminal further comprises: a mood analysis module, configured to perform tone recognition on the voice data and determine the speaker's mood according to the recognition result;
and the expression determining module is specifically configured to determine second expression content to be presented according to the speaker's mood.
Optionally, the user terminal further comprises: a word detection module, configured to detect whether the source text contains a specific word and to output a detection result;
and the expression determining module is specifically configured to determine the expression content to be presented according to the detection result output by the word detection module; or to determine the first expression content to be presented according to the confidence level and the detection result output by the word detection module; or to determine the second expression content to be presented according to the speaker's mood and the detection result output by the word detection module.
Optionally, the display control module is further configured to display an opening animation effect on the screen when the meeting starts, and/or to display a closing animation effect on the screen when the meeting ends.
Optionally, the word detection module is further configured to detect whether the source text contains a word indicating that the meeting starts and/or a word indicating that the meeting ends, and if so, to send a meeting-start and/or meeting-end trigger signal to the display control module.
A computer device, comprising: one or more processors and a memory;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions so as to implement the method described above.
A readable storage medium having instructions stored thereon, where the instructions, when executed, implement the method described above.
With the simultaneous interpretation caption presentation method and device provided by the embodiments of the present invention, an expression file is presented in synchronization with the translated text displayed on the screen, which enriches the presented content and improves the user's impression.
Further, the expression content to be presented can be determined according to the confidence level of the translation, so that the expression file corresponding to that content echoes the translation effect and conveys emotional feedback on the translation result. Cases of inaccurate or failed translation are thus compensated for visually, which effectively improves the user experience.
Further, the current animation state can also be adjusted according to whether the source text contains certain specific words, which helps attract the user's attention.
The intelligent meeting method, apparatus and system provided by the embodiments of the present invention apply the above simultaneous interpretation caption presentation scheme to intelligent meetings. In addition to presenting an expression file that echoes the translation effect, the emotional state and/or mood of the speaker can be determined in multiple ways, and a virtual character corresponding to that emotional state and/or mood can be displayed on the screen in synchronization. This greatly enriches the presented content and improves the presentation effect. With the solution of the present invention, the awkwardness caused by the shortcomings of the related art can be effectively defused, the audience can be guided into a positive emotional state, more vitality and interest can be injected into the meeting scene, a better atmosphere can be built, and the attention, enthusiasm and participation of the audience can be improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required in the embodiments are briefly described below. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them.
Fig. 1 is a flow chart of a simultaneous interpretation caption presentation method according to an embodiment of the present invention;
Fig. 2 is another flow chart of the simultaneous interpretation caption presentation method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a simultaneous interpretation caption presentation device according to an embodiment of the present invention;
Fig. 4 is another schematic structural diagram of the simultaneous interpretation caption presentation device according to an embodiment of the present invention;
Fig. 5 is a flow chart of an intelligent meeting method according to an embodiment of the present invention;
Fig. 6 is another flow chart of the intelligent meeting method according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an intelligent meeting device according to an embodiment of the present invention;
Fig. 8 is another schematic structural diagram of the intelligent meeting device according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an intelligent meeting system according to an embodiment of the present invention;
Fig. 10 is a block diagram of an apparatus for the simultaneous interpretation caption presentation method according to an exemplary embodiment;
Fig. 11 is a schematic structural diagram of a server in an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and specific implementations.
As shown in Fig. 1, a simultaneous interpretation caption presentation method according to an embodiment of the present invention comprises the following steps:
Step 101: obtain the translated text of the current sentence.
The translated text is obtained by a translation system performing real-time automatic translation of the current sentence; different translation systems may produce different translated texts.
Step 102: determine the expression content to be presented.
In practical applications, the confidence level of the translated text can be obtained together with the translated text of the current sentence, and accordingly the first expression content to be presented can be determined according to the confidence level. Specifically, expression content can be preset for different confidence levels. For example, when the confidence level of the simultaneous interpretation result is high, the expression content is a positive emotion such as pleasure or excitement. When the confidence level is low, the expression content is an apologetic state such as sweating, not bearing to look, or anxiety, which acknowledges that the technology still has room for improvement, defuses the embarrassment of an unsatisfactory translation, and guides users toward a positive emotional state. When the confidence level is within the normal range, the expression content can be an animation of looking up at the PPT content shown on the screen, shaping an interesting, intelligent image that not only hears but also understands.
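As a non-authoritative illustration of this confidence-based selection, the following minimal Python sketch maps a translation confidence score to a preset expression-content label. The thresholds and labels are assumptions made for the sketch only; the embodiment does not specify concrete values.

```python
def select_expression_by_confidence(confidence: float) -> str:
    """Map a translation confidence score in [0.0, 1.0] to a preset
    expression-content label. Thresholds are illustrative assumptions."""
    if confidence >= 0.85:          # high confidence: positive emotions
        return "pleased"            # e.g. pleasure, excitement
    elif confidence < 0.5:          # low confidence: apologetic states
        return "sweating"           # e.g. sweating, anxious
    else:                           # normal range: neutral "listening" effect
        return "looking_at_slides"  # e.g. glancing at the presented PPT


# usage
print(select_expression_by_confidence(0.92))  # -> "pleased"
```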
In addition, tone recognition can be performed on the current sentence, and the speaker's mood can be determined according to the recognition result; the second expression content to be presented is then determined according to the speaker's mood. Tone recognition and emotion judgment can use existing related technologies, which are not repeated here.
It should be noted that, in practical applications, the first expression content or the second expression content can be determined individually, or both can be used.
Step 103: obtain the expression file corresponding to the expression content.
Each expression can be made into a corresponding expression file, for example a picture file or an animation file.
As mentioned above, the expression content can be determined from the confidence level of the translated text alone, from the speaker's mood alone, or from both. Accordingly, different expression contents correspond to different expression files, and the form of the expression file can differ depending on which information the expression content is based on. For convenience, the expression file corresponding to the first expression content is called the first expression file, and the expression file corresponding to the second expression content is called the second expression file.
The first expression file may present a virtual image of the simultaneous interpretation translator, or an existing expression such as a smiling face; the second expression file may present a virtual character, or an expression other than the smiling face presented by the first expression file.
Step 104: display the translated text on the screen, and present the expression file in synchronization with the translated text.
In practical applications, both the first expression file and the second expression file can be presented, or either one can be presented alone.
With the simultaneous interpretation caption presentation method provided by this embodiment of the present invention, an expression file presented in synchronization with the translated text greatly enriches the presented content and improves the user's impression. Further, the expression content to be presented can be determined according to the confidence level of the translation and/or the speaker's mood, so that the expression file corresponding to that content echoes the translation: emotional feedback on the translation result, or the speaker's mood while talking, can be expressed well, and cases of inaccurate or failed translation are compensated for visually, which effectively improves the user experience.
As shown in Fig. 2, another flow of the simultaneous interpretation caption presentation method according to an embodiment of the present invention comprises the following steps:
Step 201: obtain the source text of the current sentence, the translated text, and the confidence level of the translated text.
Step 202: detect whether the source text contains a specific word, to obtain a detection result.
The specific word may be a current network hot word, or a word in a specific field, for example a hot word related to social hotspots or to hot topics in a relevant technical field. Network hot words can be extracted in advance from text data on the network; the specific extraction method can use the prior art, which is not limited in this embodiment of the present invention.
Specifically, the specific words can be placed in a vocabulary in advance, for example the extracted hot words such as "VR" and "AI". When detecting whether the source text contains a specific word, the source text can first be segmented into words, and each word obtained by the segmentation is then matched against the specific words in the vocabulary, which determines whether the source text contains a specific word.
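A minimal sketch of this segmentation-and-matching step is given below. The embodiment leaves the segmentation method open; the sketch assumes the open-source jieba tokenizer purely as an example segmenter, and the vocabulary contents are illustrative.

```python
import jieba  # Chinese word segmentation; any tokenizer would serve

HOT_WORDS = {"VR", "AI"}  # example vocabulary of specific words (hot words)


def detect_specific_words(source_text: str) -> bool:
    """Segment the source text and check whether any token matches
    the preset vocabulary of specific words."""
    tokens = jieba.lcut(source_text)
    return any(token in HOT_WORDS for token in tokens)
```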
Step 203: determine the expression content to be presented according to the confidence level and the detection result.
In this embodiment, the confidence level of the translated text and the content of the source text are considered together to determine the corresponding animation state. For example, if the confidence level reaches a set value and a hot word is detected in the source text, a confetti animation or an affirmative animation feedback is triggered, which improves the interest and vividness of the displayed content and also attracts the user's attention. Conversely, if the confidence level is lower than the set value, the readability of the translated text is poor; in that case, regardless of whether a hot word is detected, the animation state still shows states such as sweating, not bearing to look, or anxiety, so as to defuse the embarrassment of an unsatisfactory translation.
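The decision just described can be sketched as a small rule, shown below under assumed threshold and label values; it is an illustration of the combination logic, not a prescribed implementation.

```python
def select_expression(confidence: float, has_hot_word: bool,
                      threshold: float = 0.8) -> str:
    """Combine translation confidence with the hot-word detection result.
    Threshold and labels are illustrative assumptions."""
    if confidence < threshold:
        # Poor readability of the translation: keep the apologetic states
        # regardless of whether a hot word was detected.
        return "sweating"
    if has_hot_word:
        # Confident translation that mentions a hot word: affirmative feedback.
        return "confetti"
    return "pleased"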
Step 204: obtain the expression file corresponding to the expression content.
Step 205: display the translated text on the screen, and present the expression file in synchronization with the translated text.
In this embodiment of the present invention, considering the confidence level of the translated text and the content of the source text together to determine the expression content to be presented can better guide the user's attention to hot content. Of course, in practical applications, the expression content to be presented can also be determined solely according to the detection result of whether the source text contains a specific word.
In addition, in another embodiment of the simultaneous interpretation caption presentation method of the present invention, the speaker's mood and the content of the source text can also be considered together to determine the second expression content to be presented, so that the content and animation presented by the corresponding second expression file are richer.
Furthermore, in each of the above embodiments of the simultaneous interpretation caption presentation method of the present invention, the source text can also be displayed in synchronization with the translated text. They can be displayed one above the other, side by side, or in different areas of the screen, which is not limited in this embodiment of the present invention.
It should be noted that the embodiments of the present invention are not limited to a specific translation system; a translated text obtained from any translation system can be presented as simultaneous interpretation captions by the solution of the embodiments of the present invention.
Correspondingly, an embodiment of the present invention also provides a simultaneous interpretation caption presentation device. Fig. 3 is a schematic structural diagram of such a device.
In this embodiment, the device comprises:
a translation obtaining module 301, configured to obtain the translated text of the current sentence;
an expression determining module 302, configured to determine the expression content to be presented;
a file obtaining module 303, configured to obtain the expression file corresponding to the expression content, where the expression file can be a picture, an animation, or the like;
a display control module 304, configured to display the translated text on the screen and to present the expression file in synchronization with the translated text.
With the simultaneous interpretation caption presentation device provided by this embodiment of the present invention, a corresponding expression file is also presented when the simultaneous interpretation translation is displayed, which greatly enriches the presented content and improves the user's impression.
In practical applications, the expression determining module 302 can determine the expression content to be presented in multiple ways.
For example, the translation obtaining module 301 obtains the confidence level of the translated text, and the expression determining module 302 determines the first expression content to be presented according to the confidence level. Correspondingly, the file obtaining module 303 obtains the first expression file corresponding to the first expression content, and when the display control module 304 displays the translated text on the screen, it presents the first expression file in synchronization with the translated text.
As another example, a mood analysis module (not shown) is provided in the device, configured to perform tone recognition on the current sentence and determine the speaker's mood according to the recognition result. Tone recognition and emotion judgment can be performed on the voice data or the text data of the current sentence and can use existing related technologies, which are not repeated here.
Correspondingly, the expression determining module 302 can determine the second expression content to be presented according to the speaker's mood; the file obtaining module 303 obtains the second expression file corresponding to the second expression content; and when the display control module 304 displays the translated text on the screen, it presents the second expression file in synchronization with the translated text.
The content and form presented by the first expression file and the second expression file have been described above and are not repeated here. Moreover, in practical applications, both expression files can be presented, or either one can be presented alone.
With the simultaneous interpretation caption presentation device provided by this embodiment of the present invention, an expression file presented in synchronization with the translated text greatly enriches the presented content and improves the user's impression. Further, the expression content to be presented can be determined according to the confidence level of the translation and/or the speaker's mood, so that the expression file corresponding to that content echoes the translation: emotional feedback on the translation result, or the speaker's mood while talking, can be expressed well, and cases of inaccurate or failed translation are compensated for visually, which effectively improves the user experience.
Fig. 4 is another schematic structural diagram of the simultaneous interpretation caption presentation device according to an embodiment of the present invention.
Unlike the embodiment shown in Fig. 3, in this embodiment the device further comprises a source-text obtaining module 305 and a word detection module 306, where:
the source-text obtaining module 305 is configured to obtain the source text of the current sentence;
the word detection module 306 is configured to detect whether the source text contains a specific word, to obtain a detection result.
Correspondingly, in this embodiment, the expression determining module 302 can determine the expression content to be presented according to the confidence level and the detection result output by the word detection module 306, or solely according to the detection result.
When determining the expression content, the simultaneous interpretation caption presentation device of this embodiment of the present invention can rely on the confidence level of the translated text and/or the content of the source text. It can not only compensate visually for inaccurate or failed translation, but also guide the user's attention to hot content through the animation displayed in synchronization with the translated text, which further improves the user experience.
In another embodiment of the simultaneous interpretation caption presentation device of the present invention, the above source-text obtaining module 305, the word detection module 306 and a mood analysis module (not shown) can all be provided, so that the speaker's mood and the content of the source text are considered together to determine the expression content to be presented, which makes the content and animation presented by the second expression file richer.
Further, in each of the above embodiments of the simultaneous interpretation caption presentation device of the present invention, the display control module 304 can also display the source text in synchronization with the translated text.
The above simultaneous interpretation caption presentation scheme can be applied to scenarios such as meetings and live video streaming. Correspondingly, the present invention also provides an intelligent meeting method. Fig. 5 is a flow chart of an intelligent meeting method according to an embodiment of the present invention, comprising the following steps:
Step 501: receive the voice data of the speaker in real time.
The voice data can be Chinese speech or speech in another language, such as English or French.
Step 502: perform speech recognition on the voice data to obtain the source text.
The recognition of the voice data can be performed locally or on a server via network transmission; the speech recognition can use the prior art, which is not limited in this embodiment of the present invention.
Step 503: translate the source text to obtain the translated text.
Likewise, the translation of the source text can be performed locally or on a server via network transmission; the translation can use the prior art, which is not limited in this embodiment of the present invention.
In this embodiment of the present invention, the target language of the translation can be preset manually. The voice data of the current speaker received in real time is referred to as source-language data for convenience; in practical applications, the type of the source language (such as English, French or Chinese) can be preset manually or automatically identified by the translation system, which is not limited in this embodiment of the present invention.
Step 504: determine the expression content to be presented.
Step 505: obtain the expression file corresponding to the expression content.
Step 506: display the translated text on the screen, and present the expression file in synchronization with the translated text.
In the above step 504, the expression content to be presented can be determined in various ways. For example, when the translated text of the current sentence is obtained, the confidence level of the translated text is obtained, and the first expression content to be presented is determined according to the confidence level; or the second expression content to be presented is determined according to the emotional state and/or mood of the speaker, which will be described in detail later.
In practical applications, the first expression content or the second expression content can be determined individually, or both can be used. Correspondingly, in step 506, the first expression file corresponding to the first expression content can be presented in synchronization with the translated text, or the second expression file corresponding to the second expression content can be presented in synchronization with the translated text; of course, both expression files can also be presented in synchronization with the translated text. The first expression file may present a virtual image of the simultaneous interpretation translator or an existing expression such as a smiling face; the second expression file may present a virtual character or an expression other than the smiling face presented by the first expression file.
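For orientation only, the end-to-end flow of steps 501 to 506 can be sketched as below. All called components are hypothetical placeholders (the stubs return canned values so the sketch runs); a real system would plug in its own speech recognizer, translator and rendering layer.

```python
from typing import Tuple


def speech_recognize(audio_chunk: bytes) -> str:          # placeholder ASR (step 502)
    return "AI 改变世界"


def translate(source_text: str) -> Tuple[str, float]:      # placeholder MT (step 503)
    return "AI is changing the world", 0.93


def load_expression_file(expression: str) -> str:          # placeholder lookup (step 505)
    return f"{expression}.gif"                              # picture or animation file


def display(translated_text: str, expression_file: str) -> None:  # placeholder renderer
    print(translated_text, "+", expression_file)


def run_meeting_step(audio_chunk: bytes) -> None:
    """One pass of steps 501-506 of the intelligent meeting method (sketch)."""
    source_text = speech_recognize(audio_chunk)             # step 502
    translated_text, confidence = translate(source_text)    # step 503
    # step 504: first expression content chosen from the confidence (assumed threshold)
    expression = "pleased" if confidence >= 0.8 else "sweating"
    expression_file = load_expression_file(expression)      # step 505
    display(translated_text, expression_file)                # step 506: shown in sync


run_meeting_step(b"...raw audio...")
```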
With the intelligent meeting method provided by this embodiment of the present invention, an expression file presented in synchronization with the translated text greatly enriches the presented content and improves the user's impression.
Further, the expression content to be presented can be determined according to the confidence level of the translation and/or the speaker's mood, so that the expression file corresponding to that content echoes the translation: emotional feedback on the translation result, or the speaker's mood while talking, can be expressed well, and cases of inaccurate or failed translation are compensated for visually, which effectively improves the user experience.
Further, in another embodiment of the intelligent meeting method of the present invention, the confidence level of the translated text and the content of the source text can also be considered together to determine the expression content to be presented. For example, if the confidence level reaches a set value and a specific word is detected in the source text, a confetti animation or an affirmative animation feedback is triggered, which improves the interest and vividness of the displayed content and also attracts the attention of the participants. Conversely, if the confidence level is lower than the set value, the readability of the translated text is poor; in that case, regardless of whether a specific word is detected, the animation state still shows states such as sweating, not bearing to look, or anxiety, so as to defuse the embarrassment of an unsatisfactory translation. Specifically, whether the source text contains a specific word can be detected to obtain a detection result, and the expression content to be presented is then determined according to the confidence level and the detection result. In this way, inaccurate or failed translation results can be compensated for visually, and the animation displayed in synchronization with the translated text guides the user's attention to hot content, which further improves the user experience. Of course, in practical applications, the expression content to be presented can also be determined solely according to the detection result, which is not limited in this embodiment of the present invention.
As shown in Fig. 6, another flow of the intelligent meeting method according to an embodiment of the present invention comprises the following steps:
Step 601: receive the voice data of the speaker in real time, and capture a face image of the speaker.
Step 602: recognize the voice data to obtain the source text.
Step 603: translate the source text to obtain the translated text.
Step 604: determine the emotional state and/or mood of the speaker according to the face image.
The emotional state and/or mood of the speaker can be determined through image recognition and analysis, using existing related technologies.
Step 605: determine the second expression content to be presented according to the emotional state and/or mood of the speaker.
Step 606: obtain the second expression file corresponding to the second expression content.
Step 607: display the translated text on the screen, and present the second expression file in synchronization with the translated text.
The second expression file may present a virtual character, or an expression other than the smiling face presented by the first expression file; specifically, a corresponding expression can be presented in combination with the speaker's emotional state. For example, a virtual character of the speaker is shown on the venue screen, and in combination with the speaker's expression, the virtual character shows the corresponding expression, which makes the meeting more lively.
It should be noted that the recognition and analysis of the speaker's face image and the recognition and translation of the speaker's voice data are performed in parallel, i.e. steps 602-603 and steps 604-605 are carried out synchronously and independently of each other.
In practical applications, the speaker's mood can also be determined by performing tone recognition on the voice data and using the recognition result; the second expression content to be presented is then determined according to the speaker's mood.
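The selection of the second expression content from the speaker's emotional state and/or mood can be sketched as follows. The recognizer functions are hypothetical placeholders for existing facial-expression and tone-recognition techniques, and the emotion-to-expression mapping is an assumption made for illustration.

```python
from typing import Optional


def recognize_facial_emotion(face_image: bytes) -> str:
    # Placeholder for an existing facial-expression recognizer.
    return "happy"


def recognize_tone(audio: bytes) -> str:
    # Placeholder for an existing tone/mood recognizer.
    return "serious"


# Hypothetical mapping from the recognized emotion to the expression
# shown by the virtual character; the labels are illustrative only.
EMOTION_TO_EXPRESSION = {"happy": "smiling", "serious": "attentive"}


def select_second_expression(face_image: Optional[bytes] = None,
                             audio: Optional[bytes] = None) -> str:
    """Choose the second expression content from the speaker's emotional
    state (face image) and/or mood (tone of the voice data)."""
    if face_image is not None:
        emotion = recognize_facial_emotion(face_image)
    elif audio is not None:
        emotion = recognize_tone(audio)
    else:
        return "neutral"
    return EMOTION_TO_EXPRESSION.get(emotion, "neutral")
```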
The intelligent meeting method provided by this embodiment of the present invention is built around the design concept of "hear, understand, intelligent, interesting". The simultaneous interpretation product acts as a third-party perspective that compensates visually for its own shortcomings: for cases of inaccurate or failed translation, effects such as "sweating", "anxiety" and "embarrassment" are provided, and the corresponding emotional feedback conveys the state of "not hearing" or "not understanding", so that users perceive the simultaneous interpretation product as virtual yet thoughtful, which in turn improves the user experience. Further, corresponding animations can be provided according to the speech content and/or facial expression of the speaker; this personified treatment builds an atmosphere of friendly interaction with the audience, which can effectively improve the attention, enthusiasm and participation of the audience.
In addition, in another embodiment of the intelligent meeting method of the present invention, the emotional state and/or mood of the speaker and the content of the source text (i.e. whether the source text contains a specific word) can also be considered together to determine the second expression content to be presented, which makes the content and animation presented by the second expression file richer.
In each of the above embodiments of the intelligent meeting method of the present invention, an opening animation effect can also be displayed on the screen when the meeting starts, and/or a closing animation effect can be displayed when the meeting ends. For example, when the meeting starts, an animation of looking around the venue can be shown, such as a video effect of interacting with the participants on site. The start and end of the meeting can be determined by detecting whether the source text contains a word indicating that the meeting starts and/or a word indicating that the meeting ends; if so, it is determined that the meeting starts and/or ends. In addition, in practical applications, the source text can also be displayed in synchronization with the translated text, one above the other, side by side, or in different areas of the screen, which is not limited in this embodiment of the present invention.
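A minimal sketch of this meeting-boundary detection is given below; the marker phrases are invented examples, since the embodiment only requires some preset words indicating that the meeting begins or ends.

```python
from typing import Optional

# Illustrative marker phrases (assumptions for the sketch).
START_MARKERS = ("会议现在开始", "meeting starts")
END_MARKERS = ("会议到此结束", "meeting adjourned")


def detect_meeting_boundary(source_text: str) -> Optional[str]:
    """Return "start" or "end" when the source text contains a phrase
    marking the beginning or the end of the meeting; otherwise None."""
    lowered = source_text.lower()
    if any(marker in lowered for marker in START_MARKERS):
        return "start"   # trigger the opening animation on the screen
    if any(marker in lowered for marker in END_MARKERS):
        return "end"     # trigger the closing animation on the screen
    return None
```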
Correspondingly, an embodiment of the present invention also provides an intelligent meeting device. Fig. 7 is a schematic structural diagram of such a device.
In this embodiment, the intelligent meeting device comprises:
a voice obtaining module 701, configured to receive the voice data of the speaker in real time;
a speech recognition module 702, configured to perform speech recognition on the voice data to obtain the source text;
a translation module 703, configured to translate the source text to obtain the translated text;
an expression determining module 704, configured to determine the expression content to be presented;
a file obtaining module 705, configured to obtain the expression file corresponding to the expression content;
a display control module 706, configured to display the translated text on the screen and to present the expression file in synchronization with the translated text.
The expression determining module 704 can determine the expression content to be presented in various ways. For example, the translation module 703 obtains the confidence level of the translated text when obtaining the translated text of the current sentence, and the expression determining module 704 determines the first expression content to be presented according to the confidence level; or the expression determining module 704 determines the second expression content to be presented according to the emotional state and/or mood of the speaker, which will be described in detail later.
With the intelligent meeting device provided by this embodiment of the present invention, an expression file presented in synchronization with the translated text greatly enriches the presented content and improves the user's impression. Further, in another embodiment of the intelligent meeting device of the present invention, the device may further comprise a word detection module (not shown), configured to detect whether the source text contains a specific word and to output a detection result. Correspondingly, the expression determining module 704 can consider the confidence level of the translated text and the content of the source text together to determine the first expression content to be presented. For example, if the confidence level reaches a set value and a specific word is detected in the source text, a confetti animation or an affirmative animation feedback is triggered, which improves the interest and vividness of the displayed content and also attracts the attention of the participants. Conversely, if the confidence level is lower than the set value, the readability of the translated text is poor; in that case, regardless of whether a specific word is detected, the animation state still shows states such as sweating, not bearing to look, or anxiety, so as to defuse the embarrassment of an unsatisfactory translation. Specifically, whether the source text contains a specific word can be detected to obtain a detection result, and the first expression content to be presented is then determined according to the confidence level and the detection result. This not only compensates visually for inaccurate or failed translation results, but also guides the user's attention to hot content through the animation displayed in synchronization with the translated text, which further improves the user experience. Of course, in practical applications, the expression determining module 704 can also determine the expression content to be presented solely according to the detection result, which is not limited in this embodiment of the present invention.
Fig. 8 is another schematic structural diagram of the intelligent meeting device according to an embodiment of the present invention.
Unlike the embodiment shown in Fig. 7, in this embodiment the device further comprises an image capture module 707 and an image analysis module 708, where:
the image capture module 707 is configured to capture a face image of the speaker;
the image analysis module 708 is configured to determine the emotional state and/or mood of the speaker according to the face image captured by the image capture module 707; the image analysis can use existing related technologies, which is not limited in this embodiment of the present invention.
Correspondingly, in this embodiment, the expression determining module 704 can determine the second expression content to be presented according to the emotional state and/or mood of the speaker. The display control module 706 displays the translated text on the screen and presents the second expression file in synchronization with the translated text. The second expression file may present a virtual character, or an expression other than the smiling face presented by the first expression file; specifically, a corresponding expression can be presented in combination with the speaker's expression. For example, a virtual character of the speaker is shown on the venue screen, and in combination with the speaker's expression, the virtual character shows the corresponding expression, which makes the meeting more lively. When the virtual character is presented, the display of the above animation state can be stopped; of course, both can also be presented, which is not limited in this embodiment of the present invention.
In another embodiment of the intelligent meeting device of the present invention, instead of the image capture module 707 and the image analysis module 708, a mood analysis module (not shown) can be provided, configured to perform tone recognition on the voice data and determine the speaker's mood according to the recognition result. Correspondingly, the expression determining module 704 can determine the second expression content to be presented according to the speaker's mood analyzed by the mood analysis module.
In addition, in another embodiment of the intelligent meeting device of the present invention, the expression determining module 704 can determine the first expression content to be presented according to the confidence level of the translated text obtained by the translation module 703, and determine the second expression content to be presented according to the emotional state and/or mood of the speaker determined by the image analysis module 708. Correspondingly, the display control module 706 can control the first expression file corresponding to the first expression content and the second expression file corresponding to the second expression content to be presented in synchronization with the translated text.
By presenting the corresponding animation in synchronization with the display of the translated text, the intelligent meeting device provided by this embodiment of the present invention can not only reduce the visual impact caused by the unsatisfactory translation effect of a simultaneous interpretation product, but also provide corresponding animations according to the speech content and/or facial expression of the speaker; this personified treatment builds an atmosphere of friendly interaction with the audience and can effectively improve the attention, enthusiasm and participation of the audience.
In addition, in another embodiment of the intelligent meeting device of the present invention, the emotional state and/or mood of the speaker and the content of the source text (i.e. whether the source text contains a specific word) can also be considered together to determine the second expression content to be presented, which makes the content and animation presented by the second expression file richer.
In addition, in the above embodiments, the display control module 706 can also display an opening animation effect on the screen when the meeting starts, and/or display a closing animation effect on the screen when the meeting ends.
The start and end of the meeting can be determined by the word detection module detecting whether the source text contains a word indicating that the meeting starts and/or a word indicating that the meeting ends; if so, the word detection module sends a meeting-start and/or meeting-end trigger signal to the display control module 706, so that the display control module 706 displays on the screen the expression file indicating that the meeting starts and/or ends.
Regarding the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related methods and will not be elaborated here.
With the intelligent meeting method and device provided by the embodiments of the present invention, the simultaneous interpretation product can serve as a bond between the speaker and the audience. When facing different situations on site, it takes an active, positive attitude and gives clear feedback, defusing embarrassment in a meeting where unexpected situations occur and adding a finishing touch in a meeting without them. The simultaneous interpretation product is thus not merely a presentation tool for simultaneous interpretation captions, but an artificial intelligence product with thought and empathy that better serves users. Moreover, the animation can be determined from several different dimensions, for example the specific situation on site, a score for the speaker's current speech content, and the confidence level of the simultaneous interpretation translation. In addition, the expression files can also be adjusted manually in a timely and flexible manner according to the situation on site.
The problems such as in view of server due to hardware configuration, it will usually there is stronger operational capability than end product and deposit Energy storage power.For this purpose, the embodiment of the present invention also provides a kind of intelligent meeting system, the reception of voice data and Subtitle Demonstration are controlled Partial function is placed on subscriber terminal equipment, and the function of voice recognition processing and translation processing is put on the server, clothes are utilized The superpower hardware performance of business device, effectively improves the accuracy of simultaneous interpretation result.In version display, increased by subscriber terminal equipment Add and synchronize dynamic effect display, promotes the visual experience of user.
As shown in figure 9, being a kind of structural schematic diagram of intelligent meeting system of the embodiment of the present invention.
In this embodiment, the system comprises user terminal 71 and servers 72.The user terminal 71 includes: language Sound acquisition module, expression determining module, file acquisition module and display control module.The server 72 includes: that voice is known Other module, translation module.
Above-mentioned each module is identical as each module in the intelligent meeting device of the present invention of front, is not described in detail herein.
In addition to the above modules, the user terminal 71 is further provided with a first communication module, and the server 72 is further provided with a second communication module; the first communication module and the second communication module transmit data to each other. Specifically, the first communication module sends the speaker's voice data received by the voice acquisition module to the server. After the second communication module in the server 72 receives the voice data, the speech recognition module recognizes the voice data to obtain the source text, and the translation module translates the source text to obtain the version. The second communication module then sends the source text and the version to the user terminal 71. After the first communication module in the user terminal 71 receives this information, it passes the version to the display control module; the display control module shows the version on the screen and presents the expression file obtained by the file acquisition module in synchronization with the version.
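To make the division of work concrete, the following minimal sketch shows one way the terminal-server round trip could look. It assumes an HTTP/JSON transport with an illustrative endpoint URL and field names (`source_text`, `version`, `confidence`), and hypothetical `display_control` and `file_acquisition` objects; the disclosure itself does not prescribe a particular protocol.

```python
import requests

SERVER_URL = "http://example-server/asr_translate"  # assumed endpoint, for illustration only


def send_voice_and_display(voice_chunk: bytes, display_control, file_acquisition) -> None:
    """First communication module: upload the speaker's voice data, then display the result."""
    resp = requests.post(
        SERVER_URL,
        data=voice_chunk,
        headers={"Content-Type": "application/octet-stream"},
        timeout=5,
    )
    resp.raise_for_status()
    result = resp.json()  # the server is assumed to return source text, version, and confidence

    source_text = result["source_text"]
    version = result["version"]
    confidence = result.get("confidence", 1.0)

    # Display control module: show the version and the matching expression file together.
    expression_file = file_acquisition.get_expression(confidence, source_text)
    display_control.show(version, expression_file)
```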
The first communication module and the second communication module may transmit data over a wireless network or a wired network. Of course, for applications with high real-time requirements, the reliability and transmission speed of the network also need to be guaranteed.
In practical applications, the expression determining module may determine the expression content to be presented in various ways; for the specific manners, reference may be made to the descriptions in the foregoing embodiments of the intelligent meeting device of the present invention, and details are not repeated here.
In addition, in another embodiment of the intelligent meeting system of the present invention, the user terminal 71 may further include a word detection module, for detecting whether the source text contains a specific word and outputting a detection result. Correspondingly, the expression determining module may determine the expression content to be presented according to the confidence level and the detection result output by the word detection module.
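A minimal sketch of such a selection rule follows; the confidence thresholds, the detected-word labels, and the expression names are assumptions chosen for illustration rather than values taken from this disclosure.

```python
from typing import Optional


def determine_expression(confidence: float, detected_word: Optional[str]) -> str:
    """Pick the expression content from the translation confidence and the keyword detection result."""
    # A detected specific word (e.g. a thank-you or greeting) takes priority.
    if detected_word == "thanks":
        return "applause"
    if detected_word == "welcome":
        return "greeting"
    # Otherwise fall back to first expression content chosen by confidence level.
    if confidence >= 0.9:
        return "confident_smile"
    if confidence >= 0.6:
        return "neutral"
    return "apologetic"  # low confidence: soften the impact of a rough translation
```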
In addition, the display control module may also show a start dynamic effect on the screen when the meeting starts and/or an end dynamic effect on the screen when the meeting ends. Specifically, the word detection module may detect whether the source text contains a word indicating that the meeting starts and/or a word indicating that the meeting ends; if so, it sends a meeting-start and/or meeting-end trigger signal to the display control module.
In addition, in another embodiment of the intelligent meeting system of the present invention, the user terminal 71 may further include the following modules:
an image acquisition module, for acquiring a face image of the speaker;
an image analysis module, for determining the emotional state and/or mood of the speaker according to the face image acquired by the image acquisition module.
Correspondingly, the expression determining module may determine the second expression content to be presented according to the emotional state and/or mood of the speaker.
In addition, in another embodiment of the intelligent meeting system of the present invention, the user terminal 71 may further include a mood analysis module (not shown), for performing tone recognition on the voice data and determining the speaker's mood from the recognition result. Correspondingly, the expression determining module may determine the second expression content to be presented according to the speaker's mood obtained by the mood analysis module.
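The sketch below shows one possible way to fold the image-based emotional state and the tone-based mood into the second expression content. The emotion labels, the mapping table, and the preference for the voice-derived mood are assumptions for illustration; the disclosure leaves the combination strategy open.

```python
from typing import Optional

# Assumed mapping from a recognized emotion to an expression file identifier.
EMOTION_TO_EXPRESSION = {
    "happy": "smile",
    "excited": "cheer",
    "nervous": "encourage",
    "angry": "calm_down",
}


def determine_second_expression(face_emotion: Optional[str],
                                voice_mood: Optional[str]) -> str:
    """Combine image-analysis and tone-recognition results into the second expression content."""
    # Prefer the mood recognized from the voice tone; fall back to the face image, then neutral.
    emotion = voice_mood or face_emotion or "neutral"
    return EMOTION_TO_EXPRESSION.get(emotion, "neutral")
```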
With the intelligent meeting system provided by the embodiments of the present invention, not only can the translation quality be better improved, but the expression file can also be used to make up for the remaining deficiencies of the translation, improving the user experience. Furthermore, an interactive effect with the live audience can be produced, enlivening the atmosphere of the venue.
Fig. 10 is a block diagram of a device 800 for a simultaneous interpretation caption presentation method according to an exemplary embodiment. For example, the device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 10, the device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the device 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support the operation of the device 800. Examples of such data include instructions for any application or method operating on the device 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disc.
The power supply component 806 provides power to the various components of the device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 800.
The multimedia component 808 includes a screen providing an output interface between the device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC), which is configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor component 814 may detect the open/closed state of the device 800 and the relative positioning of components, for example the display and the keypad of the device 800; the sensor component 814 may also detect a change in position of the device 800 or a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, which are executable by the processor 820 of the device 800 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform some or all of the steps of the foregoing method embodiments of the present invention.
Fig. 11 is a structural schematic diagram of a server in an embodiment of the present invention. The server 1900 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) storing application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The programs stored on the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and the like.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a device, the device is enabled to perform some or all of the steps of the foregoing method embodiments of the present invention.
Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or conventional technical means in the art not disclosed in this disclosure. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the invention being indicated by the following claims.
It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A simultaneous interpretation caption presentation method, characterized in that the method comprises:
obtaining the version of a current statement;
determining expression content to be presented;
obtaining an expression file corresponding to the expression content;
showing the version on a screen, and presenting the expression file in synchronization with the version.
2. The method according to claim 1, characterized in that the method further comprises:
obtaining a confidence level of the version;
wherein the determining of the expression content to be presented comprises:
determining first expression content to be presented according to the confidence level.
3. The method according to claim 1, characterized in that the method further comprises:
performing tone recognition on the current statement, and determining a speaker's mood according to the recognition result;
wherein the determining of the expression content to be presented comprises:
determining second expression content to be presented according to the speaker's mood.
4. A simultaneous interpretation caption presentation device, characterized in that the device comprises:
a translation acquisition module, for obtaining the version of a current statement;
an expression determining module, for determining expression content to be presented;
a file acquisition module, for obtaining an expression file corresponding to the expression content;
a display control module, for showing the version on a screen and presenting the expression file in synchronization with the version.
5. An intelligent meeting method, characterized in that the method comprises:
receiving voice data of a speaker in real time;
performing speech recognition on the voice data to obtain a source text;
translating the source text to obtain a version;
determining expression content to be presented;
obtaining an expression file corresponding to the expression content;
showing the version on a screen, and presenting the expression file in synchronization with the version.
6. The method according to claim 5, characterized in that the method further comprises:
obtaining a confidence level of the version;
wherein the determining of the expression content to be presented comprises:
determining first expression content to be presented according to the confidence level.
7. The method according to claim 5, characterized in that the method further comprises:
acquiring a face image of the speaker;
determining an emotional state and/or mood of the speaker according to the face image;
wherein the determining of the expression content to be presented comprises: determining second expression content to be presented according to the emotional state and/or mood of the speaker.
8. The method according to any one of claims 5 to 7, characterized in that the method further comprises:
detecting whether the source text contains a specific word to obtain a detection result;
wherein the determining of the expression content to be presented comprises:
determining the expression content to be presented according to the detection result; or determining the first expression content to be presented according to the confidence level and the detection result; or determining the second expression content to be presented according to the speaker's mood and the detection result.
9. An intelligent meeting device, characterized in that the device comprises:
a voice acquisition module, for receiving voice data of a speaker in real time;
a speech recognition module, for performing speech recognition on the voice data to obtain a source text;
a translation module, for translating the source text to obtain a version;
an expression determining module, for determining expression content to be presented;
a file acquisition module, for obtaining an expression file corresponding to the expression content;
a display control module, for showing the version on a screen and presenting the expression file in synchronization with the version.
10. An intelligent meeting system, characterized in that the system comprises a user terminal and a server; the user terminal comprises a voice acquisition module, a first communication module, an expression determining module, a file acquisition module, and a display control module; the server comprises a speech recognition module, a translation module, and a second communication module;
the voice acquisition module, for receiving voice data of a speaker in real time;
the first communication module, for sending the voice data to the server;
the speech recognition module, for recognizing the voice data to obtain a source text;
the translation module, for translating the source text to obtain a version;
the second communication module, for sending the source text, the version, and its confidence level to the user terminal;
the expression determining module, for determining expression content to be presented;
the file acquisition module, for obtaining an expression file corresponding to the expression content;
the display control module, for showing the version on a screen and presenting the expression file in synchronization with the version.
CN201810906577.7A 2018-08-10 2018-08-10 Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system Pending CN109033423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810906577.7A CN109033423A (en) 2018-08-10 2018-08-10 Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810906577.7A CN109033423A (en) 2018-08-10 2018-08-10 Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system

Publications (1)

Publication Number Publication Date
CN109033423A true CN109033423A (en) 2018-12-18

Family

ID=64633432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810906577.7A Pending CN109033423A (en) 2018-08-10 2018-08-10 Simultaneous interpretation caption presentation method and device, intelligent meeting method, apparatus and system

Country Status (1)

Country Link
CN (1) CN109033423A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102170553A (en) * 2010-02-26 2011-08-31 夏普株式会社 Conference system, information processor, conference supporting method and information processing method
CN107102990A (en) * 2016-02-19 2017-08-29 株式会社东芝 The method and apparatus translated to voice
CN106372059A (en) * 2016-08-30 2017-02-01 北京百度网讯科技有限公司 Information input method and information input device
CN107172485A (en) * 2017-04-25 2017-09-15 北京百度网讯科技有限公司 A kind of method and apparatus for being used to generate short-sighted frequency
CN107085495A (en) * 2017-05-23 2017-08-22 厦门幻世网络科技有限公司 A kind of information displaying method, electronic equipment and storage medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111399950A (en) * 2018-12-28 2020-07-10 北京搜狗科技发展有限公司 Voice input interface management method and device and voice input equipment
CN110148406A (en) * 2019-04-12 2019-08-20 北京搜狗科技发展有限公司 A kind of data processing method and device, a kind of device for data processing
CN110148406B (en) * 2019-04-12 2022-03-04 北京搜狗科技发展有限公司 Data processing method and device for data processing
CN111160051A (en) * 2019-12-20 2020-05-15 Oppo广东移动通信有限公司 Data processing method and device, electronic equipment and storage medium
CN111160051B (en) * 2019-12-20 2024-01-26 Oppo广东移动通信有限公司 Data processing method, device, electronic equipment and storage medium
CN111142822A (en) * 2019-12-27 2020-05-12 深圳小佳科技有限公司 Simultaneous interpretation conference method and system
WO2021134592A1 (en) * 2019-12-31 2021-07-08 深圳市欢太科技有限公司 Speech processing method, apparatus and device, and storage medium
CN112929633A (en) * 2021-02-07 2021-06-08 北京有竹居网络技术有限公司 Simultaneous interpretation receiving equipment and method
CN112764549A (en) * 2021-04-09 2021-05-07 北京亮亮视野科技有限公司 Translation method, translation device, translation medium and near-to-eye display equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181218