CN104219459A

CN104219459A - Video language translation method and system and intelligent display device

Info

Publication number: CN104219459A
Application number: CN201410522251.6A
Authority: CN
Inventors: 张卓立; 张清华; 丁伯炉; 赵冀杨
Original assignee: Shanghai Moruan Communication Technology Co Ltd
Current assignee: Shanghai Moruan Communication Technology Co Ltd
Priority date: 2014-09-30
Filing date: 2014-09-30
Publication date: 2014-12-17

Abstract

The invention discloses a video language translation method, a video language translation system, and an intelligent display device. The video language translation method comprises the following steps: obtaining a current video, which includes a current image frame and/or current speech; automatically obtaining a target language; performing text recognition and/or speech recognition on the current video, and playing the video directly when the recognition result matches the target language; otherwise, translating the text in the current image frame and/or the current speech and adding the translation result to the current video; and playing the video. With the video language translation method, system, and intelligent display device provided by the invention, non-native-language programs or videos can be converted for viewers into programs or videos in, or close to, their native language, helping viewers overcome the language barrier and improving the viewing experience when they watch programs or videos over a network.

Description

Video language translation method, system, and intelligent display device

Technical field

The present invention relates to a video language translation method, a video language translation system, and an intelligent display device.

Background art

With the spread of smart TVs and network access, more and more television viewers choose to watch video over the network. Compared with traditional TV channels, the programs and videos available on the network are far greater in both variety and number. This vast body of content includes not only domestic and local programs but also a large number of foreign programs. This raises a problem: the language of the video being played may not be the viewer's native language, and the language barrier can greatly impair the viewing experience.

Clearly, if non-native-language programs and videos could be processed as they are played so as to help viewers overcome the language barrier, the viewing experience would improve considerably and the resources on the Internet would be used more effectively.

Summary of the invention

The technical problem to be solved by the present invention is the defect in the prior art that viewers or users watching programs or videos over a network find it difficult to overcome the language barrier and therefore have a poor viewing experience. To overcome this defect, the invention proposes a video language translation method, a system, and an intelligent display device.

The present invention solves the above technical problem through the following technical solutions:

The invention provides a video language translation method, characterized in that it comprises the following steps:

S₁, a display device obtains a current video, the current video comprising a current image frame and/or current speech;

S₂, the display device communicates with a terminal device and obtains the language set on the terminal device as the target language;

S₃, the display device performs text recognition and/or speech recognition on the current video; if the recognition result matches the target language, the current video is taken as the video to be played and step S₅ is performed; if the recognition result does not match the target language, step S₄ is performed;

S₄, the display device translates the text in the current image frame and/or the current speech into the target language and adds the translation result to the current video to form the video to be played;

S₅, the display device plays the video to be played.

Those skilled in the art will appreciate that any device that plays video must first obtain the video and then play it. When the interval between the two is very short, playback is effectively instantaneous for the viewer. The current video referred to above is the video that the device has obtained; steps S₂ to S₄ are carried out before playback or during playback.

In step S₂, the target language can be obtained automatically from the display language that the user has set on the terminal device; the language a user sets on a terminal device is generally taken to be the viewer's native language and hence the target language. The terminal device here can be a mobile terminal of the user, such as a smartphone or a tablet computer. After step S₄, it is at least ensured that, when the obtained video does not match the native language, the translated content is added to the video to be played. This helps the viewer overcome the language barrier at least to some extent, better understand what is played, and enjoy a better viewing experience.
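The control flow of steps S₃ and S₄ can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation described in this document: the `Video` class, the `prepare_video` function, and the `toy_translate` placeholder are hypothetical names, and recognition is assumed to have already produced text and a detected language.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Video:
    """Hypothetical stand-in for a video: recognized text plus detected language."""
    frame_text: List[str]    # text found in image frames (would come from text recognition)
    speech_text: List[str]   # transcript of the audio (would come from speech recognition)
    language: str            # language reported by the recognition step
    overlays: List[str] = field(default_factory=list)  # translations added in S4


def prepare_video(video: Video, target_language: str,
                  translate: Callable[[str, str], str]) -> Video:
    """Steps S3 and S4: decide whether to translate and return the video to be played."""
    # S3: the recognized language already matches, so the video is played unchanged.
    if video.language == target_language:
        return video
    # S4: translate frame text and speech, then attach the result to the video.
    translated = [translate(text, target_language)
                  for text in video.frame_text + video.speech_text]
    return Video(video.frame_text, video.speech_text, video.language, translated)


def toy_translate(text: str, language: str) -> str:
    """Placeholder translator (assumption): only tags the text with the target language."""
    return f"[{language}] {text}"


english = Video(["EXIT"], ["Hello"], "en")
result = prepare_video(english, "zh", toy_translate)
print(result.overlays)
```

When the detected language equals the target language, `prepare_video` returns the input object untouched, which corresponds to jumping directly from S₃ to S₅.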

Preferably, the current video comprises a current image frame and current speech, and step S₄ comprises the following steps:

S₄₁, the current speech is translated into the target language to form the speech to be played;

S₄₂, the text in the current image frame is translated into the target language, and the translated text is added around the position of the corresponding original text in the current image frame, to form the image frame to be played;

S₄₃, the speech to be played and the image frame to be played are synthesized into the video to be played according to their timing.
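One way to picture the timing-based synthesis of step S₄₃ is to pair each image frame with the translated speech segment that is active at the frame's timestamp. The Python sketch below assumes a simple data layout that is not specified in this document: speech is a list of `(start_time, text)` tuples sorted by start time, and frames are `(timestamp, frame)` tuples. The function name `synthesize` is likewise hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Speech = List[Tuple[float, str]]   # (start time in seconds, translated speech), sorted by time
Frames = List[Tuple[float, str]]   # (timestamp in seconds, translated image frame)


def synthesize(speech: Speech, frames: Frames) -> List[Tuple[float, str, Optional[str]]]:
    """Pair every frame with the speech segment that started most recently before it."""
    starts = [start for start, _ in speech]
    merged = []
    for timestamp, frame in frames:
        # Index of the last segment whose start time is <= the frame timestamp.
        i = bisect_right(starts, timestamp) - 1
        segment = speech[i][1] if i >= 0 else None  # None: no speech has started yet
        merged.append((timestamp, frame, segment))
    return merged


speech = [(0.0, "greeting-zh"), (2.0, "farewell-zh")]
frames = [(0.5, "frame-1"), (2.0, "frame-2"), (3.0, "frame-3")]
merged = synthesize(speech, frames)
```

A real synthesis step would multiplex audio and video streams; the sketch only shows how a shared time axis decides which translated speech accompanies which frame.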

Preferably, the current video has a label;

Step S₄₁ is: the display device searches the network for a target-language voice pack for the current video according to the label; if one is found, the target-language voice pack is matched to the current speech to obtain the speech to be played; if none is found, the current speech is translated into the target language to form the speech to be played;

Step S₄₂ is: the display device searches the network for a target-language subtitle pack for the current video according to the label; if one is found, the target-language subtitle pack is matched to the text in the current image frame to form the image frame to be played; if none is found, the text in the current image frame is translated into the target language, and the translated text is added around the position of the corresponding original text in the current image frame, to form the image frame to be played.

The label here can be the title of the current video, its network resource address, or similar information, or a combination of several kinds of information. It reflects the characteristics of the current video and thus makes it convenient to search for a voice pack or subtitle pack.
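The "search first, translate only as a fallback" logic of the label-based steps S₄₁ and S₄₂ can be sketched in Python as follows. The network search is replaced by a hypothetical in-memory dictionary, and the names `search_pack`, `obtain_track`, and the example label are assumptions made for illustration only.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for a network search: (label, language, kind) -> pack contents.
PackIndex = Dict[Tuple[str, str, str], str]


def search_pack(index: PackIndex, label: str, language: str, kind: str) -> Optional[str]:
    """Look up a voice or subtitle pack for the labelled video; None if nothing is found."""
    return index.get((label, language, kind))


def obtain_track(index: PackIndex, label: str, language: str, kind: str,
                 original: str, translate: Callable[[str, str], str]) -> Tuple[str, str]:
    """Return (content, source), where source is 'pack' or 'translated'."""
    pack = search_pack(index, label, language, kind)
    if pack is not None:
        return pack, "pack"                           # a ready-made pack was found
    return translate(original, language), "translated"  # fallback: translate locally


index: PackIndex = {("example-video", "zh", "subtitle"): "subtitle-pack-zh"}


def tag(text: str, language: str) -> str:
    """Placeholder translator (assumption): tags the text with the target language."""
    return f"[{language}] {text}"


subtitles = obtain_track(index, "example-video", "zh", "subtitle", "EXIT", tag)
voice = obtain_track(index, "example-video", "zh", "voice", "Hello", tag)
```

In this example a subtitle pack exists for the label but a voice pack does not, so the subtitles come from the pack while the speech falls back to translation.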

The present invention also provides a video language translation method, characterized in that it comprises the following steps:

S₂, the display device transmits the current video to a terminal device;

S₃, the terminal device reads its own language setting as the target language and performs text recognition and/or speech recognition on the current video; if the recognition result matches the target language, the current video is taken as the video to be played and step S₅ is performed; if the recognition result does not match the target language, step S₄ is performed;

S₄, the terminal device translates the text in the current image frame and/or the current speech into the target language, adds the translation result to the current video to form the video to be played, and then sends the video to be played to the display device;

S₅, the display device plays the video to be played.

It should be understood that, although this method increases the data transfer between the display device and the terminal device, it greatly reduces the performance required of the display device itself. Even if the display device is a non-intelligent, relatively old-fashioned video playback device, it needs no data-processing capability of its own as long as it can play video and transfer data, and non-native-language video can still be converted to some extent into native-language video. The terminal device, which does need a certain data-processing capability, can be a smartphone, a wearable device, or the like.
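The division of labor in this second method, where the display device only transfers and plays while the terminal device recognizes and translates, might be sketched in Python as below. The classes `TerminalDevice` and `DisplayDevice`, the dictionary layout of the video, and the direct method call standing in for data transfer are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

VideoData = Dict[str, str]  # hypothetical layout: {"language": ..., "text": ...}


class TerminalDevice:
    """Does the processing: reads its own language setting and translates if needed."""

    def __init__(self, language_setting: str, translate: Callable[[str, str], str]):
        self.language_setting = language_setting  # S3: the target language
        self.translate = translate

    def process(self, video: VideoData) -> VideoData:
        if video["language"] == self.language_setting:
            return video  # S3: already in the target language
        translated = self.translate(video["text"], self.language_setting)
        return {**video, "overlay": translated}  # S4: add the translation result


class DisplayDevice:
    """Needs only transfer and playback capability, no data processing of its own."""

    def __init__(self, terminal: TerminalDevice):
        self.terminal = terminal
        self.played: List[VideoData] = []

    def play(self, video: VideoData) -> VideoData:
        to_play = self.terminal.process(video)  # S2 (send) and S4 (receive back)
        self.played.append(to_play)             # S5: play the returned video
        return to_play


terminal = TerminalDevice("zh", lambda text, language: f"[{language}] {text}")
display = DisplayDevice(terminal)
shown = display.play({"language": "en", "text": "Hello"})
```

All language-dependent work sits in `TerminalDevice`, which mirrors the point that the display device can be a simple playback device.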

Data can be transferred over a wired connection, such as USB or an MHL interface (Mobile High-Definition Link, a high-definition audio and video interface standard for mobile terminals), or by wireless transmission technology, such as Wi-Fi.

S₄₁, the terminal device translates the current speech into the target language to form the speech to be played;

S₄₂, the terminal device translates the text in the current image frame into the target language and adds the translated text around the position of the corresponding original text in the current image frame, to form the image frame to be played;

S₄₃, the terminal device synthesizes the speech to be played and the image frame to be played into the video to be played according to their timing;

S₄₄, the terminal device sends the video to be played to the display device.

Preferably, the current video has a label;

Step S₄₁ is: the terminal device searches the network for a target-language voice pack for the current video according to the label; if one is found, the target-language voice pack is matched to the current speech to obtain the speech to be played; if none is found, the current speech is translated into the target language to form the speech to be played;

Step S₄₂ is: the terminal device searches the network for a target-language subtitle pack for the current video according to the label; if one is found, the target-language subtitle pack is matched to the text in the current image frame to form the image frame to be played; if none is found, the text in the current image frame is translated into the target language, and the translated text is added around the position of the corresponding original text in the current image frame, to form the image frame to be played.

The present invention also provides an intelligent display device comprising a video acquisition module, a target language module, an identification module, and a playing module.

The video acquisition module is used to obtain a current video, the current video comprising a current image frame and/or current speech. The target language module is used to communicate with a terminal device and obtain the language set on the terminal device, which is taken as the target language by default.

The identification module is used to perform text recognition and/or speech recognition on the current video. If the recognition result matches the target language, the current video is sent to the playing module as the video to be played; if the recognition result does not match the target language, the text in the current image frame and/or the current speech is translated into the target language, and the translation result is added to the current video to form the video to be played. The playing module is used to play the video to be played.

Preferably, the current video comprises a current image frame and current speech, and the identification module comprises an audio processing unit, an image processing unit, and a synthesis unit.

The audio processing unit is used to translate the current speech into the target language to form the speech to be played. The image processing unit is used to translate the text in the current image frame into the target language and to add the translated text around the position of the corresponding original text in the current image frame, to form the image frame to be played. The synthesis unit is used to synthesize the speech to be played and the image frame to be played into the video to be played according to their timing.

Preferably, the current video has a label, and the identification module further comprises a network search unit. The network search unit is used to search the network, according to the label, for a target-language voice pack for the current video and, if one is found, to match the target-language voice pack to the current speech to obtain the speech to be played. It is also used to search the network, according to the label, for a target-language subtitle pack for the current video and, if one is found, to match the target-language subtitle pack to the text in the current image frame to form the image frame to be played. If no target-language voice pack is found, the audio processing unit is enabled; if no target-language subtitle pack is found, the image processing unit is enabled; if both the target-language voice pack and the target-language subtitle pack are found, the synthesis unit is enabled.

The present invention also provides a video translation system comprising a display device and a terminal device.

The display device comprises a video acquisition module, a transmission module, and a playing module. The video acquisition module is used to obtain a current video, the current video comprising a current image frame and/or current speech. The transmission module is used to transmit the current video to the terminal device and to receive the video to be played from the terminal device. The playing module is used to play the video to be played.

The terminal device comprises a transmission module and an identification module. The transmission module is used to receive video from and send video to the display device. The identification module is used to perform text recognition and/or speech recognition on the current video. If the recognition result matches the target language, the current video is sent to a playing module as the video to be played; if the recognition result does not match the target language, the text in the current image frame and/or the current speech is translated into the target language, and the translation result is added to the current video to form the video to be played.

Preferably, the display device can be a smart TV, and the terminal device can be a smartphone or a wearable device.

The audio processing unit is used to translate the current speech into the target language to form the speech to be played. The image processing unit is used to translate the text in the current image frame into the target language and to add the translated text around the position of the corresponding original text in the current image frame, to form the image frame to be played. The synthesis unit is used to synthesize the speech to be played and the image frame to be played into the video to be played according to their timing.

On the basis of common knowledge in the art, the above preferred conditions can be combined in any way to obtain preferred embodiments of the invention.

The positive effects of the present invention are:

With the video language translation method, system, and intelligent display device of the present invention, non-native-language programs or videos can be converted for viewers, in near real time, into programs or videos in or close to their native language. This helps viewers overcome the language barrier and improves the experience of watching programs or videos over a network or on television.

Brief description of the drawings

Fig. 1 is a flow chart of the video language translation method of Embodiment 1 of the present invention.

Fig. 2 is a flow chart of the video language translation method of Embodiment 2 of the present invention.

Fig. 3 is a schematic diagram of the intelligent display device of Embodiment 3 of the present invention.

Fig. 4 is a schematic diagram of the video translation system of Embodiment 4 of the present invention.

Detailed description of the embodiments

Preferred embodiments of the present invention are given below with reference to the accompanying drawings to describe the technical solutions in detail, but the invention is not thereby limited to the scope of the described embodiments.

Embodiment 1

As shown in Fig. 1, the video language translation method of this embodiment comprises the following steps:

S₁, a current video is obtained, the current video comprising a current image frame and current speech and also having a label;

S₂, the language set on the user's terminal device is obtained automatically as the target language;

S₃, text recognition and speech recognition are performed on the current video; if the recognition result matches the target language, the current video is taken as the video to be played and step S₅ is performed; if the recognition result does not match the target language, step S₄₁ is performed;

S₄₁, the network is searched for a target-language voice pack for the current video according to the label; if one is found, the target-language voice pack is matched to the current speech to obtain the speech to be played; if none is found, the current speech is translated into the target language to form the speech to be played;

S₄₂, the network is searched for a target-language subtitle pack for the current video according to the label; if one is found, the target-language subtitle pack is matched to the text in the current image frame to form the image frame to be played; if none is found, the text in the current image frame is translated into the target language, and the translated text is added around the position of the corresponding original text in the current image frame, to form the image frame to be played;

S₄₃, the speech to be played and the image frame to be played are synthesized into the video to be played according to their timing;

S₅, the video to be played is played.

In this embodiment, the label is the network resource address of the current video, and the target language is Chinese. The current video comprises a number of image frames.

Embodiment 2

As shown in Fig. 2, the video language translation method of this embodiment comprises the following steps:

S₁, a display device obtains a current video, the current video comprising a current image frame and current speech and also having a label;

S₂, the display device transmits the current video to a terminal device;

S₃, the terminal device performs text recognition and speech recognition on the current video; if the recognition result matches the target language, the current video is taken as the video to be played and step S₅ is performed; if the recognition result does not match the target language, step S₄₁ is performed;

S₄₁, the terminal device searches the network for a target-language voice pack for the current video according to the label; if one is found, the target-language voice pack is matched to the current speech to obtain the speech to be played; if none is found, the current speech is translated into the target language to form the speech to be played;

S₄₂, the terminal device searches the network for a target-language subtitle pack for the current video according to the label; if one is found, the target-language subtitle pack is matched to the text in the current image frame to form the image frame to be played; if none is found, the text in the current image frame is translated into the target language, and the translated text is added around the position of the corresponding original text in the current image frame, to form the image frame to be played;

S₄₄, the terminal device sends the video to be played to the display device;

S₅, the display device plays the video to be played.

Although the method of this embodiment increases the data transfer between the display device and the terminal device, it greatly reduces the performance required of the display device itself. The terminal device in this embodiment is a wearable device.

Embodiment 3

As shown in Fig. 3, the intelligent display device of this embodiment comprises a video acquisition module 1, a target language module 2, an identification module 3, and a playing module 4.

The video acquisition module is used to obtain a current video, the current video comprising a current image frame and current speech and having a label. The target language module is used to communicate with a terminal device and automatically obtain the language set on the terminal device as the target language.

The identification module comprises an audio processing unit, an image processing unit, a network search unit, and a synthesis unit.

The network search unit is used to search the network, according to the label, for a target-language voice pack for the current video and, if one is found, to match the target-language voice pack to the current speech to obtain the speech to be played. It is also used to search the network, according to the label, for a target-language subtitle pack for the current video and, if one is found, to match the target-language subtitle pack to the text in the current image frame to form the image frame to be played. If no target-language voice pack is found, the audio processing unit is enabled; if no target-language subtitle pack is found, the image processing unit is enabled; if both the target-language voice pack and the target-language subtitle pack are found, the synthesis unit is enabled.

The audio processing unit is used to translate the current speech into the target language to form the speech to be played. The image processing unit is used to translate the text in the current image frame into the target language and to add the translated text around the position of the corresponding original text in the current image frame, to form the image frame to be played. The synthesis unit is used to synthesize the speech to be played and the image frame to be played into the video to be played according to their timing. The playing module is used to play the video to be played.

This embodiment takes the intelligent display device as the main body, which reduces the data transfer between the display device and the terminal device but places higher performance requirements on the intelligent display device itself. The intelligent display device in this embodiment is a smart TV.

Embodiment 4

As shown in Fig. 4, the video translation system of this embodiment comprises a display device and a terminal device.

The display device comprises a video acquisition module 11, a transmission module 13, and a playing module 12. The video acquisition module 11 is used to obtain a current video, the current video comprising a current image frame and current speech and also having a label. The transmission module 13 is used to transmit the current video to the terminal device and to receive the video to be played from the terminal device. The playing module 12 is used to play the video to be played.

The terminal device comprises a transmission module 22 and an identification module 21. The transmission module 22 is used to receive video from and send video to the display device. The identification module 21 is used to perform text recognition and speech recognition on the current video. If the recognition result matches the target language, the current video is sent to a playing module as the video to be played; if the recognition result does not match the target language, the text in the current image frame and the current speech are translated into the target language, and the translation result is added to the current video to form the video to be played.

The identification module 21 comprises an audio processing unit, an image processing unit, a network search unit, and a synthesis unit.

The network search unit is used to search the network, according to the label, for a target-language voice pack for the current video and, if one is found, to match the target-language voice pack to the current speech to obtain the speech to be played. It is also used to search the network, according to the label, for a target-language subtitle pack for the current video and, if one is found, to match the target-language subtitle pack to the text in the current image frame to form the image frame to be played. If no target-language voice pack is found, the audio processing unit is enabled; if no target-language subtitle pack is found, the image processing unit is enabled; if both the target-language voice pack and the target-language subtitle pack are found, the synthesis unit is enabled.

Although specific embodiments of the present invention have been described above, those skilled in the art will understand that these are only illustrations and that the scope of protection of the invention is defined by the appended claims. Those skilled in the art can make various changes or modifications to these embodiments without departing from the principle and essence of the invention, and all such changes and modifications fall within the scope of protection of the invention.

Claims

1. A video language translation method, characterized in that it comprises the following steps:

S₅, the display device plays the video to be played.

2. The video language translation method as claimed in claim 1, characterized in that the current video comprises a current image frame and current speech, and step S₄ comprises the following steps:

S₄₁, the display device translates the current speech into the target language to form the speech to be played;

S₄₂, the display device translates the text in the current image frame into the target language and adds the translated text around the position of the corresponding original text in the current image frame, to form the image frame to be played;

S₄₃, the display device synthesizes the speech to be played and the image frame to be played into the video to be played according to their timing.

3. The video language translation method as claimed in claim 2, characterized in that the current video has a label;

4. A video language translation method, characterized in that it comprises the following steps:

S₂, the display device transmits the current video to a terminal device;

S₅, the display device plays the video to be played.

5. The video language translation method as claimed in claim 4, characterized in that the current video comprises a current image frame and current speech, and step S₄ comprises the following steps:

6. The video language translation method as claimed in claim 5, characterized in that the current video has a label;

7. An intelligent display device, characterized in that it comprises:

a video acquisition module, used to obtain a current video, the current video comprising a current image frame and/or current speech;

a target language module, used to communicate with a terminal device and obtain the language set on the terminal device as the target language;

an identification module, used to perform text recognition and/or speech recognition on the current video; if the recognition result matches the target language, the current video is sent to a playing module as the video to be played; if the recognition result does not match the target language, the text in the current image frame and/or the current speech is translated into the target language, and the translation result is added to the current video to form the video to be played;

the playing module, used to play the video to be played.

8. The intelligent display device as claimed in claim 7, characterized in that the current video comprises a current image frame and current speech, and the identification module comprises an audio processing unit, an image processing unit, and a synthesis unit;

9. The intelligent display device as claimed in claim 8, characterized in that the current video has a label, and the identification module further comprises a network search unit;

10. A video translation system, characterized in that it comprises a display device and a terminal device;

the display device comprises a video acquisition module, a transmission module, and a playing module; the video acquisition module is used to obtain a current video, the current video comprising a current image frame and/or current speech; the transmission module is used to transmit the current video to the terminal device and to receive the video to be played from the terminal device; and the playing module is used to play the video to be played:

11. The video translation system as claimed in claim 10, characterized in that the current video comprises a current image frame and current speech, and the identification module comprises an audio processing unit, an image processing unit, and a synthesis unit;

12. The video translation system as claimed in claim 11, characterized in that the current video has a label, and the identification module further comprises a network search unit;

13. The video translation system as claimed in any one of claims 10 to 12, characterized in that the display device is a smart TV and the terminal device is a smartphone or a wearable device.