WO2018001088A1

WO2018001088A1 - Method and apparatus for presenting communication information, device and set-top box

Info

Publication number: WO2018001088A1
Application number: PCT/CN2017/088109
Authority: WO
Inventors: 李晓君
Original assignee: 中兴通讯股份有限公司
Priority date: 2016-06-30
Filing date: 2017-06-13
Publication date: 2018-01-04
Also published as: CN107566863A

Abstract

Provided in the embodiments of the present invention are a method and an apparatus for presenting communication information, a device and a set-top box. The method comprises: collecting first communication information presented in a first presentation manner; parsing the first communication information, acquiring a data content corresponding to the first communication information, and acquiring second communication information corresponding to the data content; and presenting the second communication information in a second presentation manner. The embodiments of the present invention can realize the transformation of data between any different presentation manners to help people having different needs communicate, for example, an unimpaired user can select a voice presentation manner and a language-impaired user can select a sign language presentation manner, such that different users only need to present the contents they want to communicate in the manner they commonly use, and after the transformation of the data content, both parties communicating can understand each other and communicate conveniently, with improved user experience.

Description

Method, apparatus and device for displaying exchange information, and set-top box

Technical field

The present invention relates to the field of user communication, and in particular to a method, an apparatus and a device for displaying exchange information, and a set-top box.

Background art

Sign language emerged to facilitate communication between normal users and language-disabled users, but it requires both groups to learn an additional skill, which degrades the user experience.

Consequently, most existing sign language translation is performed by third-party interpreters. Even for television, the interpreter's translation is encoded into the video before it is transmitted to end users. In practice, a sign language interpreter is provided only for major breaking news or major live broadcasts. Ordinary TV programs are not translated, so hearing-disabled viewers cannot follow the programs they want to watch.

Summary of the invention

The embodiments of the invention provide a method, an apparatus and a device for displaying exchange information, and a set-top box, so as to facilitate daily communication between a normal user and a language-disabled user.

In one aspect, a method for displaying exchange information is provided, including:

Collecting first exchange information displayed by the first display manner;

Parsing the first exchange information, acquiring the data content corresponding to the first exchange information, and acquiring the second exchange information corresponding to the data content;

Displaying the second exchange information in a second display manner.

In another aspect, an exchange information display apparatus is provided, including:

The acquiring module is configured to collect the first exchange information displayed by the first display manner;

The processing module is configured to parse the first exchange information, obtain the data content corresponding to the first exchange information, and acquire second exchange information corresponding to the data content;

The display module is configured to display the second exchange information by using the second display manner.

In another aspect, an exchange information display device is provided, including: an interaction module and a processor, wherein

The interaction module is configured to collect the first exchange information displayed in the first display manner and output it to the processor, and is further configured to display, in the second display manner, the second exchange information returned by the processor;

The processor is configured to parse the first exchange information, obtain the data content corresponding to the first exchange information, acquire the second exchange information corresponding to the data content, and transmit the information to the interaction module.

In another aspect, a set-top box is provided, including: a sign language database, and a voice module, a sign language conversion module, and a display module that are connected to one another, wherein

The voice module is configured to acquire audio data, and to recognize and correct the audio data to obtain its semantics;

The sign language conversion module is configured to match the sign language to be output corresponding to the audio data in the sign language database according to the semantics;

The display module is set to display the sign language to be output.

In another aspect, a computer storage medium is provided, the computer storage medium storing computer executable instructions, and the computer executable instructions being configured to perform the aforementioned communication information presentation method.

Advantageous effects of embodiments of the present invention:

An embodiment of the present invention provides a method for displaying exchange information: first exchange information displayed in a first display manner is collected and parsed, the data content corresponding to the first exchange information is acquired, second exchange information corresponding to that data content is acquired, and the second exchange information is displayed in a second display manner. Data can thus be converted between any different display manners so that people with different needs can communicate. For example, a normal user can choose the voice display manner and a language-disabled user can choose the sign language display manner, so each user only needs to express the content to be communicated in the manner they are used to. Through the conversion based on the data content, both parties can understand each other's intentions and communicate conveniently, which enhances the user experience.

Brief description of the drawings

FIG. 1 is a flowchart of a method for displaying exchange information according to a first embodiment of the present invention;

FIG. 2 is a schematic structural diagram of an exchange information display device according to a third embodiment of the present invention;

FIG. 3 is a schematic diagram showing the simplified structure of a set-top box according to a fifth embodiment of the present invention;

FIG. 4 is a flowchart of sign language to speech conversion according to a fifth embodiment of the present invention;

FIG. 5 is a flowchart of conversion of a user's voice to sign language according to a fifth embodiment of the present invention;

FIG. 6 is a flowchart of conversion of the speech of a television program to sign language according to a fifth embodiment of the present invention;

FIG. 7 is a schematic diagram showing the detailed structure of a set-top box according to a fifth embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

The invention will now be further illustrated by way of specific embodiments in conjunction with the accompanying drawings.

First embodiment:

FIG. 1 is a flowchart of a method for displaying exchange information according to a first embodiment of the present invention. As shown in FIG. 1, the method for displaying exchange information provided in this embodiment includes:

S101: Collect first exchange information displayed by the first display manner;

S102: Parse the first exchange information, obtain the data content corresponding to the first exchange information, and obtain second exchange information corresponding to the data content;

S103: Display the second exchange information by using the second display manner.
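
Steps S101 to S103 can be sketched as a small three-stage pipeline. This is only an illustration under stated assumptions: the names `ExchangeInfo`, `PARSERS`, `RENDERERS`, `collect`, `parse`, and `present` are hypothetical, and the lambdas are trivial stand-ins for the speech recognition, image recognition, and sign language matching that the embodiments describe.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ExchangeInfo:
    manner: str   # display manner, e.g. "voice", "subtitle", "sign"
    payload: str  # content as captured or as rendered


# Hypothetical recognizers: map captured input to manner-independent data content.
PARSERS: Dict[str, Callable[[str], str]] = {
    "voice": lambda raw: raw.strip().lower(),  # stand-in for speech recognition
    "sign": lambda raw: raw.strip().lower(),   # stand-in for gesture recognition
}

# Hypothetical renderers: map data content to the second display manner.
RENDERERS: Dict[str, Callable[[str], str]] = {
    "subtitle": lambda content: content.capitalize(),
    "sign": lambda content: " ".join(f"<sign:{word}>" for word in content.split()),
}


def collect(manner: str, raw: str) -> ExchangeInfo:
    """S101: collect first exchange information shown in the first display manner."""
    return ExchangeInfo(manner, raw)


def parse(first: ExchangeInfo, target_manner: str) -> ExchangeInfo:
    """S102: derive the data content, then the matching second exchange information."""
    content = PARSERS[first.manner](first.payload)
    return ExchangeInfo(target_manner, RENDERERS[target_manner](content))


def present(second: ExchangeInfo) -> str:
    """S103: present the second exchange information (here, as a tagged string)."""
    return f"[{second.manner}] {second.payload}"
```

Because the data content is independent of either display manner, adding a new manner in this sketch only means registering one more parser or renderer.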

In some embodiments, the first display manner in the foregoing embodiment includes a voice mode, and the second display mode includes a picture mode.

Collecting the first exchange information displayed by the first display manner includes: collecting the external voice through the voice recognition device, and/or acquiring the first exchange information by collecting the audio channel;

Displaying the second exchange information in the second display manner includes: displaying the second exchange information on the screen in the form of subtitles and/or gestures.

In some embodiments, the method for displaying exchange information in the foregoing embodiment further includes: if multiple pieces of first exchange information are collected through two or more channels, displaying the second exchange information corresponding to each piece of first exchange information on multiple screens respectively.

In some embodiments, the method for displaying the exchange information in the above embodiment further includes determining, according to the degree of importance of each of the first exchange information, a screen position of the second exchange information corresponding to each of the first exchange information.
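
A minimal sketch of importance-based placement follows. The source does not say how importance is measured or which screen positions exist, so the region names in `REGIONS`, the numeric importance scores, and the function `assign_positions` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical screen regions, ordered from most to least prominent.
REGIONS: List[str] = ["center-bottom", "top-right", "top-left"]


def assign_positions(importance: Dict[str, float]) -> Dict[str, str]:
    """Give the most important channel the most prominent region.

    Ties are broken by channel name so the result is deterministic.
    """
    ranked = sorted(importance, key=lambda name: (-importance[name], name))
    if len(ranked) > len(REGIONS):
        raise ValueError("more channels than available screen regions")
    return {name: REGIONS[index] for index, name in enumerate(ranked)}
```

For example, a microphone channel scored above the TV audio channel would receive the most prominent region, and the TV subtitles would move to the next one.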

In some embodiments, the first display manner in the foregoing embodiment includes a picture mode, and the second display mode includes a voice mode;

Collecting the first exchange information displayed in the first display manner includes: collecting external gestures and/or text through image recognition, and/or acquiring the first exchange information by capturing an image channel;

Displaying the second exchange information in the second display manner includes: presenting the second exchange information through a speaker as simulated speech.

Second embodiment:

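The exchange information display apparatus provided in this embodiment includes an acquisition module, a processing module, and a display module: the acquisition module is configured to collect the first exchange information displayed in the first display manner; the processing module is configured to parse the first exchange information, obtain the corresponding data content, and acquire the second exchange information corresponding to the data content; and the display module is configured to display the second exchange information in the second display manner.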

In some embodiments, the first display manner includes a voice manner and the second display manner includes a picture manner. The acquisition module in the foregoing embodiment is configured to collect external voice through the voice recognition device, and/or to acquire the first exchange information by capturing the audio channel. The display module in the foregoing embodiment is configured to display the second exchange information on the screen in the form of subtitles and/or gestures.

In some embodiments, the display module in the foregoing embodiment is further configured to: if multiple pieces of first exchange information are collected through two or more channels, display the second exchange information corresponding to each piece of first exchange information on multiple screens respectively.

In some embodiments, the display module in the above embodiment is further configured to determine, according to the importance degree of each of the first exchange information, a screen position of the second exchange information corresponding to each of the first exchange information.

In some embodiments, the first display manner includes a picture manner and the second display manner includes a voice manner. The acquisition module in the foregoing embodiment is configured to collect external gestures and/or text through image recognition, and/or to acquire the first exchange information by capturing an image channel. The display module is configured to present the second exchange information through a speaker as simulated speech.

Third embodiment:

FIG. 2 is a schematic structural diagram of an exchange information display device according to a third embodiment of the present invention. As shown in FIG. 2, the exchange information display device provided in this embodiment includes an interaction module 21 and a processor 22, where

The interaction module 21 is configured to collect the first exchange information displayed by the first display manner, and output the information to the processor, and further configured to display the second exchange information returned by the processor by using the second display manner;

The processor 22 is configured to parse the first exchange information, acquire the data content corresponding to the first exchange information, acquire the second exchange information corresponding to the data content, and transmit it to the interaction module.

In some embodiments, the first display manner includes a voice manner and the second display manner includes a picture manner. The interaction module 21 in the foregoing embodiment is configured to collect external voice through the voice recognition device, and/or to acquire the first exchange information by capturing the audio channel; it is further configured to display the second exchange information on the screen in the form of subtitles and/or gestures.

In some embodiments, the interaction module 21 in the foregoing embodiment is further configured to: if multiple pieces of first exchange information are collected through two or more channels, display the second exchange information corresponding to each piece of first exchange information on multiple screens respectively.

In some embodiments, the interaction module 21 in the foregoing embodiment is further configured to determine, according to the importance degree of each of the first exchange information, a screen position of the second exchange information corresponding to each of the first exchange information.

In some embodiments, the first display manner includes a picture manner and the second display manner includes a voice manner. The interaction module 21 in the foregoing embodiment is configured to collect external gestures and/or text through image recognition, and/or to acquire the first exchange information by capturing an image channel; it is further configured to present the second exchange information through a speaker as simulated speech.

Fourth embodiment:

This embodiment provides a set-top box, including: a sign language database, and a voice module, a sign language conversion module, and a display module that are connected to one another, where

The display module is set to display the sign language to be output.

In some embodiments, the display module in the above embodiment is further configured to display the semantics of the audio data for the user to confirm whether it is content that the normal user wants to express.

In some embodiments, the voice module in the above embodiment is configured to respectively obtain audio data of a live television program and audio data sent by a normal person through a microphone.

In some embodiments, the set top box in the above embodiment further includes an image module;

The image module is configured to capture the user's sign language gesture, correct the gesture, and transmit the corrected gesture to the sign language conversion module;

The sign language conversion module is configured to match, in the sign language database, the semantics to be output that correspond to the processed gesture;

The display module is configured to display the semantics to be output.

In some embodiments, the display module in the above embodiment is further configured to display the standard sign language gesture corresponding to the captured user gesture, for the user to learn from.

In practical applications, all the functional modules involved in the foregoing embodiments may be implemented by a programmable logic device programmed with a specific software program, or by a processor and a memory.

Fifth embodiment:

The present invention will be further explained in conjunction with specific application scenarios.

To make watching TV more convenient for hearing-disabled people, to solve the communication problem between normal people and hearing-disabled people, to increase the happiness and satisfaction of this group, and to provide a more satisfying experience for customers, this embodiment provides a scheme for converting between sign language and subtitles on a set-top box.

The implementation method for performing sign language and subtitle conversion on the set top box provided by this embodiment includes:

Step A: While a television program is playing, the audio channel data of the live program is obtained and transmitted to the voice recognition module.

Step B: The voice recognition module analyzes the data and converts it into subtitles, matches the subtitles against the sign language library, and outputs subtitles or sign language to the user.

Step C: When a normal person speaks, the speech is captured by the voice receiving module of the set-top box and transmitted to the voice recognition module over a second audio channel. The voice recognition module analyzes the data, converts the speech into subtitles, and at the same time matches the corresponding sign language pictures or animations.

Step D: The sign language and subtitles are displayed to the hearing-disabled person simultaneously. When the hearing-disabled person sees the subtitles or sign language and responds, the response is transmitted to the image recognition module through the image receiving module of the set-top box.

Step E: The image recognition module analyzes the data, compares it with the sign language font library, converts it into subtitles, and displays them to the normal person.

Step F: The user communication channel and the video playback channel are two independently displayed channels shown at different screen positions. Which channel is displayed more prominently depends on the scenario: if the users are communicating frequently, the sign language and subtitles of the conversation are enlarged; otherwise the subtitles of the TV program are enlarged.

In this embodiment, the set-top box includes a voice acquisition module, a voice recognition module, a voice conversion module, a sign language matching module, a display module, an image recognition module, an image conversion module, and a central control module, where:

Voice acquisition module: the set-top box audio is divided into multiple channels, and the voice acquisition module separately obtains the audio data of the live TV program and the audio data spoken by a normal person into the microphone.

Voice recognition module: recognizes and corrects the audio data and outputs it as Chinese text.

Voice conversion module: works with the voice recognition module to convert the Chinese text into the corresponding subtitle data, and with the sign language matching module to output the corresponding sign language information.

Display module: displays subtitle information and sign language information on the screen.

Image recognition module: captures and analyzes the sign language gestures of a hearing-disabled person.

Image conversion module: works with the image recognition module to compare the gestures against the sign language font library, correct them, and output text subtitle information.

Sign language matching module: consists of sign language pictures and animations and the sign language font library, each available both locally and over the network.

Central control module: handles the logic of each process and is responsible for the algorithm that controls subtitle and sign language display.

Compared with existing solutions, the set-top box provided by this embodiment is interactive, and its display does not conflict with the normal broadcast of the TV program. The design uses two channels: one is dedicated to the interaction process, and the other carries the TV program, whose sound is also converted into subtitles by speech recognition and delivered to the user. The two channels can switch seamlessly between primary and secondary roles, which greatly improves convenience for hearing-disabled people.

The method for converting between subtitles and sign language according to the present invention is further described below with reference to FIGS. 3 to 7.

As shown in Figure 3:

The set-top box provided in this embodiment mainly includes a speech recognition module 302, an image recognition module 304, a display module 310, and a central processing module 311. When a normal person chats, the voice is transmitted from 301 to the speech recognition module 302. The TS stream from the RF input 306 passes through the tuner (TUNER) 307 to the demultiplexer 308, and the audio data obtained by demultiplexing is also transmitted to the speech recognition module 302. The speech recognition module 302 analyzes the audio data, corrects the semantics, converts the result into text for the subtitle module 303 and into sign language 309, and outputs 303 and 309 to the display module 310. Sign language 305 is converted into subtitles 303 by the image recognition module 304 and transmitted to the display module 310. Throughout the process, the central processing module 311 controls the speech and image recognition modules and the display module so that the converted results appear in different display areas, which allows the users to communicate interactively.

As shown in Figure 4:

The conversion method provided in this embodiment includes:

A hearing-disabled person makes a sign language gesture (S401). A camera captures an image of the gesture (S402) and transmits it to the set-top box (S403). The set-top box recognizes the image (S404), compares the result with the local sign language library (S405), and looks for the entry corresponding to the gesture (S406). If no entry matches, the network sign language library is searched (S408). If an entry matches, the subtitle is output to the subtitle buffer (S407) and then written to display memory, where the normal person can view it (S409).
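
The local-then-network lookup in this flow can be sketched as below. The dictionaries stand in for the local and network sign language libraries, and `match_entry` and `sign_to_subtitle` are hypothetical names. The source does not say what happens when neither library matches, so returning `False` in that case is an assumption.

```python
from typing import List, Mapping, Optional


def match_entry(gesture_id: str,
                local_library: Mapping[str, str],
                network_library: Mapping[str, str]) -> Optional[str]:
    """Look up the entry for a recognized gesture (S405/S406), then fall back (S408)."""
    entry = local_library.get(gesture_id)
    if entry is None:
        # No local match: try the network sign language library.
        entry = network_library.get(gesture_id)
    return entry


def sign_to_subtitle(gesture_id: str,
                     local_library: Mapping[str, str],
                     network_library: Mapping[str, str],
                     subtitle_buffer: List[str]) -> bool:
    """Append the matched subtitle to the buffer (S407); report whether a match was found."""
    entry = match_entry(gesture_id, local_library, network_library)
    if entry is None:
        return False  # assumed behavior: nothing is shown when no library matches
    subtitle_buffer.append(entry)
    return True
```

Checking the local library first keeps the common case fast and lets the box keep working for known gestures when the network is unavailable.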

As shown in Figure 5:

The conversion method provided in this embodiment includes:

A normal person speaks (S501). The sound is collected by a microphone or other recording device (S502) and transmitted to the set-top box (S503), which performs voice recognition (S504). The set-top box then determines whether the sound came from the TS stream channel or from the recording device (S505). If it came from the recording device, the recognized text is compared with the local text library (S506) to match the corresponding vocabulary entry (S507). If there is no match, matching continues in the network sign language library (S509). If there is a match, the subtitle is output to the subtitle buffer (S508). At the same time, the sign language library is matched (S510), and the sign language image and subtitle information are output to display memory (S511) for the hearing-disabled person to view (S512).
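
The voice-to-sign-language path can be sketched as follows, assuming recognition has already produced a text string. `Source`, `DisplayFrame`, and `voice_to_frame` are hypothetical names, the libraries are plain dictionaries, and the word-by-word sign lookup is a simplification rather than the matching method of the embodiment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Mapping, Optional


class Source(Enum):
    TS_STREAM = "ts_stream"                # sound of the TV program
    RECORDING_DEVICE = "recording_device"  # microphone or similar


@dataclass
class DisplayFrame:
    source: Source          # which channel the sound came from (S505)
    subtitle: str           # text written to the subtitle buffer (S508)
    sign_images: List[str]  # matched sign language images (S510)


def voice_to_frame(recognized: str,
                   source: Source,
                   local_library: Mapping[str, str],
                   network_library: Mapping[str, str],
                   sign_library: Mapping[str, str]) -> Optional[DisplayFrame]:
    """Match recognized text to an entry, then attach the matching sign images."""
    entry = local_library.get(recognized)        # S506/S507
    if entry is None:
        entry = network_library.get(recognized)  # S509
    if entry is None:
        return None  # assumed behavior when nothing matches
    # Simplified S510: one sign image per word that the sign library knows.
    signs = [sign_library[word] for word in entry.split() if word in sign_library]
    return DisplayFrame(source, entry, signs)    # ready for display memory (S511)
```

Tagging each frame with its source lets a later display stage place TV-program output and conversation output in different layers.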

As shown in Figure 6:

The conversion method provided in this embodiment includes:

First, it is determined whether the sound is TS stream sound (S601). If so, the speech recognizer processes the data of the TS stream audio channel: the audio data is acquired (S602) and input to speech recognition (S603), semantic proofreading and correction is performed (S604), and the result is checked against the corresponding vocabulary entries (S605). If there is no match, matching continues in the network sign language library (S607). If there is a match, the subtitle is output to the subtitle buffer (S606). At the same time, the sign language library is matched (S608), and the sign language image and subtitle information are output to display memory (S609) for the hearing-disabled person to watch (S610).

As shown in Figure 7:

This embodiment makes the processing of two sound channels and their subtitles compatible. Specifically, the TS stream sound 701 is transmitted to the speech recognizer via audio channel 1 (704), and the normal person's voice 702 is transmitted to the speech recognizer via audio channel 2 (705). The speech recognizer recognizes each input separately (707), and the results are displayed in two layers: layer channel 2 displays the text and sign language information corresponding to the TS stream sound, and layer channel 1 displays the result of converting the normal person's voice. The sign language image 703 passes through the dedicated codec channel 706, undergoes image recognition (708), and the converted result is sent to layer channel 3 (711). Finally, 709, 710, and 711 are displayed (712) according to priority. The three kinds of information appear at different positions, each position indicating which party is speaking, and the transparency, font size, and sign language size of each can be adjusted. For example, when the users communicate frequently, the fonts of 710 and 711 are enlarged so that the users can concentrate on chatting; when there is little communication, the font of 709 is slightly enlarged so that the hearing-disabled person can concentrate on the TV program.
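
The font adjustment described above can be sketched as a function that rescales layers according to how actively the users are chatting. The source gives no concrete measure or values, so the messages-per-minute metric, the `busy_threshold` and `boost` defaults, and the `Layer` and `rescale` names are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    is_conversation: bool    # True for voice and sign results, False for the TV program
    font_scale: float = 1.0  # 1.0 means the default size


def rescale(layers: List[Layer],
            messages_per_minute: float,
            busy_threshold: float = 3.0,
            boost: float = 1.5) -> List[Layer]:
    """Enlarge conversation layers while users chat often, else the TV program layer."""
    chatting = messages_per_minute >= busy_threshold
    return [
        Layer(layer.name,
              layer.is_conversation,
              boost if layer.is_conversation == chatting else 1.0)
        for layer in layers
    ]
```

Returning new `Layer` objects instead of mutating the inputs keeps each display refresh independent of the previous one.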

In summary, through the implementation of the embodiments of the present invention, at least the following beneficial effects exist:

An embodiment of the present invention provides a method for displaying exchange information: first exchange information displayed in a first display manner is collected and parsed, the data content corresponding to the first exchange information is acquired, second exchange information corresponding to that data content is acquired, and the second exchange information is displayed in a second display manner. Data can thus be converted between any different display manners so that people with different needs can communicate. For example, a normal user can choose the voice display manner and a language-disabled user can choose the sign language display manner, so each user only needs to express the content to be communicated in their usual manner. Through the conversion based on the data content, both parties can understand each other's intentions and communicate conveniently, which enhances the user experience.

Those skilled in the art will appreciate that embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Sixth embodiment:

Embodiments of the present invention also provide a storage medium including a stored program, wherein the program, when run, performs any of the methods described above.

Optionally, in this embodiment, the foregoing storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media that can store program code.

Embodiments of the present invention also provide a processor for running a program, wherein the program is executed to perform the steps of any of the above methods.

Optionally, for specific examples in this embodiment, refer to the examples described in the foregoing embodiments and optional embodiments; details are not repeated here.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above can be implemented by a general-purpose computing device, and can be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one herein, or they may be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

The above description is only the preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes can be made to the present invention. Any modifications, equivalent substitutions, improvements, etc. made within the scope of the present invention are intended to be included within the scope of the present invention.

Industrial applicability

The method for displaying exchange information provided by the embodiments of the present invention collects first exchange information displayed in a first display manner, parses the first exchange information, acquires the data content corresponding to the first exchange information, acquires second exchange information corresponding to that data content, and displays the second exchange information in a second display manner. Data can thus be converted between any different display manners so that people with different needs can communicate. For example, a normal user can choose the voice display manner and a language-disabled user can choose the sign language display manner, so each user only needs to express the content to be communicated in their usual manner. Through the conversion based on the data content, both parties can understand each other's intentions and communicate conveniently, which enhances the user experience.

Claims

A method of displaying information exchange, including:

Collecting first exchange information displayed by the first display manner;

Parsing the first exchange information, acquiring data content corresponding to the first exchange information, and acquiring second exchange information corresponding to the data content;

The second exchange information is displayed by the second display manner.
The method of displaying an exchange information according to claim 1, wherein the first display mode comprises a voice mode, and the second display mode comprises a picture mode;

The collecting the first exchange information displayed by the first display manner includes: collecting the external voice through the voice recognition device, and/or acquiring the first exchange information by collecting the audio channel;

The displaying the second exchange information by using the second display manner includes: displaying the second exchange information in a subtitle form and/or a gesture form on a screen.
The method for displaying exchange information according to claim 2, further comprising: if a plurality of pieces of first exchange information are respectively collected through two or more paths, displaying, on a plurality of screens respectively, the second exchange information corresponding to each piece of first exchange information.
The method for displaying exchange information according to claim 3, further comprising: determining, according to the degree of importance of each piece of first exchange information, the screen position of the second exchange information corresponding to that piece of first exchange information.
The method for displaying exchange information according to any one of claims 1 to 4, wherein the first display manner comprises a picture manner, and the second display manner comprises a voice manner;

the collecting of the first exchange information displayed in the first display manner comprises: collecting external gestures and/or text through image recognition, and/or acquiring the first exchange information by capturing an image channel; and

the displaying of the second exchange information in the second display manner comprises: presenting the second exchange information through a speaker as simulated voice.
An exchange information display device, comprising:

an acquisition module, configured to collect first exchange information displayed in a first display manner;

a processing module, configured to parse the first exchange information, acquire data content corresponding to the first exchange information, and acquire second exchange information corresponding to the data content; and

a display module, configured to display the second exchange information in a second display manner.
The exchange information display device according to claim 6, wherein the first display manner comprises a voice manner, and the second display manner comprises a picture manner; the acquisition module is configured to collect external voice through a voice recognition device, and/or acquire the first exchange information by capturing an audio channel; and the display module is configured to display the second exchange information on a screen in subtitle form and/or gesture form.
The exchange information display device according to claim 7, wherein the display module is further configured to, if a plurality of pieces of first exchange information are respectively collected through two or more paths, display, on a plurality of screens respectively, the second exchange information corresponding to each piece of first exchange information.
The exchange information display device according to claim 8, wherein the display module is further configured to determine, according to the degree of importance of each piece of first exchange information, the screen position of the second exchange information corresponding to that piece of first exchange information.
The exchange information display device according to any one of claims 6 to 9, wherein the first display manner comprises a picture manner, and the second display manner comprises a voice manner; the acquisition module is configured to collect external gestures and/or text through image recognition, and/or acquire the first exchange information by capturing an image channel; and the display module is configured to present the second exchange information as simulated voice.
An exchange information display device, comprising: an interaction module and a processor, wherein

the interaction module is configured to collect first exchange information displayed in a first display manner and output the first exchange information to the processor, and is further configured to display, in a second display manner, second exchange information returned by the processor; and

the processor is configured to parse the first exchange information, acquire data content corresponding to the first exchange information, acquire the second exchange information corresponding to the data content, and transmit the second exchange information to the interaction module.
A set top box, comprising: a sign language database, and a voice module, a sign language conversion module, and a display module that are connected to one another, wherein

the voice module is configured to acquire audio data, and to recognize the audio data as semantics;

the sign language conversion module is configured to match, in the sign language database and according to the semantics, the sign language to be output corresponding to the audio data; and

the display module is configured to display the sign language to be output.
The set top box according to claim 12, wherein the display module is further configured to display the semantics of the audio data.
The set top box according to claim 12, wherein the voice module is configured to separately acquire audio data of a live television program and audio data input by an unimpaired user through a microphone.
The set top box according to any one of claims 12 to 14, further comprising an image module, wherein

the image module is configured to capture a gesture of the user and, after performing proofreading and correction on the gesture, transmit the processed gesture to the sign language conversion module;

the sign language conversion module is configured to match, in the sign language database and according to the processed gesture, the corresponding semantics to be output; and

the display module is configured to display the semantics to be output.
The set top box according to claim 15, wherein the display module is further configured to display a standard gesture corresponding to the captured gesture of the user.
A storage medium, comprising a stored program, wherein the program, when run, performs the method according to any one of claims 1 to 5.
A processor, configured to run a program, wherein the program, when run, performs the method according to any one of claims 1 to 5.
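
As an illustration only, and not as part of the claims, the module arrangement of the set top box claims above (voice module, sign language conversion module, sign language database) can be sketched as follows. All names and database entries are hypothetical, already-transcribed text stands in for the speech recognition step, and file names stand in for the sign language pictures to be displayed.

```python
from typing import Dict, List, Optional

# Hypothetical sign language database: semantic word -> sign picture identifier.
SIGN_LANGUAGE_DB: Dict[str, str] = {
    "hello": "sign_hello.gif",
    "thank": "sign_thank.gif",
    "you": "sign_you.gif",
}


def voice_module(transcribed_audio: str) -> List[str]:
    """Stand-in for audio recognition: turn transcribed audio into semantic words."""
    words = (w.strip(".,!?").lower() for w in transcribed_audio.split())
    return [w for w in words if w]


def match_signs(semantics: List[str]) -> List[Optional[str]]:
    """Sign language conversion, voice direction: semantics -> signs to output.

    Words without a database entry yield None so the caller can fall back,
    for example to showing the word as a subtitle.
    """
    return [SIGN_LANGUAGE_DB.get(word) for word in semantics]


def match_semantics(gestures: List[str]) -> List[str]:
    """Sign language conversion, gesture direction: processed gestures -> semantics."""
    reverse = {sign: word for word, sign in SIGN_LANGUAGE_DB.items()}
    return [reverse[g] for g in gestures if g in reverse]


if __name__ == "__main__":
    semantics = voice_module("Thank you!")
    print(semantics, match_signs(semantics))
    print(match_semantics(["sign_hello.gif", "unrecognized_gesture"]))
```

In this sketch a single database serves both directions, which mirrors the way the claims have the same sign language conversion module match signs from semantics and semantics from gestures.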