JP6629774B2

JP6629774B2 - Interactive method and apparatus applied to live distribution

Info

Publication number: JP6629774B2
Application number: JP2017023640A
Authority: JP
Inventors: ハオユンフェン
Original assignee: バイドゥオンラインネットワークテクノロジー（ベイジン）カンパニーリミテッド
Priority date: 2016-08-19
Filing date: 2017-02-10
Publication date: 2020-01-15
Anticipated expiration: 2037-02-10
Also published as: JP2018029325A; KR20180020859A; KR101945920B1; CN106303658B; CN106303658A

Description

The present application relates to the field of computers, specifically to the field of network technology, and in particular to an interactive method and apparatus applied to live distribution.

In live distribution, a live broadcaster (broadcasting jockey) needs to interact with viewers. At present, this interaction has to be carried out manually by the live broadcaster. For example, to thank a viewer for a virtual gift, the live broadcaster has to pause the current live content and enter text or pictures to interact with the viewer. This makes the interaction cumbersome, and because the current live content has to be paused whenever the live broadcaster needs to interact with viewers, it also impairs the fluency of the broadcast.

To solve the technical problems described in the background section above, the present application provides an interactive method and apparatus applied to live distribution.

In a first aspect, the present application provides an interactive method applied to live distribution, the method comprising: receiving a live video that is transmitted by a distributor-side client and includes a video stream and an audio stream; performing speech recognition on the audio stream to obtain a keyword; determining an interaction command corresponding to the keyword; and transmitting the live video and the interaction command to a viewer-side client so that the live video and an interaction target corresponding to the interaction command are displayed on a broadcast interface of the viewer-side client, wherein the live video is generated by the distributor-side client producing it in real time.

In a second aspect, the present application provides an interactive method applied to live distribution, the method comprising: receiving a live video and an interaction command transmitted by a server; determining an interaction target corresponding to the interaction command; and displaying the live video and the interaction target on a broadcast interface, wherein the live video includes a video stream and an audio stream and is generated by a distributor-side client producing it in real time, and the interaction command is determined on the basis of a keyword that the server obtains by performing speech recognition on the audio stream.

In a third aspect, the present application provides an interactive apparatus applied to live distribution, the apparatus comprising: a live video receiving unit configured to receive a live video that is transmitted by a distributor-side client and includes a video stream and an audio stream; a recognition unit configured to perform speech recognition on the audio stream to obtain a keyword; a determination unit configured to determine an interaction command corresponding to the keyword; and a transmitting unit configured to transmit the live video and the interaction command to a viewer-side client so that the live video and an interaction target corresponding to the interaction command are displayed on a broadcast interface of the viewer-side client, wherein the live video is generated by the distributor-side client producing it in real time.

In a fourth aspect, the present application provides an interactive apparatus applied to live distribution, the apparatus comprising: a receiving unit configured to receive a live video and an interaction command transmitted by a server; an interaction target determination unit configured to determine an interaction target corresponding to the interaction command; and a display unit configured to display the live video and the interaction target on a broadcast interface, wherein the live video includes a video stream and an audio stream and is generated by a distributor-side client producing it in real time, and the interaction command is determined on the basis of a keyword that the server obtains by performing speech recognition on the audio stream.

The interactive method and apparatus applied to live distribution provided by the present application receive a live video that is transmitted by a distributor-side client and includes a video stream and an audio stream, perform speech recognition on the audio stream to obtain a keyword, determine an interaction command corresponding to the keyword, and transmit the live video and the interaction command to a viewer-side client so that the live video and an interaction target corresponding to the interaction command are displayed on the broadcast interface of the viewer-side client, the live video being generated by the distributor-side client producing it in real time. This simplifies the live broadcaster's operations when interacting with viewers and, because the current live content does not have to be paused, preserves the fluency of the live broadcast.

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the drawings.

FIG. 1 shows an exemplary system architecture to which embodiments of the interactive method or apparatus applied to live distribution according to the present application can be applied.

FIG. 2 shows a flowchart of one embodiment of the interactive method applied to live distribution according to the present application.

FIG. 3 shows a flowchart of another embodiment of the interactive method applied to live distribution according to the present application.

FIG. 4 shows a schematic diagram of one interaction between the distributor-side client, the server, and the viewer-side client according to the present application.

FIG. 5 shows one exemplary architecture diagram applicable to the interactive method applied to live distribution according to the present application.

FIG. 6 shows a structural schematic diagram of one embodiment of the interactive apparatus applied to live distribution according to the present application.

FIG. 7 shows a structural schematic diagram of another embodiment of the interactive apparatus applied to live distribution according to the present application.

FIG. 8 shows a structural schematic diagram of a computer system suitable for the interactive apparatus applied to live distribution that implements the embodiments of the present application.

The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here merely serve to explain the invention and do not limit its scope. For convenience of description, the drawings show only the parts related to the invention.

It should also be noted that, provided they do not conflict, the embodiments of the present application and the features of those embodiments may be combined with one another. The present invention is described in detail below on the basis of embodiments and with reference to the drawings.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the interactive method or apparatus applied to live distribution according to the present application can be applied.

As shown in FIG. 1, the system architecture 100 may include a distributor-side client 101, a server 102, and a viewer-side client 103.

The network 104 provides the medium for a transmission link between the distributor-side client 101 and the server 102, and may include various wired and wireless transmission links. The network 105 provides the medium for a transmission link between the server 102 and the viewer-side client 103, and may likewise include various wired and wireless transmission links.

The user of the distributor-side client 101 (who may also be called a network live broadcaster) can use devices (for example, a camera and a microphone) on the terminal where the distributor-side client 101 is located to capture the images and sound corresponding to the live content and produce a live video in real time. The distributor-side client 101 can transmit the live video produced in real time to the server 102. The server 102 can receive the live video transmitted by the distributor-side client 101 and transmit it to the viewer-side client 103. After receiving the live video, the viewer-side client 103 can play it.

Refer to FIG. 2, which shows a flowchart of one embodiment of the interactive method applied to live distribution according to the present application. It should be noted that the interactive method applied to live distribution provided by this embodiment may be executed by the server 102 in FIG. 1 and that, correspondingly, the interactive apparatus applied to live distribution may be installed on the server 102. The method includes the following steps.

Step 201: Receive the live video transmitted by the distributor-side client.

In this embodiment, when producing a live video, the user of the distributor-side client (who may also be called a network live broadcaster) can use the camera of the terminal where the distributor-side client is located to capture the images corresponding to the live content, and can use the microphone of that terminal to capture sound (for example, the voice of the network live broadcaster). After capturing the images and sound, the distributor-side client can encode them to obtain a live video that includes a video stream and an audio stream.

Step 202: Perform speech recognition on the audio stream to obtain a keyword.

In this embodiment, after the live video transmitted by the distributor-side client has been received in step 201, the live video can be decoded according to the encoding scheme of its video stream and audio stream, and the audio stream can be extracted from it.

In this embodiment, once the audio stream has been extracted, speech recognition can be performed on it to obtain a keyword. The keyword may be a word related to interaction with the user of the viewer-side client, for example a word thanking that user for a virtual gift. The audio stream contains the voice of the user of the distributor-side client. Taking as an example the case where that user thanks a viewer for a virtual gift, the audio stream contains the speech signal of a keyword expressing thanks, such as "Thank you", and the keyword can be obtained by performing speech recognition on the audio stream.

In some optional implementations of this embodiment, performing speech recognition on the audio stream to obtain a keyword includes: performing speech recognition on the audio stream to obtain a sentence corresponding to the audio stream; segmenting the sentence to obtain a set of words; and detecting, in the set of words, a keyword that matches a preset keyword.

In this embodiment, words that the users of the distributor-side client and the viewer-side client commonly use when interacting during live distribution, for example "Thank you", "I love you", and "flower", can be configured in advance as preset keywords. Speech recognition can be performed on the audio stream of the received live video to obtain the sentence corresponding to the audio stream. The sentence can then be segmented to obtain a set of words, and any keyword in that set that matches a preset keyword can be detected.
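
The optional keyword-detection steps can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation described in the patent: it assumes speech recognition has already produced the sentence as text, it segments the sentence with a naive regular expression (Japanese or Chinese text would need a real word segmenter), and the names `PRESET_KEYWORDS`, `split_into_words`, and `detect_keywords` are hypothetical. The English single-word keywords stand in for the document's examples.

```python
import re

# Hypothetical preset keywords: English single-word stand-ins for the
# document's examples ("Thank you", "I love you", "flower").
PRESET_KEYWORDS = {"thanks", "love", "flower"}


def split_into_words(sentence: str) -> list[str]:
    """Segment a recognized sentence into lowercase words (naive segmentation)."""
    return re.findall(r"[a-z']+", sentence.lower())


def detect_keywords(sentence: str, presets: set[str] = PRESET_KEYWORDS) -> list[str]:
    """Return the words that match a preset keyword, in order of first appearance."""
    found: list[str] = []
    for word in split_into_words(sentence):
        if word in presets and word not in found:
            found.append(word)
    return found


if __name__ == "__main__":
    print(detect_keywords("Thanks for the flower, I love it!"))
```

Keeping the presets in a set makes each membership check constant-time, which matters little here but keeps the matching step independent of how many keywords are configured.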

Step 203: Determine the interaction command corresponding to the keyword.

In this embodiment, after speech recognition has been performed on the audio stream of the live video in step 202 and a keyword has been obtained, the interaction command corresponding to the keyword can be determined. For example, when the audio stream contains the voice of the user of the distributor-side client and that voice contains the speech signals of words such as "I love you" and "flower", recognition of the audio stream can yield the keywords "I love you" and "flower". The interaction command corresponding to the keyword "I love you" may be used to trigger the display of an interaction target (for example, a heart-shaped picture) on the broadcast interface of the viewer-side client. The interaction command corresponding to the keyword "flower" may be used to trigger the display of an interaction target (for example, a picture of a flower) on the broadcast interface of the viewer-side client.
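
One simple way to picture step 203 is a lookup table from keywords to commands. The sketch below is an assumption made for illustration: the document does not say how commands are represented, so `InteractionCommand`, `COMMAND_TABLE`, `determine_command`, and the command identifiers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InteractionCommand:
    """Hypothetical command telling the viewer-side client what to trigger."""
    command_id: str


# Hypothetical keyword-to-command table: each keyword maps to one command.
COMMAND_TABLE = {
    "I love you": InteractionCommand("SHOW_HEART"),
    "flower": InteractionCommand("SHOW_FLOWER"),
}


def determine_command(keyword: str) -> Optional[InteractionCommand]:
    """Return the command for a keyword, or None if the keyword has no command."""
    return COMMAND_TABLE.get(keyword)


if __name__ == "__main__":
    print(determine_command("flower"))
```

Returning `None` for unknown keywords lets the caller skip words that were recognized but have no configured interaction.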

Step 204: Transmit the live video and the interaction command to the viewer-side client.

In this embodiment, after the interaction command corresponding to the keyword has been determined in step 203, the interaction command and the live video can be transmitted to the viewer-side client. After receiving the interaction command and the live video, the viewer-side client can then display, on its broadcast interface, the live video and the interaction target corresponding to the interaction command.
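
The document does not specify how the live video and the interaction command travel together, so the following is purely a hypothetical framing chosen for illustration: a length-prefixed JSON header carrying the command identifiers, followed by the raw video bytes. The function names and the `"commands"` field are assumptions.

```python
import json
import struct


def pack_message(video_chunk: bytes, command_ids: list[str]) -> bytes:
    """Pack a chunk of live video together with interaction command IDs."""
    header = json.dumps({"commands": command_ids}).encode("utf-8")
    # 4-byte big-endian header length, then the header, then the video bytes.
    return struct.pack(">I", len(header)) + header + video_chunk


def unpack_message(message: bytes) -> tuple[bytes, list[str]]:
    """Recover the video chunk and the command IDs on the viewer-side client."""
    (header_len,) = struct.unpack(">I", message[:4])
    header = json.loads(message[4:4 + header_len].decode("utf-8"))
    return message[4 + header_len:], header["commands"]


if __name__ == "__main__":
    packed = pack_message(b"\x00\x01\x02", ["SHOW_HEART"])
    print(unpack_message(packed))
```

A real system could just as well send commands on a separate channel from the video; the point of the sketch is only that the viewer-side client ends up with both.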

In some optional implementations of this embodiment, the interaction target corresponding to the interaction command includes animations, pictures, and emoji.

In this embodiment, after the interaction command corresponding to the keyword has been determined in step 203, the interaction command and the live video can be transmitted to the viewer-side client. After receiving them, the viewer-side client can display on the live video the animation, picture, or emoji corresponding to the interaction command. The user of the distributor-side client can thus interact with the user of the viewer-side client by means of animations, pictures, and emoji.

In some optional implementations of this embodiment, the method further includes: determining the time point at which the speech signal corresponding to the keyword appears in the live video; generating timestamp information that includes the time point; and transmitting the timestamp information to the viewer-side client.

In this embodiment, while speech recognition is performed on the audio stream to obtain the keyword, the time point at which the speech signal corresponding to the keyword appears in the live video can also be determined. Timestamp information that includes this time point can be generated and transmitted to the viewer-side client. When the viewer-side client receives the interaction command and the live video, it can then use the timestamp information to determine the time point at which the speech signal corresponding to the keyword appears in the live video, and display the interaction target corresponding to the interaction command superimposed on the video frame that corresponds to that time point in the live video shown in the broadcast interface.
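
Generating the timestamp information can be sketched as below, assuming the speech recognizer reports a start offset for every recognized word. That assumption, the `RecognizedWord` type, the `build_timestamp_info` helper, and the dictionary field names are hypothetical and are not taken from the document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedWord:
    """Hypothetical recognizer output: a word and when it starts in the live video."""
    text: str
    start_seconds: float


def build_timestamp_info(words: list[RecognizedWord], presets: set[str]) -> list[dict]:
    """Build one timestamp entry per recognized word that matches a preset keyword."""
    return [
        {"keyword": word.text, "time_seconds": word.start_seconds}
        for word in words
        if word.text in presets
    ]


if __name__ == "__main__":
    recognized = [RecognizedWord("hello", 1.0), RecognizedWord("flower", 2.5)]
    print(build_timestamp_info(recognized, {"flower", "thanks"}))
```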

Refer to FIG. 3, which shows a flowchart of another embodiment of the interactive method applied to live distribution according to the present application. It should be noted that the interactive method applied to live distribution provided by this embodiment may be executed by the server 102 in FIG. 1 and that, correspondingly, the interactive apparatus applied to live distribution may be installed on the server 102. The method includes the following steps.

Step 301: Receive the live video and the interaction command transmitted by the server.

In this embodiment, the live video is generated by the distributor-side client producing it in real time, and includes a video stream and an audio stream.

In this embodiment, when a live broadcast is watched through the viewer-side client, the live video and the interaction command transmitted by the server can be received. The interaction command can be determined on the basis of a keyword that the server obtains by performing speech recognition on the audio stream of the live video.

For example, the server can decode the live video it has received from the distributor-side client and extract the audio stream from it. After extracting the audio stream, the server can perform speech recognition on it to obtain a keyword. The audio stream contains the voice of the user of the distributor-side client. Taking as an example the case where that user thanks the user of the viewer-side client for a virtual gift, the audio stream contains the speech signal of a keyword expressing thanks, such as "Thank you", and the server obtains that keyword by performing speech recognition on the audio stream. The interaction command corresponding to that keyword, transmitted by the server, can then be received.

Step 302: Determine the interaction target corresponding to the interaction command.

In this embodiment, after the live video and the interaction command transmitted by the server have been received in step 301, the interaction target corresponding to the interaction command can be determined.

For example, when the voice of the user of the distributor-side client in the audio stream of the live video contains the keywords "Thank you" and "I love you", each of "Thank you" and "I love you" corresponds to one interaction command, and each interaction command corresponds to one interaction target.

In this embodiment, the interaction target corresponding to the interaction command includes, but is not limited to, animations, pictures, and emoji.
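
On the viewer-side client, step 302 can likewise be pictured as a table lookup from command to interaction target. The sketch below is an illustration under assumptions: the `InteractionTarget` type, `TARGET_TABLE`, `determine_target`, the command identifiers, and the asset names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InteractionTarget:
    """Hypothetical description of what the viewer-side client should draw."""
    kind: str   # "animation", "picture", or "emoji"
    asset: str  # hypothetical asset name bundled with the client


# Hypothetical command-to-target table: each command maps to one target.
TARGET_TABLE = {
    "SHOW_THANKS": InteractionTarget("animation", "thanks.anim"),
    "SHOW_HEART": InteractionTarget("picture", "heart.png"),
    "SHOW_FLOWER": InteractionTarget("picture", "flower.png"),
}


def determine_target(command_id: str) -> Optional[InteractionTarget]:
    """Return the interaction target for a command, or None if it is unknown."""
    return TARGET_TABLE.get(command_id)


if __name__ == "__main__":
    print(determine_target("SHOW_HEART"))
```

Resolving the target on the client means the server only has to send a short command, not the animation or picture itself.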

Step 303: Display the live video and the interaction target on the broadcast interface.

In this embodiment, after the interaction target corresponding to the interaction command has been determined in step 302, the interaction target can be displayed on the live video.

When the voice of the user of the distributor-side client in the audio stream of the live video contains the keywords "Thank you" and "I love you", that is, when the user of the distributor-side client says "Thank you" and "I love you" during the live broadcast, the interaction commands corresponding to "Thank you" and "I love you" can be received. The interaction targets corresponding to those interaction commands, for example animations, pictures, or emoji, can be determined. The interaction targets corresponding to "Thank you" and "I love you" can then be displayed on the broadcast interface; that is, the animations, pictures, or emoji corresponding to "Thank you" and "I love you" are displayed superimposed on the live video.

In some optional implementations of this embodiment, the method further includes receiving timestamp information transmitted by the server, the timestamp information including the time point at which the speech signal corresponding to the keyword appears in the live video, and displaying the interaction target in the broadcast interface at that time point.

In this embodiment, the timestamp information transmitted by the server can be received; it includes the time point at which the speech signal corresponding to the keyword appears in the live video. On the basis of that time point, the interaction target can be displayed superimposed on the video frame of the live video that corresponds to the time point.
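
Mapping a time point to the video frame on which the interaction target is superimposed can be sketched as a binary search over frame presentation times. This is an illustrative assumption rather than the document's method: it presumes the client knows each frame's presentation time in seconds, and the helper names and the `"keyword"` / `"time_seconds"` fields of the timestamp entries are hypothetical.

```python
from bisect import bisect_right


def frame_index_for_time(frame_times: list[float], time_point: float) -> int:
    """Index of the last frame whose presentation time is <= time_point.

    frame_times must be sorted in ascending order; times before the first
    frame are clamped to index 0.
    """
    return max(bisect_right(frame_times, time_point) - 1, 0)


def schedule_overlays(frame_times: list[float], timestamp_info: list[dict]) -> dict[int, list[str]]:
    """Map frame index -> keywords whose interaction targets start on that frame."""
    overlays: dict[int, list[str]] = {}
    for entry in timestamp_info:
        index = frame_index_for_time(frame_times, entry["time_seconds"])
        overlays.setdefault(index, []).append(entry["keyword"])
    return overlays


if __name__ == "__main__":
    times = [0.0, 0.04, 0.08, 0.12]  # four frames at 25 frames per second
    print(schedule_overlays(times, [{"keyword": "flower", "time_seconds": 0.05}]))
```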

Refer to FIG. 4, which shows a schematic diagram of one interaction between the distributor-side client, the server, and the viewer-side client of the present application.

The distributor-side client captures images and sound and produces the live video. The live client can capture the images and sound corresponding to the live content in real time and produce the live video in real time.

The distributor-side client transmits the live video to the server.

The server extracts the audio from the live video, performs speech recognition on the audio stream of the live video to obtain keywords, and determines the interaction command corresponding to each keyword. Each keyword corresponds to one interaction command, and each interaction command corresponds to one interaction target.

The server transmits the interaction command and the live video to the viewer-side client.

The live video and the interaction command are presented on the viewer-side client. The viewer-side client plays the live video in the broadcast interface and can display, on the live video, the interaction target corresponding to the interaction command.

In this embodiment, while the user of the distributor-side client is broadcasting live over the network, the live broadcaster's voice is recognized to obtain an interaction command, and the viewer-side client plays the live video and displays the interaction target corresponding to the interaction command. The user of the distributor-side client can therefore interact with the user of the viewer-side client without having to pause the live content. For example, when the user of the distributor-side client says "Thank you" or "I love you" during the live broadcast, the animation, picture, or emoji corresponding to "Thank you" or "I love you" can be displayed on the broadcast interface of the viewer-side client.

Refer to FIG. 5, which shows one exemplary architecture diagram applicable to the interactive method applied to live distribution according to the present application.

FIG. 5 shows a live client system and a live server system. The live client system includes an audio/video collection module and an interaction display module. The audio/video collection module may be located on the distributor-side client, where it collects audio/video information, that is, the images and sound corresponding to the live content, and sends it to the audio/video receiving module of the live server system. The interaction display module may be located on the viewer-side client; it receives the interaction command sent by the interaction processing module of the live server system and, in accordance with that command, displays the corresponding interaction target on the viewer-side client.

The live server system may be located on a server and includes an audio/video receiving module, an audio/video processing module, a speech recognition module, a natural language processing module, an interaction command module, and an interaction processing module. The audio/video receiving module may be used to receive the audio/video information collected by the live client and to forward it to the audio/video processing module. The audio/video processing module may be used to extract the audio information from the audio/video information and to send the audio information to the speech recognition module. The speech recognition module may be used to recognize text information from the audio information. The natural language processing module may be used to segment the text information to obtain a keyword list. The interaction processing module can obtain, from the interaction command module, the interaction command corresponding to each keyword in the keyword list and send the obtained interaction command to the interaction display module.
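The last step of this architecture, in which the interaction processing module looks up commands in the interaction command module, might be sketched as follows. This is a minimal illustration rather than the implementation described in the application: the names `InteractionCommand`, `COMMAND_TABLE`, and `commands_for_keywords`, as well as the command identifiers, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class InteractionCommand:
    command_id: str  # hypothetical identifier sent to the viewer-side client
    keyword: str     # keyword that triggered the command


# Assumed contents of the interaction command module: keyword -> command.
COMMAND_TABLE = {
    "thank you": InteractionCommand("SHOW_THANKS", "thank you"),
    "i love you": InteractionCommand("SHOW_LOVE", "i love you"),
}


def commands_for_keywords(keyword_list: List[str]) -> List[InteractionCommand]:
    """Interaction processing step: look up a command for each keyword.

    Keywords without an entry in the table are silently skipped.
    """
    return [COMMAND_TABLE[k] for k in keyword_list if k in COMMAND_TABLE]
```

A plain dictionary is used here only because the text describes a one-to-one correspondence between keywords and commands; a real system could just as well back this lookup with a database.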

Refer to FIG. 6, which shows a schematic structural diagram of one embodiment of an interactive device applied to live distribution according to the present application. This device embodiment corresponds to the method embodiment shown in FIG. 2.

As shown in FIG. 6, the interactive device 600 applied to live distribution according to this embodiment includes a live video receiving unit 601, a recognition unit 602, and a transmitting unit 603. The live video receiving unit 601 is configured to receive the live video transmitted by the distributor-side client; the live video is generated by the distributor-side client in real time and includes a video stream and an audio stream. The recognition unit 602 is configured to perform speech recognition on the audio stream to obtain a keyword. A determination unit is configured to determine the interaction command corresponding to the keyword. The transmitting unit 603 is configured to transmit the live video and the interaction command to the viewer-side client so that the live video and the interaction target corresponding to the interaction command are displayed on the broadcast interface of the viewer-side client.

In some optional implementations of this embodiment, the recognition unit 602 includes an audio stream recognition subunit (not shown) configured to perform speech recognition on the audio stream to obtain a sentence corresponding to the audio stream; a word segmentation subunit (not shown) configured to divide the sentence to obtain a set of words; and a detection subunit (not shown) configured to detect, in the set of words, a keyword that matches a preset keyword.
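The word segmentation and detection subunits might be sketched as follows, assuming that the speech recognition step has already produced a sentence as text. The preset keywords and the regular-expression tokenizer are illustrative assumptions only; Japanese or Chinese speech would need a language-specific word segmenter, which is not shown here.

```python
import re
from typing import List, Set

# Hypothetical preset keywords; the application does not list concrete values.
PRESET_KEYWORDS = {"thanks", "love"}


def segment(sentence: str) -> List[str]:
    """Word segmentation subunit: split a sentence into lowercase words.

    Assumption: English text, so a simple regular expression is enough.
    """
    return re.findall(r"[a-z']+", sentence.lower())


def detect_keywords(sentence: str, preset: Set[str] = PRESET_KEYWORDS) -> List[str]:
    """Detection subunit: return preset keywords found in the sentence.

    Each keyword is reported once, in order of first appearance.
    """
    found: List[str] = []
    for word in segment(sentence):
        if word in preset and word not in found:
            found.append(word)
    return found
```

For instance, `detect_keywords("Thanks everyone, I love this gift!")` yields the two keywords `thanks` and `love`, while a sentence containing no preset keyword yields an empty list.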

In some optional implementations of this embodiment, the device 600 further includes a time point determination unit (not shown) configured to determine the time point at which the audio signal corresponding to the keyword appears in the live video; a generation unit (not shown) configured to generate time stamp information including that time point; and an information transmission unit (not shown) configured to transmit the time stamp information to the viewer-side client.
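One way the time point determination and generation units could work is sketched below. It assumes that the speech recognizer reports a start offset for each recognized word, which the application does not state; `TimedWord`, `build_timestamp_info`, and the JSON field names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedWord:
    text: str
    start_ms: int  # assumed offset of the word from the start of the live video


def build_timestamp_info(words: List[TimedWord], keyword: str) -> Optional[dict]:
    """Return time stamp information for the first occurrence of the keyword.

    Returns None when the keyword does not occur in the recognized words.
    """
    for word in words:
        if word.text == keyword:
            return {"keyword": keyword, "time_ms": word.start_ms}
    return None


def encode(info: dict) -> str:
    """Serialize the time stamp information for sending to the viewer-side client."""
    return json.dumps(info, sort_keys=True)
```

JSON is chosen here purely for readability; the application does not specify how the time stamp information is encoded or transported.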

In some optional implementations of this embodiment, the interaction target includes an animation, a picture, and a pictogram.

Refer to FIG. 7, which shows a schematic structural diagram of another embodiment of the interactive device applied to live distribution according to the present application. This device embodiment corresponds to the method embodiment shown in FIG. 3.

As shown in FIG. 7, the interactive device 700 applied to live distribution according to this embodiment includes a receiving unit 701, an interaction target determination unit 702, and a display unit 703. The receiving unit 701 is configured to receive the live video and the interaction command transmitted by the server. The live video is generated by the distributor-side client in real time and includes a video stream and an audio stream, and the interaction command is determined based on the keyword that the server obtains by performing speech recognition on the audio stream. The interaction target determination unit 702 is configured to determine the interaction target corresponding to the interaction command. The display unit 703 is configured to display the live video and the interaction target on the broadcast interface.

In some optional implementations of this embodiment, the device 700 further includes an information receiving unit (not shown) configured to receive the time stamp information transmitted by the server. The time stamp information includes the time point at which the audio signal corresponding to the keyword appears in the live video, so that the interaction target is displayed on the broadcast interface at that time point.
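The viewer-side behavior (determining the interaction target from the command and showing it at the time point carried in the time stamp information) might be sketched as follows. The `TARGETS` table, the `Frame` type, and the `duration_ms` parameter are assumptions made for illustration; in particular, the application does not say how long the interaction target remains on screen.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical mapping from interaction command to interaction target asset.
TARGETS = {
    "SHOW_THANKS": "thanks_animation.gif",
    "SHOW_LOVE": "heart_picture.png",
}


@dataclass
class Frame:
    pts_ms: int  # presentation time of the frame within the live video
    image: str   # placeholder standing in for decoded pixel data


def play_with_overlay(
    frames: Iterable[Frame],
    command_id: str,
    time_ms: int,
    duration_ms: int = 2000,  # assumed display duration, not from the source
) -> Iterator[Tuple[Frame, Optional[str]]]:
    """Yield each frame with the overlay to draw on it, or None.

    The overlay is active for frames whose presentation time falls in
    [time_ms, time_ms + duration_ms).
    """
    target = TARGETS.get(command_id)
    for frame in frames:
        active = target is not None and time_ms <= frame.pts_ms < time_ms + duration_ms
        yield frame, (target if active else None)
```

Keying the overlay to the frame's presentation time rather than to the arrival time of the command keeps the interaction target aligned with the spoken keyword even when the command and the video travel with different delays.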

FIG. 8 shows a schematic structural diagram of a computer system suitable for implementing the interactive device applied to live distribution according to the embodiments of the present application.

As shown in FIG. 8, the computer system 800 includes a central processing unit (CPU) 801 that can perform various appropriate operations and processes in accordance with a program stored in a read-only memory (ROM) 802 or a program loaded from a storage unit 808 into a random access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, the ROM 802, and the RAM 803 are connected to one another via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input unit 806 including a keyboard, a mouse, and the like; an output unit 807 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage unit 808 including a hard disk and the like; and a communication unit 809 including a network interface card such as a LAN card or a modem. The communication unit 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed in the storage unit 808 as needed.

In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application includes a computer program product that includes a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication unit 809 and/or installed from the removable medium 811.

The flowcharts and block diagrams in the drawings illustrate the architectures, functions, and operations that can be implemented by the systems, methods, and computer program products according to the embodiments of the present application. Each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, and that module, program segment, or portion of code contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may be performed in an order different from the one shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or in the reverse order, depending on the functions involved. Each block in the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The present application further provides a non-volatile computer storage medium. This medium may be the non-volatile computer storage medium included in the device of the embodiments described above, or it may be a stand-alone non-volatile computer storage medium that is not built into a terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: receive a live video, including a video stream and an audio stream, transmitted by a distributor-side client; perform speech recognition on the audio stream to obtain a keyword; determine an interaction command corresponding to the keyword; and transmit the live video and the interaction command to a viewer-side client so that the live video and the interaction target corresponding to the interaction command are displayed on the broadcast interface of the viewer-side client. Here, the live video is generated by the distributor-side client in real time.

The above description covers only the preferred embodiments of the present application and the technical principles employed. Those skilled in the art should understand that the scope of the claims of the present application is not limited to technical solutions formed by the specific combinations of the technical features described above, and also covers other technical solutions formed by any combination of those features or their equivalents without departing from the gist of the present application. One example is a technical solution in which the features described above are interchanged with (but not limited to) technical features with similar functions disclosed in the present application.

Claims

An interactive method applied to live distribution, comprising:
a server receiving a live video, including a video stream and an audio stream, transmitted by a distributor-side client;
the server performing speech recognition on the audio stream to obtain a keyword;
the server determining an interaction command corresponding to the keyword; and
transmitting the live video and the interaction command to a viewer-side client so that the live video and an interaction target corresponding to the interaction command are displayed on a broadcast interface of the viewer-side client,
wherein the live video is generated by the distributor-side client in real time,
wherein performing speech recognition on the audio stream to obtain a keyword comprises:
performing speech recognition on the audio stream to obtain a sentence corresponding to the audio stream;
dividing the sentence to obtain a set of words; and
detecting, in the set of words, the keyword that matches a preset keyword,
wherein the interactive method further comprises:
determining the time point at which the audio signal corresponding to the keyword appears in the live video;
generating time stamp information including the time point; and
transmitting the time stamp information to the viewer-side client,
and wherein the viewer-side client displays the interaction target superimposed on the video frame of the live video corresponding to the time point.

The method of claim 1, wherein the interaction target includes an animation, a picture, and a pictogram.

An interactive method applied to live distribution, comprising:
receiving a live video and an interaction command transmitted by a server;
determining an interaction target corresponding to the interaction command; and
displaying the live video and the interaction target on a broadcast interface,
wherein the live video includes a video stream and an audio stream and is generated by a distributor-side client in real time, and the interaction command is determined based on a keyword obtained after the server performs speech recognition on the audio stream,
wherein the interactive method further comprises:
receiving time stamp information transmitted by the server,
and wherein the time stamp information includes the time point at which the audio signal corresponding to the keyword appears in the live video, so that the interaction target is displayed superimposed on the video frame of the live video corresponding to the time point.

An interactive device located on a server and applied to live distribution, comprising:
a live video receiving unit configured to receive a live video, including a video stream and an audio stream, transmitted by a distributor-side client;
a recognition unit configured to perform speech recognition on the audio stream to obtain a keyword;
a determination unit configured to determine an interaction command corresponding to the keyword; and
a transmission unit configured to transmit the live video and the interaction command to a viewer-side client so that the live video and an interaction target corresponding to the interaction command are displayed on a broadcast interface of the viewer-side client,
wherein the live video is generated by the distributor-side client in real time,
wherein the recognition unit comprises:
an audio stream recognition subunit configured to perform speech recognition on the audio stream to obtain a sentence corresponding to the audio stream;
a word segmentation subunit configured to divide the sentence to obtain a set of words; and
a detection subunit configured to detect, in the set of words, the keyword that matches a preset keyword,
wherein the interactive device further comprises:
a time point determination unit configured to determine the time point at which the audio signal corresponding to the keyword appears in the live video;
a generation unit configured to generate time stamp information including the time point; and
an information transmission unit configured to transmit the time stamp information to the viewer-side client,
and wherein the viewer-side client displays the interaction target superimposed on the video frame of the live video corresponding to the time point.

The device according to claim 4, wherein the interaction target includes an animation, a picture, and a pictogram.

An interactive device applied to live distribution, comprising:
a receiving unit configured to receive a live video and an interaction command transmitted by a server;
an interaction target determination unit configured to determine an interaction target corresponding to the interaction command; and
a display unit configured to display the live video and the interaction target on a broadcast interface,
wherein the live video includes a video stream and an audio stream and is generated by a distributor-side client in real time, and the interaction command is determined based on a keyword obtained after the server performs speech recognition on the audio stream,
wherein the interactive device further comprises an information receiving unit configured to receive time stamp information transmitted by the server,
and wherein the time stamp information includes the time point at which the audio signal corresponding to the keyword appears in the live video, so that the interaction target is displayed superimposed on the video frame of the live video corresponding to the time point.