TW202318252A - Caption service system for remote speech recognition - Google Patents

Caption service system for remote speech recognition

Info

Publication number
TW202318252A
Authority
TW
Taiwan
Prior art keywords
speech recognition
subtitle
asr
speaker
live broadcast
Prior art date
Application number
TW110139500A
Other languages
Chinese (zh)
Inventor
陳信宏
廖元甫
王逸如
黃紹華
姚秉志
葉政育
陳又碩
鍾耀興
黃彥鈞
黃啟榮
沈立得
古甯允
Original Assignee
國立陽明交通大學
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 國立陽明交通大學 filed Critical 國立陽明交通大學
Priority to TW110139500A priority Critical patent/TW202318252A/en
Publication of TW202318252A publication Critical patent/TW202318252A/en

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a caption service system for remote speech recognition that offers a caption service for the hearing impaired. The system includes a speaker and live broadcast equipment at site A, a listener-typist and a computer at site B, a hearing-impaired person and a live screen at site C, and an automatic speech recognition (ASR) caption server at site D. The live broadcast equipment, the computer, the live screen, and the ASR caption server are connected over a network. The speaker's audio is sent to the ASR caption server and converted into text, which is corrected by the listener-typist; the text captions are then sent, together with the speaker's video and audio, to the hearing-impaired person's live screen, so that the hearing-impaired person can read captions of what the speaker says.

Description

Caption Service System for Remote Speech Recognition

The present invention relates to a caption service system for remote speech recognition, and more particularly to a system that provides a remote speech-recognition caption service for the hearing impaired via a caption server and a listener-typist.

Because of the outbreak of the COVID-19 pandemic, remote live streaming and distance teaching have become widely adopted. However, typical remote live streams and online classes provide no captions, which prevents hearing-impaired students from following the lessons.

Hearing-impaired students also face problems in ordinary classrooms, because there is no display showing captions of what the teacher is saying. Likewise, hearing-impaired people cannot take part in lectures and conferences, because no display provides captions.

Providing captions that display what the teacher or speaker is saying is therefore a great help to the hearing impaired.

Some conferences already employ listener-typists who type what the speaker says into a computer in real time and present it as captions on a screen, so that hearing-impaired attendees can follow the proceedings. However, listening to the speaker demands great concentration, and when the working hours grow too long, the typist may miss sentences or make typos. A more complete solution for remote listener-typists is therefore needed.

The object of the present invention is to provide a caption service system for remote speech recognition that offers a remote speech-recognition caption service to the hearing impaired. The invention is described as follows.

The system includes a speaker and live broadcast equipment at site A, a listener-typist and a computer at site B, a hearing-impaired person and a live screen at site C, and an automatic speech recognition (ASR) caption server at site D. The live broadcast equipment, the computer, the live screen, and the ASR caption server are connected over a network.

The automatic speech recognition (ASR) caption server includes: a Real-Time Messaging Protocol (RTMP) module for receiving the live stream sent from site A over the network; an open-source speech recognition toolkit for speech recognition and signal processing; a web server that provides the web page interface and delivers it over the HTTP protocol to the live broadcast equipment, the computer, and the live screen; and a recording module that allows the listener-typist to play back the audio.

The speaker's voice is sent to the automatic speech recognition (ASR) caption server and converted into text, which is corrected by the listener-typist; the text captions are then sent, together with the speaker's video and audio, to the hearing-impaired person's live screen, so that the hearing-impaired person can read captions of what the speaker says.

1: speaker
2: live broadcast equipment
3: listener-typist
4: computer
5: hearing-impaired person
6: live screen
61: caption area
7: automatic speech recognition (ASR) caption server
8: network
9: Real-Time Messaging Protocol (RTMP)
10: open-source speech recognition toolkit Kaldi ASR
11: web server
12: recording module
13: OBS live streaming software
14: ASR upload interface
15: audio stream
16: audio record
17: video and audio captured by the live broadcast equipment
18: caption content captured from the browser

FIG. 1 is a schematic diagram of the basic architecture of the caption service system for remote speech recognition of the present invention.

FIG. 2 is a schematic diagram of the contents of the automatic speech recognition (ASR) caption server of the present invention.

FIG. 3 is a schematic diagram of the contents of the live broadcast equipment of the present invention.

FIG. 4 is a schematic diagram of the caption generation process of the automatic speech recognition (ASR) caption server of the present invention.

FIG. 5 is a schematic diagram of the remote operation of the listener-typist of the present invention.

FIG. 6 is a schematic diagram of merging the speaker's live video with the captions for output according to the present invention.

FIG. 1 illustrates the basic architecture of the caption service system for remote speech recognition of the present invention. The speaker 1 and the live broadcast equipment 2 are at site A, the listener-typist 3 and his or her computer 4 are at site B, the hearing-impaired person 5 and the live screen 6 are at site C, and the automatic speech recognition (ASR) caption server 7 is at site D. The four sites are connected by a network 8, which may be a local area network or the global Internet. If sites A, B, and C are the same place, the speaker 1, the listener-typist 3, and the hearing-impaired person 5 are in the same classroom or meeting room.

FIG. 2 illustrates what the automatic speech recognition (ASR) caption server 7 of the present invention contains. The Real-Time Messaging Protocol (RTMP) 9 is a protocol widely used for live streaming; equipped with this protocol, the ASR caption server 7 can receive the live stream sent from site A over the network 8. Instead of RTMP, HTTP Live Streaming (HLS), an HTTP-based streaming protocol proposed by Apple, may also be used to receive the live stream from site A over the network 8. The method of the present invention is, however, not limited to RTMP or HLS.
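
The following is a minimal sketch, not the patented implementation, of how such an ingest stage might look: ffmpeg (assumed to be installed) pulls the RTMP stream and decodes it to raw PCM that the recognizer and the recording module can consume. The ingest URL, sample rate, and chunk size are assumptions, not values taken from the patent.

```python
# Sketch of an RTMP ingest stage: decode the live stream to 16-bit, 16 kHz,
# mono PCM and yield it in ~100 ms chunks for downstream modules.
import subprocess

RTMP_URL = "rtmp://asr-caption-server.example/live/speaker"  # hypothetical ingest URL

def pcm_chunks(url: str, chunk_bytes: int = 3200):
    """Yield ~100 ms chunks of 16-bit, 16 kHz, mono PCM decoded by ffmpeg."""
    proc = subprocess.Popen(
        ["ffmpeg", "-loglevel", "quiet", "-i", url,
         "-f", "s16le", "-ar", "16000", "-ac", "1", "pipe:1"],
        stdout=subprocess.PIPE,
    )
    try:
        while True:
            chunk = proc.stdout.read(chunk_bytes)
            if not chunk:
                break
            yield chunk
    finally:
        proc.kill()

if __name__ == "__main__":
    for i, chunk in enumerate(pcm_chunks(RTMP_URL)):
        print(f"received chunk {i}: {len(chunk)} bytes")
```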

For speech recognition, the ASR caption server 7 uses Kaldi ASR 10, an open-source speech recognition toolkit for speech recognition and signal processing that is freely available under the Apache License v2.0.
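
Kaldi itself is a C++ toolkit normally driven through its own binaries or a wrapper. The sketch below, offered only as an illustration and not as the patented implementation, uses the Vosk Python wrapper (which is built on Kaldi) to show the kind of incremental decoding the caption server performs; the model path and sample rate are assumptions.

```python
# Sketch of incremental decoding with a Kaldi-based recognizer (Vosk wrapper).
import json
from vosk import Model, KaldiRecognizer

model = Model("model")              # hypothetical path to an acoustic/language model
rec = KaldiRecognizer(model, 16000)  # matches the 16 kHz PCM from the ingest sketch

def decode_stream(chunks):
    """Feed PCM chunks to the recognizer and yield finalized text segments."""
    for chunk in chunks:
        if rec.AcceptWaveform(chunk):            # True when an utterance is finalized
            yield json.loads(rec.Result())["text"]
    yield json.loads(rec.FinalResult())["text"]  # flush the last partial utterance
```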

A web server 11 must be set up on the ASR caption server 7. It provides the web page interface and delivers it over the HTTP protocol to the clients (generally web browsers), where the clients are the live broadcast equipment 2, the computer 4, the live screen 6, and so on.
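
A minimal sketch of this web-server role is shown below, assuming Flask and an in-memory segment store; the real system's page layout and permission handling are not shown. The segment fields mirror the time tags described later in connection with FIG. 4.

```python
# Sketch of a web server exposing the recognized caption segments over HTTP.
from flask import Flask, jsonify

app = Flask(__name__)

# Each segment: recognized text plus its tag (start second and duration).
SEGMENTS = [
    {"id": 1, "text": "hello everyone", "start_sec": 0.0, "dur_sec": 2.1},
]

@app.route("/captions")
def captions():
    """Return the current caption segments for the client pages to poll."""
    return jsonify(SEGMENTS)

if __name__ == "__main__":
    app.run(port=8080)
```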

The ASR caption server 7 has a recording module 12 that provides a "playback" function for the listener-typist.
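
A minimal sketch of such a recording module, assuming the 16-bit, 16 kHz, mono PCM chunks from the ingest sketch above, simply appends the audio to a WAV file that later playback requests can slice; the file name is an assumption.

```python
# Sketch of a recording module: write incoming PCM chunks to an audio record.
import wave

def record(chunks, path="audio_record.wav"):
    """Write incoming PCM chunks to a WAV file serving as the audio record."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(16000)  # 16 kHz
        for chunk in chunks:
            wf.writeframes(chunk)
```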

Referring to FIG. 3, which illustrates what the live broadcast equipment 2 of the present invention contains: at site A, the live broadcast equipment 2 of the speaker 1 captures the speaker's video and audio and splits them into two channels. The first channel feeds Open Broadcaster Software (OBS) live streaming software 13, a free and open-source cross-platform streaming and recording program developed by the OBS Project and commonly used by streamers. The back end of the OBS live streaming software 13 can push the stream directly to platforms such as YouTube, Facebook, or Twitch.

The second channel carries only the voice of the speaker 1; the audio can be packetized through a simple ASR upload interface 14 and streamed to the ASR caption server 7 over RTMP 9 (or HLS).
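
A minimal sketch of this upload side is given below, assuming ffmpeg is installed: it captures a local audio source, encodes it, and pushes it to the caption server's RTMP ingest. The capture flags shown are for Linux/ALSA and the ingest URL is hypothetical; on Windows or macOS the input options would differ.

```python
# Sketch of an audio-only upload path pushing the speaker's voice over RTMP.
import subprocess

INGEST_URL = "rtmp://asr-caption-server.example/live/speaker"  # hypothetical

def push_microphone(url: str = INGEST_URL):
    """Capture the default microphone and stream it as AAC over RTMP."""
    subprocess.run([
        "ffmpeg", "-f", "alsa", "-i", "default",   # Linux microphone capture
        "-ac", "1", "-ar", "16000",                # mono, 16 kHz
        "-c:a", "aac", "-f", "flv", url,           # AAC in FLV, as RTMP expects
    ], check=True)
```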

Referring to FIG. 4, which illustrates the caption generation process of the automatic speech recognition (ASR) caption server 7 of the present invention: when a stream packet arrives at the RTMP 9 (or HLS) receiving module of the ASR caption server 7, the packet is unpacked into an audio stream 15, which is passed both to Kaldi ASR 10 and to the recording module 12. As time goes on, the recording module 12 records the audio into a time-indexed audio record 16. After receiving the audio stream 15, the Kaldi ASR 10 module converts it into text incrementally, and each text segment carries a "tag", as shown in FIG. 4. The tag describes at which second of the audio record the segment begins and how long it lasts. The text and tags are displayed on the page of the web server 11 and delivered over the network 8 to the live broadcast equipment 2, the computer 4, and the live screen 6.
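
A minimal sketch of this tagging step follows, reusing the PCM ingest and the Vosk-style recognizer interface from the earlier sketches (16-bit, 16 kHz, mono). Each finalized text segment is paired with the second of the audio record at which it starts and its duration, so that exact span can be replayed later; the field names are illustrative, not taken from the patent.

```python
# Sketch of tagging recognized text segments with start time and duration.
import json

BYTES_PER_SEC = 16000 * 2  # 16 kHz, 16-bit mono PCM

def tag_segments(chunks, recognizer):
    """Yield recognized text segments with (start_sec, dur_sec) tags."""
    consumed = 0       # bytes fed to the recognizer so far
    seg_start = 0      # byte offset where the current segment began
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):   # True when an utterance is finalized
            end = consumed + len(chunk)
            yield {
                "text": json.loads(recognizer.Result())["text"],
                "start_sec": seg_start / BYTES_PER_SEC,
                "dur_sec": (end - seg_start) / BYTES_PER_SEC,
            }
            seg_start = end
        consumed += len(chunk)
```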

Referring to FIG. 5, which illustrates the remote operation of the listener-typist 3 of the present invention: the listener-typist 3, at a different location, opens YouTube, Facebook, or Twitch to receive the live video and audio from the speaker 1 at site A. The listener-typist 3 also logs into the page of the web server 11 of the ASR caption server 7 through a web browser, and reads and listens to the text and audio of the speaker 1 on that page.

The listener-typist 3 is granted read and write permission on the ASR caption server 7 and can therefore edit, on the page of the web server 11, the text produced by Kaldi ASR 10. Each text segment on the page of the web server 11 carries its tag attributes. For example, when the listener-typist 3 double-clicks the C-th text segment, the page of the web server 11 follows the tag and requests, for playback, the clip of the audio record 16 that starts at second N3 and lasts Z seconds. In this way the listener-typist 3 can confirm what the speaker 1 actually said and correct the text accordingly.
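
A minimal sketch of the playback request triggered by such a double-click is shown below: the server cuts the requested span out of the audio record written by the recording-module sketch above and returns it as a small WAV clip. The file name, route, and query parameters are assumptions, not the patent's actual interface.

```python
# Sketch of a playback endpoint that slices the audio record by tag.
import io
import wave
from flask import Flask, request, send_file

app = Flask(__name__)
AUDIO_RECORD = "audio_record.wav"   # hypothetical audio record from the recording module

@app.route("/playback")
def playback():
    """Return the WAV clip starting at start_sec and lasting dur_sec seconds."""
    start = float(request.args.get("start_sec", 0))
    dur = float(request.args.get("dur_sec", 5))
    with wave.open(AUDIO_RECORD, "rb") as src:
        rate = src.getframerate()
        src.setpos(int(start * rate))              # seek to the tagged second
        frames = src.readframes(int(dur * rate))   # read the tagged duration
        params = src.getparams()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)
    buf.seek(0)
    return send_file(buf, mimetype="audio/wav")

if __name__ == "__main__":
    app.run(port=8080)
```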

Referring to FIG. 6, which illustrates how the speaker 1 at site A merges the live screen 6 with the captions for output: the live broadcast equipment 2 of the speaker 1 can log into the page of the web server 11 of the ASR caption server 7 through a web browser, but with read-only permission. In other words, on the live broadcast equipment 2 of the speaker 1 it is only possible to view the text transcribed by the ASR caption server 7 and the text as corrected by the listener-typist 3.

The OBS live streaming software 13 can overlay layers. On the live broadcast equipment 2, the speaker 1 selects and captures the caption content from the page of the web server 11 of the ASR caption server 7, overlays the video and audio 17 captured by the live broadcast equipment with the caption content 18 captured from the browser, and outputs through the OBS live streaming software 13 a live screen 6 containing the captions produced by the ASR caption server 7. The OBS live streaming software 13 then pushes the stream to platforms such as YouTube, Facebook, or Twitch, so that the hearing-impaired person 5 at site C can see the captions in the caption area 61 of the live screen 6.
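
A minimal sketch of a caption page that OBS could load as a browser source and layer over the live video is given below. It assumes the same web server also exposes the /captions endpoint sketched earlier (stubbed here so the example runs on its own); the markup, polling interval, and styling are purely illustrative.

```python
# Sketch of a browser-source overlay page that polls for the latest caption.
from flask import Flask, jsonify

app = Flask(__name__)

SEGMENTS = [{"text": "hello everyone"}]   # stand-in for the live segment store

PAGE = """
<div id="cap" style="font-size:2em;color:white;background:rgba(0,0,0,.6)"></div>
<script>
  setInterval(async () => {
    const segs = await (await fetch('/captions')).json();
    if (segs.length)   // show the most recent caption segment
      document.getElementById('cap').textContent = segs[segs.length - 1].text;
  }, 1000);
</script>
"""

@app.route("/captions")
def captions():
    return jsonify(SEGMENTS)

@app.route("/overlay")
def overlay():
    """OBS points a browser source at this URL to overlay the captions."""
    return PAGE

if __name__ == "__main__":
    app.run(port=8080)
```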

The spirit and scope of the present invention are defined by the following claims and are not limited to the embodiments described above.


Claims (10)

1. A caption service system for remote speech recognition, comprising: a speaker and live broadcast equipment at a site A, a listener-typist and a computer at a site B, a hearing-impaired person and a live screen at a site C, and an automatic speech recognition (ASR) caption server at a site D, the live broadcast equipment, the computer, the live screen, and the ASR caption server being connected by a network; wherein the voice of the speaker is sent to the ASR caption server and converted into text, the text is corrected by the listener-typist, and the text captions are then sent, together with the video and audio of the speaker, to the live screen of the hearing-impaired person, so that the hearing-impaired person can see the text captions of what the speaker says.

2. The caption service system for remote speech recognition of claim 1, wherein the ASR caption server comprises: a Real-Time Messaging Protocol (RTMP) module for receiving a live stream sent from site A over the network; an open-source speech recognition toolkit for speech recognition and signal processing; a web server that provides the web page interface and delivers it over the HTTP protocol to the live broadcast equipment, the computer, and the live screen; and a recording module that provides a playback function for the listener-typist.

3. The caption service system for remote speech recognition of claim 2, wherein the live broadcast equipment captures the video and audio of the speaker and splits them into two channels: a first channel that feeds the video and audio of the speaker into OBS live streaming software, whose back end can push the video and audio of the speaker directly to platforms such as YouTube, Facebook, or Twitch; and a second channel that carries only the voice of the speaker, which can be packetized into a stream packet through an ASR upload interface and sent to the ASR caption server over the RTMP.

4. The caption service system for remote speech recognition of claim 3, wherein the caption generation process of the ASR caption server is as follows: when the stream packet arrives at the RTMP receiving module of the ASR caption server, the stream packet is unpacked into an audio stream and passed to the open-source speech recognition toolkit and to the recording module respectively; as time goes on, the recording module records the audio stream into a time-indexed audio record; after receiving the audio stream, the open-source speech recognition toolkit converts it into text incrementally, each text segment carrying a tag that describes at which second of the audio record the segment begins and how long it lasts; and the text and the tags are displayed on the page of the web server and delivered over the network to the live broadcast equipment, the computer, and the live screen.

5. The caption service system for remote speech recognition of claim 4, wherein the listener-typist opens the YouTube, Facebook, or Twitch platform to receive the video and audio from the speaker; the listener-typist also logs into the web server page of the ASR caption server through a web browser and reads and listens to the text and audio of the speaker on that page; and the listener-typist is granted read and write permission on the ASR caption server and can therefore edit, on the web server page, the text produced by the open-source speech recognition toolkit.

6. The caption service system for remote speech recognition of claim 5, wherein the speaker logs into the web server page of the ASR caption server from the live broadcast equipment through a web browser and reads the text; the speaker merges the live video with the text as an overlay in the OBS live streaming software, and the OBS live streaming software pushes the result to a platform such as YouTube, Facebook, or Twitch, so that the hearing-impaired person can see the text content in a caption area on the live screen.

7. The caption service system for remote speech recognition of claim 1, wherein the network is a local area network or the global Internet.

8. The caption service system for remote speech recognition of claim 2, wherein the Real-Time Messaging Protocol (RTMP) can be replaced by Apple's HTTP Live Streaming (HLS) or by any network transmission protocol with similar functionality.

9. The caption service system for remote speech recognition of claim 2, wherein the open-source speech recognition toolkit is Kaldi ASR, freely available under the Apache License v2.0.

10. The caption service system for remote speech recognition of claim 1, wherein if site A, site B, and site C are the same place, the speaker, the listener-typist, and the hearing-impaired person are in the same classroom or meeting room.
TW110139500A 2021-10-25 2021-10-25 Caption service system for remote speech recognition TW202318252A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW110139500A TW202318252A (en) 2021-10-25 2021-10-25 Caption service system for remote speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW110139500A TW202318252A (en) 2021-10-25 2021-10-25 Caption service system for remote speech recognition

Publications (1)

Publication Number Publication Date
TW202318252A true TW202318252A (en) 2023-05-01

Family

ID=87378808

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110139500A TW202318252A (en) 2021-10-25 2021-10-25 Caption service system for remote speech recognition

Country Status (1)

Country Link
TW (1) TW202318252A (en)

Similar Documents

Publication Publication Date Title
US6820055B2 (en) Systems and methods for automated audio transcription, translation, and transfer with text display software for manipulating the text
US7035804B2 (en) Systems and methods for automated audio transcription, translation, and transfer
US9160551B2 (en) Analytic recording of conference sessions
JP2007006444A (en) Multimedia production control system
CN102802044A (en) Video processing method, terminal and subtitle server
WO2016127691A1 (en) Method and apparatus for broadcasting dynamic information in multimedia conference
WO2012072008A1 (en) Method and device for superposing auxiliary information of video signal
CN111479124A (en) Real-time playing method and device
US20020188772A1 (en) Media production methods and systems
JP2005269607A (en) Instant interactive audio/video management system
CA3159656A1 (en) Distributed network recording system with synchronous multi-actor recording
US11735185B2 (en) Caption service system for remote speech recognition
TW202318252A (en) Caption service system for remote speech recognition
JP2003271530A (en) Communication system, inter-system relevant device, program and recording medium
CN106170986A (en) Program output device, program server, auxiliary information management server, program and the output intent of auxiliary information and storage medium
JP2004266578A (en) Moving image editing method and apparatus
Grewe et al. MPEG-H Audio production workflows for a Next Generation Audio Experience in Broadcast, Streaming and Music
TW202318398A (en) Speech recognition system for teaching assistance
US20230026467A1 (en) Systems and methods for automated audio transcription, translation, and transfer for online meeting
US20230096430A1 (en) Speech recognition system for teaching assistance
Nishiyama et al. Recording and Mixing Techniques for Ambisonic Sound Production
Schreer et al. Media production, delivery and interaction for platform independent systems: format-agnostic media
US11381628B1 (en) Browser-based video production
JP2013201505A (en) Video conference system and multipoint connection device and computer program
JP5424359B2 (en) Understanding support system, support terminal, understanding support method and program