CN113115103A - System and method for realizing real-time audio-to-text conversion in network live broadcast - Google Patents
- Publication number
- CN113115103A
- Authority
- CN
- China
- Prior art keywords
- voice
- module
- audio
- live broadcast
- characters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4398—Processing of audio elementary streams involving reformatting operations of audio signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Abstract
The invention relates to a system and a method for real-time audio-to-text conversion in network live broadcasts. The system comprises: a system configuration module, used to configure live broadcast parameters such as code stream, definition, and language type; a voice acquisition module, connected by cable to the live broadcast equipment to collect the audio signals in that equipment; a voice conversion module, which receives voice data from the voice acquisition module in real time and recognizes and converts it into text at high speed; and a subtitle processing module, which obtains the text converted by the voice conversion module in real time and composites it onto the video picture for display. Based on audio recognition technology, the invention aims to provide a new method for displaying subtitles in real time during a live video broadcast, and to overcome the drawbacks of traditional subtitle display methods, which are costly in money and labor, are not real-time, and cannot adapt to changes on stage.
Description
Technical Field
The invention relates to the technical field of audio-to-text conversion, and in particular to a system and a method for real-time audio-to-text conversion in network live broadcasts.
Background
With the rapid development of China's economy, living standards continue to improve. Regions, theatres, enterprises, and other organizations want to showcase their own character and organize events with performances. China has a large population, and most people cannot attend in person, so they watch performances from around the country at home through network live broadcasts. Performances take many spoken forms, such as songs, operas, sketches, crosstalk, and recitations, and are delivered in many languages, some of which the audience cannot understand. To promote cultural exchange, audiences want to see the content directly, so it needs to be displayed as text. Live subtitles are therefore very important, yet they remain a weak point of the prior art: the text is prepared in advance from the script, and the subtitles are then switched manually on expensive equipment to follow what is happening on stage. This approach does display subtitles, but it is costly in money and labor, and when the situation on stage changes, the subtitles may no longer match or may be missing altogether.
Disclosure of Invention
In view of the shortcomings described in the background, the invention provides a system and a method for real-time audio-to-text conversion in network live broadcasts. Based on audio recognition technology, it aims to provide a new method for displaying subtitles in real time during a live video broadcast, and to overcome the drawbacks of traditional subtitle display methods, which are costly in money and labor, are not real-time, and cannot adapt to changes on stage.
The system for real-time audio-to-text conversion in network live broadcasts comprises: a system configuration module, used to configure live broadcast parameters such as code stream, definition, and language type; a voice acquisition module, connected by cable to the live broadcast equipment to collect the audio signals in that equipment; a voice conversion module, which receives voice data from the voice acquisition module in real time and recognizes and converts it into text at high speed; and a subtitle processing module, which obtains the text converted by the voice conversion module in real time and composites it onto the video picture for display.
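The parameters handled by the system configuration module can be pictured as a small settings record. The following is a minimal sketch only: the class name `LiveConfig`, the field names, the units, and the default values are all illustrative assumptions, since the patent names the parameters (code stream, definition, language type) but does not specify how they are represented.

```python
from dataclasses import dataclass


# Hypothetical settings record for the system configuration module.
# Field names, units, and defaults are assumptions made for illustration.
@dataclass(frozen=True)
class LiveConfig:
    bitrate_kbps: int = 2500   # code stream of the live broadcast
    definition: str = "1080p"  # picture definition label
    language: str = "zh-CN"    # expected spoken language type

    def validate(self) -> None:
        """Reject obviously unusable settings before the broadcast starts."""
        if self.bitrate_kbps <= 0:
            raise ValueError("bitrate_kbps must be positive")
        if not self.definition:
            raise ValueError("definition must not be empty")
        if not self.language:
            raise ValueError("language must not be empty")


# Example: configure a broadcast before use.
config = LiveConfig(bitrate_kbps=4000, definition="720p", language="en-US")
config.validate()
```

Making the record immutable (`frozen=True`) reflects the idea that the parameters are set once before the broadcast; that is a design choice of this sketch, not a requirement stated in the patent.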
With this scheme, the invention applies audio recognition technology and big data cloud analysis, and uses current 5G technology to greatly improve transmission efficiency.
Further, the system includes a voice recognition module, which obtains data from the voice acquisition module, recognizes the language type, and sends the recognized language type to the voice conversion module to facilitate encoding by the voice conversion module.
With this scheme, the module uses an existing speech recognition platform, which ensures accuracy during speech recognition.
Further, the voice acquisition module processes the voice frequency to perform noise reduction, restoration, and splitting of the audio data, so that the audio signal can be converted into a text signal with punctuation marks.
With this scheme, punctuation marks can be added during text conversion.
Further, the subtitle processing module obtains the converted text signal from the voice conversion module, and uses the context to detect and correct wrongly written characters, sentence breaks, and punctuation marks in the text signal.
With this scheme, the accuracy of text conversion is improved, and fewer wrong sentences and words are produced.
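The patent does not describe how the context-based correction is carried out. As a rough illustration of the punctuation part only, the sketch below applies a few fixed rules to one recognized sentence. The function name `tidy_subtitle` and the rules themselves are assumptions, the example uses ASCII punctuation for simplicity, and correcting wrongly written characters would need a language model or dictionary that is not shown here.

```python
import re


def tidy_subtitle(text: str) -> str:
    """Normalize spacing and punctuation in one recognized sentence.

    Illustrative rule-based cleanup; not the patent's correction method.
    """
    text = re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)   # no space before punctuation
    text = re.sub(r"([,.!?;:])\1+", r"\1", text)   # collapse repeated marks
    if text and text[-1] not in ".!?":
        # Replace a dangling comma/semicolon/colon with a sentence end.
        text = text.rstrip(",;: ") + "."
    return text


cleaned = tidy_subtitle("hello  world ,, how are you")
```

Here `cleaned` is `"hello world, how are you."`; a sentence that already ends in `.`, `!`, or `?` is left with its own ending.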
Further, the voice recognition module can also recognize blank voice (audio that contains no speech) and mark the blank voice signal; after the voice conversion module receives the mark signal, it stops transmitting signals to the subtitle processing module.
With this scheme, recognizing blank audio avoids idle power consumption of the equipment.
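One common way to recognize blank audio is to compare the energy of each audio frame with a threshold. The patent does not say which technique it uses, so the following is a minimal sketch under that assumption; the function names, the normalized sample range of -1.0 to 1.0, and the threshold value 0.01 are all hypothetical.

```python
import math
from typing import Iterable, Iterator, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of samples in the range [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_blank(frame: Sequence[float], threshold: float = 0.01) -> bool:
    """Mark a frame as blank when its level is below the assumed threshold."""
    return rms(frame) < threshold


def forward_non_blank(
    frames: Iterable[Sequence[float]], threshold: float = 0.01
) -> Iterator[Sequence[float]]:
    """Yield only frames that carry energy; blank frames are not passed on."""
    for frame in frames:
        if not is_blank(frame, threshold):
            yield frame


silent = [0.0] * 160
loud = [0.5, -0.5] * 80
kept = list(forward_non_blank([silent, loud, silent]))
```

In this example only the `loud` frame is forwarded, which mirrors the idea that downstream modules do no work while nothing is being said.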
Further, the voice recognition module can also recognize speech rate: after collecting two segments of speech, it records the corresponding sentence-break and pause data, so that pauses within speech are distinguished from blank voice and the module adjusts adaptively, which improves its recognition efficiency.
With this scheme, the sentence-break and pause data of each speaker are recognized to form a collection scheme for the individual live broadcast, so that recognition is adjusted to each speaker. Sentence breaks are quickly distinguished from blank audio, which effectively reduces system power consumption.
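The adaptive adjustment described above can be illustrated by tracking a speaker's typical pause length and treating only much longer silences as blank audio. This is a sketch under stated assumptions: the class `PauseModel`, the running-mean update, the initial pause of 0.5 s, and the factor of 3 are hypothetical choices, not values given in the patent.

```python
class PauseModel:
    """Per-speaker pause statistics used to tell pauses from blank audio."""

    def __init__(self, initial_pause_s: float = 0.5, factor: float = 3.0):
        self.mean_pause_s = initial_pause_s  # used until a pause is observed
        self.factor = factor                 # how much longer counts as blank
        self.count = 0

    def observe_pause(self, duration_s: float) -> None:
        """Update the running mean with one observed sentence-break pause."""
        self.count += 1
        self.mean_pause_s += (duration_s - self.mean_pause_s) / self.count

    def classify(self, silence_s: float) -> str:
        """Return 'blank' for silences far longer than this speaker's pauses."""
        if silence_s > self.factor * self.mean_pause_s:
            return "blank"
        return "pause"


model = PauseModel()
before = model.classify(1.0)   # judged against the initial 0.5 s guess
model.observe_pause(0.2)       # a fast speaker with short pauses
model.observe_pause(0.4)
after = model.classify(1.0)    # the same silence, judged against 0.3 s mean
```

With the default guess, a 1.0 s silence is still classified as a pause; after two short pauses are observed, the same silence is classified as blank, so the threshold has adapted to the speaker.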
Further, the voice conversion module comprises a big data cloud analysis database.
With this scheme, conversion accuracy is high and wrong sentences are identified quickly.
Further, the system uses 5G transmission, which improves conversion efficiency, cost, and speed.
With this scheme, 5G high-efficiency transmission, accurate speech recognition, and the subtitle display module together provide strong support for the live broadcast industry in terms of efficiency, cost, and accuracy.
A method for real-time audio-to-text conversion in network live broadcasts comprises the following steps:
S1: assemble the live broadcast equipment and, before use, configure the live broadcast parameters, such as code stream, definition, and language type, in the system configuration module;
S2: connect the live broadcast equipment to the audio and video cable; during the live broadcast, the audio signal is transmitted over the cable to the voice acquisition module of the system, the voice acquisition module submits its data to the voice recognition module in real time, and the voice recognition module recognizes the language type from the collected data and transmits the recognized voice to the voice conversion module;
S3: the voice conversion module recognizes the voice and converts it into text at high speed, and the text is transmitted to the subtitle processing module;
S4: the subtitle processing module displays the processed subtitles on the live video picture, completing the whole operation.
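The data flow of steps S2 to S4 can be sketched as a loop that passes each audio chunk through language recognition, conversion, and subtitle display. This is an illustration only: `run_pipeline` and its callback parameters are hypothetical names, and the stand-in callbacks below do no real speech recognition (the "conversion" simply decodes bytes so the example can run).

```python
from typing import Callable, Iterable, List


def run_pipeline(
    audio_chunks: Iterable[bytes],
    detect_language: Callable[[bytes], str],
    convert: Callable[[bytes, str], str],
    render: Callable[[str], None],
) -> None:
    """Pass each chunk through recognition, conversion, and display."""
    for chunk in audio_chunks:
        lang = detect_language(chunk)   # S2: voice recognition module
        text = convert(chunk, lang)     # S3: voice conversion module
        if text:                        # nothing to show for empty results
            render(text)                # S4: subtitle processing module


# Stand-in callbacks for demonstration; a real system would call a
# speech recognition service and draw onto the video picture instead.
shown: List[str] = []
run_pipeline(
    audio_chunks=[b"hello", b"", b"world"],
    detect_language=lambda chunk: "en",
    convert=lambda chunk, lang: chunk.decode("utf-8"),
    render=shown.append,
)
```

After the call, `shown` holds `["hello", "world"]`; the empty chunk produces no subtitle. Passing the modules in as callbacks keeps the loop independent of any particular recognition service.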
With this scheme, a new method for displaying subtitles in real time during a live video broadcast is provided, overcoming the drawbacks of traditional subtitle display methods, which are costly in money and labor, are not real-time, and cannot adapt to changes on stage.
Drawings
The invention is further illustrated with reference to the following figures and examples.
Fig. 1 is a schematic block diagram of embodiment 1 of the present invention.
Fig. 2 is a schematic block diagram according to embodiment 2 of the present invention.
Reference numerals: 1. system configuration module; 2. voice acquisition module; 3. voice conversion module; 4. subtitle processing module; 5. voice recognition module.
Detailed Description
While the embodiments of the present invention will be described and illustrated in detail with reference to the accompanying drawings, it is to be understood that the invention is not limited to the specific embodiments disclosed, but is intended to cover various modifications, equivalents, and alternatives falling within the scope of the invention as defined by the appended claims.
To aid understanding, specific embodiments are further explained below with reference to the drawings; these embodiments are not to be construed as limiting the invention.
Embodiment 1 of the present invention is shown in fig. 1 and includes four modules: a system configuration module 1, a voice acquisition module 2, a voice conversion module 3, and a subtitle processing module 4. The technical scheme applies audio recognition technology and big data cloud analysis, uses current 5G technology to greatly improve transmission efficiency, and uses a speech recognition platform to ensure accuracy during speech recognition. The live broadcast system was developed by the company, a national-level high-tech enterprise with strong research and development capability. 5G high-efficiency transmission, accurate speech recognition, and the subtitle display module together provide strong support for the live broadcast industry in terms of efficiency, cost, and accuracy.
The audio-to-text method of embodiment 1 is as follows:
S1: assemble the live broadcast equipment and, before use, configure the live broadcast parameters, such as code stream, definition, and language type, in the system configuration module 1;
S2: connect the live broadcast equipment to the audio and video cable; during the live broadcast, the audio signal is transmitted over the cable to the voice acquisition module 2 of the system, and the voice acquisition module 2 transmits the collected voice to the voice conversion module 3;
S3: the voice conversion module 3 recognizes the voice and converts it into text at high speed, and the text is transmitted to the subtitle processing module 4;
S4: the subtitle processing module 4 displays the processed subtitles on the live video picture, completing the whole operation.
Further, the voice acquisition module 2 processes the voice frequency to perform noise reduction, restoration, and splitting of the audio data, so that the audio signal can be converted into a text signal with punctuation marks. The subtitle processing module 4 obtains the converted text signal from the voice conversion module 3, and uses the context to detect and correct wrongly written characters, sentence breaks, and punctuation marks in the text signal.
Embodiment 2 of the present invention is shown in fig. 2 and includes five modules: a system configuration module 1, a voice acquisition module 2, a voice conversion module 3, a subtitle processing module 4, and a voice recognition module 5. The system configuration module 1 sets the parameters of the live broadcast equipment. The live broadcast equipment is electrically connected to the voice acquisition module 2; the voice acquisition module 2 is electrically connected to the voice conversion module 3 and to the voice recognition module 5; the voice conversion module 3 and the voice recognition module 5 are each electrically connected to the subtitle processing module 4; and the subtitle processing module 4 is electrically connected to the video screen of the live broadcast. The modules are electrically connected to one another by wires, and audio is transmitted over 5G.
The voice recognition module 5 can also recognize blank voice and mark the blank voice signal; after the voice conversion module 3 receives the mark signal, it stops transmitting signals to the subtitle processing module 4. The voice recognition module 5 can also recognize speech rate: after collecting two segments of speech, it records the corresponding sentence-break and pause data, so that pauses within speech are distinguished from blank voice and the module adjusts adaptively, which improves its recognition efficiency.
The audio-to-text method of embodiment 2 is as follows:
S1: assemble the live broadcast equipment and, before use, configure the live broadcast parameters, such as code stream, definition, and language type, in the system configuration module;
S2: connect the live broadcast equipment to the audio and video cable; during the live broadcast, the audio signal is transmitted over the cable to the voice acquisition module of the system, the voice acquisition module submits its data to the voice recognition module in real time, and the voice recognition module recognizes the language type from the collected data and transmits the recognized voice to the voice conversion module;
S3: the voice conversion module recognizes the voice and converts it into text at high speed, and the text is transmitted to the subtitle processing module;
S4: the subtitle processing module displays the processed subtitles on the live video picture, completing the whole operation.
This embodiment overcomes the drawbacks of traditional subtitle display methods, which are costly in money and labor, are not real-time, and cannot adapt to changes on stage.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention. They illustrate its technical solutions and do not limit them, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed here, anyone skilled in the art can still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or substitute equivalents for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as falling within its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (9)
1. A system for real-time audio-to-text conversion in network live broadcasts, characterized in that it comprises:
a system configuration module, used to configure live broadcast parameters such as code stream, definition, and language type;
a voice acquisition module, connected by cable to the live broadcast equipment to collect the audio signals in that equipment;
a voice conversion module, which receives voice data from the voice acquisition module in real time and recognizes and converts it into text at high speed;
and a subtitle processing module, which obtains the text converted by the voice conversion module in real time and composites it onto the video picture for display.
2. The system for real-time audio-to-text conversion in network live broadcasts according to claim 1, characterized in that: it further comprises a voice recognition module, which obtains data from the voice acquisition module, recognizes the language type, and sends the recognized language type to the voice conversion module to facilitate encoding by the voice conversion module.
3. The system for real-time audio-to-text conversion in network live broadcasts according to claim 2, characterized in that: the voice acquisition module processes the voice frequency to perform noise reduction, restoration, and splitting of the audio data, so that the audio signal is converted into a text signal with punctuation marks.
4. The system for real-time audio-to-text conversion in network live broadcasts according to claim 3, characterized in that: the subtitle processing module obtains the converted text signal from the voice conversion module, and uses the context to detect and correct wrongly written characters, sentence breaks, and punctuation marks in the text signal.
5. The system for real-time audio-to-text conversion in network live broadcasts according to claim 4, characterized in that: the voice recognition module can also recognize blank voice and mark the blank voice signal, and after the voice conversion module receives the mark signal, it stops transmitting signals to the subtitle processing module.
6. The system for real-time audio-to-text conversion in network live broadcasts according to claim 5, characterized in that: the voice recognition module can also recognize speech rate, and after collecting two segments of speech it records the corresponding sentence-break and pause data, so that pauses within speech are distinguished from blank voice and the module adjusts adaptively, which improves its recognition efficiency.
7. The system for real-time audio-to-text conversion in network live broadcasts according to claim 6, characterized in that: the voice conversion module comprises a big data cloud analysis database.
8. The system for real-time audio-to-text conversion in network live broadcasts according to claim 7, characterized in that: the system uses 5G transmission, which improves conversion efficiency, cost, and speed.
9. A method for real-time audio-to-text conversion in network live broadcasts, characterized in that it comprises the following steps:
S1: assemble the live broadcast equipment and, before use, configure the live broadcast parameters, such as code stream, definition, and language type, in the system configuration module;
S2: connect the live broadcast equipment to the audio and video cable; during the live broadcast, the audio signal is transmitted over the cable to the voice acquisition module of the system, the voice acquisition module submits its data to the voice recognition module in real time, and the voice recognition module recognizes the language type from the collected data and transmits the recognized voice to the voice conversion module;
S3: the voice conversion module recognizes the voice and converts it into text at high speed, and the text is transmitted to the subtitle processing module;
S4: the subtitle processing module displays the processed subtitles on the live video picture, completing the whole operation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110252429.XA CN113115103A (en) | 2021-03-09 | 2021-03-09 | System and method for realizing real-time audio-to-text conversion in network live broadcast |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110252429.XA CN113115103A (en) | 2021-03-09 | 2021-03-09 | System and method for realizing real-time audio-to-text conversion in network live broadcast |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113115103A (en) | 2021-07-13 |
Family
ID=76710697
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110252429.XA Pending CN113115103A (en) | 2021-03-09 | 2021-03-09 | System and method for realizing real-time audio-to-text conversion in network live broadcast |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113115103A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113873306A (en) * | 2021-09-23 | 2021-12-31 | 深圳市多狗乐智能研发有限公司 | Method for projecting real-time translation caption superposition picture to live broadcast room through hardware |
CN115002502A (en) * | 2022-07-29 | 2022-09-02 | 广州市千钧网络科技有限公司 | Data processing method and server |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105704538A (en) * | 2016-03-17 | 2016-06-22 | 广东小天才科技有限公司 | Method and system for generating audio and video subtitles |
CN105913845A (en) * | 2016-04-26 | 2016-08-31 | 惠州Tcl移动通信有限公司 | Mobile terminal voice recognition and subtitle generation method and system and mobile terminal |
CN106340294A (en) * | 2016-09-29 | 2017-01-18 | 安徽声讯信息技术有限公司 | Synchronous translation-based news live streaming subtitle on-line production system |
CN108600773A (en) * | 2018-04-25 | 2018-09-28 | 腾讯科技(深圳)有限公司 | Caption data method for pushing, subtitle methods of exhibiting, device, equipment and medium |
KR102135643B1 (en) * | 2019-09-04 | 2020-07-20 | (주) 소프트기획 | Real-time intelligent shorthand service providing system using voice recognition engine |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110085213B (en) | Audio abnormity monitoring method, device, equipment and storage medium | |
CN113115103A (en) | System and method for realizing real-time audio-to-text conversion in network live broadcast | |
US8655654B2 (en) | Generating representations of group interactions | |
US11227620B2 (en) | Information processing apparatus and information processing method | |
CN109089173B (en) | Method and system for detecting advertisement delivery of smart television terminal | |
CN103607635A (en) | Method, device and terminal for caption identification | |
CN106095903A (en) | A kind of radio and television the analysis of public opinion method and system based on degree of depth learning art | |
CN1404688A (en) | Apparatus and method of program classification using observed cues in the transcript information | |
CN104246874A (en) | Media synchronisation system | |
CN103117058A (en) | Multi-voice engine switch system and method based on intelligent television platform | |
CN102547139A (en) | Method for splitting news video program, and method and system for cataloging news videos | |
CN110881115B (en) | Strip splitting method and system for conference video | |
CN103123787A (en) | Method for synchronizing and exchanging mobile terminal with media | |
CN105227966A (en) | To televise control method, server and control system of televising | |
CN108600776B (en) | System and method for safe broadcast control | |
CN111107284B (en) | Real-time generation system and generation method for video subtitles | |
CN112000938A (en) | Power grid dispatching identity authentication method and system based on multimode identification | |
CN102890931A (en) | Method for increasing voice recognition rate | |
CN104349182B (en) | The method of the intelligent terminal media play content feedback realized by sound channel | |
CN101771845A (en) | File play handling method and device and player | |
CN102148939A (en) | Method, device and television for real-time displaying subtitles of television program | |
CN112599130B (en) | Intelligent conference system based on intelligent screen | |
CN112261331A (en) | Recording and broadcasting system supporting intelligent AI teaching analysis | |
CN103369361A (en) | Image data echo control method, server and terminal | |
TW201351393A (en) | Playing method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210713 | |