CN113115103A - System and method for realizing real-time audio-to-text conversion in network live broadcast - Google Patents

Publication number
CN113115103A
Authority
CN
China
Prior art keywords
voice
module
audio
live broadcast
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110252429.XA
Other languages
Chinese (zh)
Inventor
王旭辉
易长安
王旭伟
李洋
张玉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Maiqu Network Technology Co ltd
Original Assignee
Hangzhou Maiqu Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-03-09
Filing date
2021-03-09
Publication date
2021-07-13
Application filed by Hangzhou Maiqu Network Technology Co ltd
Priority to CN202110252429.XA
Publication of CN113115103A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4398 Processing of audio elementary streams involving reformatting operations of audio signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles

Abstract

The invention relates to a system and a method for realizing real-time audio-to-text conversion in network live broadcast. The system comprises: a system configuration module, used for configuring live broadcast parameters such as bitrate, definition, and language type; a voice acquisition module, connected to the live broadcast equipment by a cable and collecting the audio signals in the equipment; a voice conversion module, which receives voice data from the voice acquisition module in real time and recognizes and converts it into text at high speed; and a subtitle processing module, which obtains the text converted by the voice conversion module in real time and composites it onto the video screen for display. Based on audio recognition technology, the invention aims to provide a new method for displaying subtitles in real time during a live video broadcast, and overcomes the drawbacks of traditional subtitle display methods, which are costly in money and labor and cannot adapt in real time to changes on stage.

Description

System and method for realizing real-time audio-to-text conversion in network live broadcast
Technical Field
The invention relates to the technical field of audio-to-text conversion, and in particular to a system and a method for realizing real-time audio-to-text conversion in network live broadcast.
Background
With the rapid development of China's economy, living standards have steadily improved, and regions, theatres, enterprises, and other organizations increasingly want to showcase their distinctive character by organizing activities and staging performances. China has a large population, and many people cannot attend in person, so network live broadcast lets them watch performances from all over the country at home. Stage performances take many spoken forms, such as songs, operas, sketches, crosstalk, and recitations, in a variety of languages, some of which the audience cannot understand. To promote cultural exchange, viewers want to see the content directly, so it needs to be displayed as text. Live subtitles are therefore very important, yet they remain a weak point of current technology: subtitles are produced with expensive equipment, the text is prepared in advance against the script, and an operator switches subtitles manually to follow what happens on site. This approach does display subtitles, but it is costly in money and labor, and when the situation on stage changes, the subtitles may no longer match or may be missing altogether.
Disclosure of Invention
In view of the shortcomings described in the background, the invention provides a system and a method for realizing real-time audio-to-text conversion in network live broadcast. Based on audio recognition technology, it aims to provide a new method for displaying subtitles in real time during a live video broadcast, and overcomes the drawbacks of traditional subtitle display methods, which are costly in money and labor and cannot adapt in real time to changes on stage.
The system for realizing real-time audio-to-text conversion in network live broadcast comprises: a system configuration module, used for configuring live broadcast parameters such as bitrate, definition, and language type; a voice acquisition module, connected to the live broadcast equipment by a cable and collecting the audio signals in the equipment; a voice conversion module, which receives voice data from the voice acquisition module in real time and recognizes and converts it into text at high speed; and a subtitle processing module, which obtains the text converted by the voice conversion module in real time and composites it onto the video screen for display.
With this scheme, the invention applies audio recognition technology and big data cloud analysis, and greatly improves transmission efficiency by combining them with current 5G technology.
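The data flow between the acquisition, conversion, and subtitle processing modules described above can be sketched as a chain of generators. This is only an illustrative sketch: the names `AudioChunk`, `Caption`, `acquire`, `convert`, and `render` are hypothetical, and the recognizer is an injected stand-in function, since the source does not describe the recognition engine's interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass(frozen=True)
class AudioChunk:
    """A short slice of captured audio (hypothetical container)."""
    start_s: float   # offset from the start of the broadcast, in seconds
    samples: bytes   # raw audio payload


@dataclass(frozen=True)
class Caption:
    start_s: float
    text: str


def acquire(source: Iterable[AudioChunk]) -> Iterator[AudioChunk]:
    """Voice acquisition module: forward chunks as they arrive from the device."""
    for chunk in source:
        yield chunk


def convert(chunks: Iterable[AudioChunk],
            recognize: Callable[[bytes], str]) -> Iterator[Caption]:
    """Voice conversion module: turn each chunk into text with the injected recognizer."""
    for chunk in chunks:
        text = recognize(chunk.samples)
        if text:  # skip chunks that produced no text
            yield Caption(chunk.start_s, text)


def render(captions: Iterable[Caption], show: Callable[[str], None]) -> None:
    """Subtitle processing module: hand each caption to a display callback."""
    for caption in captions:
        show(f"[{caption.start_s:6.1f}s] {caption.text}")


# Usage with a stand-in recognizer; a real system would call a speech engine here.
feed = [AudioChunk(0.0, b"hello"), AudioChunk(1.5, b""), AudioChunk(3.0, b"world")]
shown: List[str] = []
render(convert(acquire(feed), lambda payload: payload.decode("utf-8")), shown.append)
print(shown)
```

Because each stage is a generator, a caption is emitted as soon as its chunk has been converted, which mirrors the real-time behaviour the scheme calls for.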
Further, the system includes a voice recognition module, which obtains the data of the voice acquisition module, identifies the language type, and sends the identified language type to the voice conversion module to facilitate coding by the voice conversion module.
With this scheme, the module uses the iFLYTEK speech recognition platform, which ensures accuracy during speech recognition.
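The hand-off described above, where the recognition module identifies the language type and passes it on to the conversion module, might look roughly like the following. All class names, the language tags, and the injected `identify` and transcriber functions are assumptions made for illustration; the source does not specify how the language type is detected or encoded.

```python
from typing import Callable, Dict

# Hypothetical per-language transcriber: takes raw audio, returns text.
Transcriber = Callable[[bytes], str]


class VoiceConversionModule:
    """Selects a transcriber according to the language tag it is given."""

    def __init__(self, transcribers: Dict[str, Transcriber], default_language: str):
        if default_language not in transcribers:
            raise ValueError(f"no transcriber for default language {default_language!r}")
        self._transcribers = transcribers
        self._default = default_language

    def convert(self, audio: bytes, language: str) -> str:
        # Fall back to the configured default when the tag is unknown.
        transcriber = self._transcribers.get(language, self._transcribers[self._default])
        return transcriber(audio)


class VoiceRecognitionModule:
    """Identifies the language type and forwards audio plus tag to the converter."""

    def __init__(self, identify: Callable[[bytes], str], converter: VoiceConversionModule):
        self._identify = identify
        self._converter = converter

    def process(self, audio: bytes) -> str:
        language = self._identify(audio)
        return self._converter.convert(audio, language)


# Usage with stand-in functions in place of real language identification and decoding.
converter = VoiceConversionModule(
    {"zh": lambda a: "zh:" + a.decode(), "en": lambda a: "en:" + a.decode()},
    default_language="zh",
)
recognizer = VoiceRecognitionModule(
    lambda a: "en" if a.startswith(b"E") else "unknown", converter
)
print(recognizer.process(b"Ehello"))
```

Falling back to the configured language is one possible design choice; a system could equally drop or flag audio whose language it cannot identify.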
Further, the voice acquisition module processes the voice frequency to denoise, restore, and split the audio data, so that the audio signal can be converted into a text signal with punctuation marks.
With this scheme, punctuation marks can be added during text conversion.
Further, the subtitle processing module obtains the converted text signal from the voice conversion module, and uses context to detect and correct wrongly written characters, sentence segmentation errors, and punctuation errors in the text signal.
With this scheme, the accuracy of text conversion is improved and erroneous sentences and words are reduced.
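A very reduced illustration of this kind of caption clean-up follows. Note that it swaps in a fixed correction table and a few regular-expression rules in place of the context-based detection the text describes, whose details are not given; the function name `tidy_caption` and the correction table are hypothetical.

```python
import re
from typing import Mapping

TERMINAL = ".!?。！？"


def tidy_caption(text: str, corrections: Mapping[str, str]) -> str:
    """Apply simple word and punctuation fixes to one caption line."""
    # Replace known misrecognized words (hypothetical correction table).
    for wrong, right in corrections.items():
        text = text.replace(wrong, right)
    # Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Collapse a repeated punctuation mark into one ("!!" becomes "!").
    # This also shortens "..." to ".", which a fuller version might preserve.
    text = re.sub(r"([,.!?;:，。！？；：])\1+", r"\1", text)
    # Remove spaces that precede a punctuation mark.
    text = re.sub(r"\s+([,.!?;:])", r"\1", text)
    # Ensure the caption ends with terminal punctuation.
    if text and text[-1] not in TERMINAL:
        text += "."
    return text


print(tidy_caption("welcome  to the the show ,, everyone", {"the the": "the"}))
```

A production system would more likely rely on a language model or the recognizer's own punctuation output; the rules here only show where such corrections sit in the flow.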
Further, the voice recognition module can also recognize blank audio and mark the blank audio signal; after the voice conversion module receives the mark signal, it stops transmitting signals to the subtitle processing module.
With this scheme, identifying blank audio avoids power consumption from the equipment running idle.
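One common way to mark blank audio is to compare the root-mean-square (RMS) energy of each frame against a threshold and forward only the frames that exceed it. The sketch below assumes frames of normalized float samples and a threshold of 0.01; both are illustrative choices, and the source does not say which detection method is actually used.

```python
import math
from typing import Iterable, Iterator, List, Sequence, Tuple

Frame = Sequence[float]  # normalized samples in the range [-1.0, 1.0] (assumed)


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame; an empty frame counts as silent."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def mark_blank(frames: Iterable[Frame], threshold: float = 0.01) -> Iterator[Tuple[Frame, bool]]:
    """Yield each frame with a flag that is True when the frame is blank."""
    for frame in frames:
        yield frame, rms(frame) < threshold


def forward_speech(frames: Iterable[Frame], threshold: float = 0.01) -> List[Frame]:
    """Keep only non-blank frames, so nothing is sent downstream during silence."""
    return [frame for frame, is_blank in mark_blank(frames, threshold) if not is_blank]


frames = [[0.0, 0.0], [0.5, -0.5], [0.001, -0.002]]
print(forward_speech(frames))
```

Dropping blank frames before conversion is what saves work downstream: the conversion and subtitle stages simply receive nothing while the stage is silent.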
Further, the voice recognition module can also recognize speaking rate: after two segments of speech have been collected, it records the corresponding sentence-break and pause data, so that sentence breaks are distinguished from blank audio, the module adapts itself, and its recognition efficiency improves.
With this scheme, the sentence-break and interval data of each speaker are recognized to form an acquisition scheme for the individual live broadcast, so that recognition is adjusted to each speaker. Sentence breaks and blank audio are distinguished quickly, which effectively reduces system power consumption.
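The adaptive behaviour described above could be approximated by recording each speaker's pause durations and deriving a per-speaker threshold that separates ordinary sentence breaks from blank audio. The rule used here (three times the median observed pause, with a fixed default until two pauses have been seen) is purely an assumption for illustration, as are the class and parameter names.

```python
from statistics import median
from typing import List


class PauseModel:
    """Tracks one speaker's pauses and adapts the blank-audio threshold."""

    def __init__(self, default_blank_s: float = 2.0, factor: float = 3.0,
                 min_samples: int = 2):
        self._pauses: List[float] = []
        self._default = default_blank_s   # used until enough pauses are recorded
        self._factor = factor             # multiple of the median pause (assumed rule)
        self._min_samples = min_samples

    def observe_pause(self, duration_s: float) -> None:
        """Record the length of a pause that turned out to be a sentence break."""
        if duration_s < 0:
            raise ValueError("pause duration must be non-negative")
        self._pauses.append(duration_s)

    @property
    def blank_threshold_s(self) -> float:
        if len(self._pauses) < self._min_samples:
            return self._default
        return self._factor * median(self._pauses)

    def classify(self, gap_s: float) -> str:
        """Label a silent gap as an ordinary sentence break or as blank audio."""
        return "blank" if gap_s >= self.blank_threshold_s else "sentence_break"


model = PauseModel()
print(model.classify(1.5))   # default threshold still applies
model.observe_pause(0.3)
model.observe_pause(0.5)
print(model.classify(1.5))   # threshold has adapted to this faster speaker
```

A fast speaker with short pauses thus gets a shorter blank threshold than a slow one, which is the per-speaker adjustment the text refers to.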
Further, the voice conversion module includes a big data cloud analysis database.
With this scheme, conversion accuracy is high and erroneous sentences are identified quickly.
Further, the system uses 5G transmission, which improves conversion efficiency, cost, and speed.
With this scheme, 5G high-efficiency transmission, accurate voice recognition, and the subtitle display module together provide strong support for the live broadcast industry in terms of efficiency, cost, and accuracy.
A method for realizing real-time audio-to-text conversion in network live broadcast comprises the following steps:
S1: assemble the live broadcast equipment and, before use, configure live broadcast parameters such as bitrate, definition, and language type in the system configuration module;
S2: connect the live broadcast equipment to the audio/video cable; during the live broadcast, the audio signal is transmitted over the cable to the voice acquisition module of the system, the data of the voice acquisition module is submitted to the voice recognition module in real time, the voice recognition module identifies the language type from the collected data, and the recognized voice is transmitted to the voice conversion module;
S3: the voice conversion module recognizes and converts the voice into text at high speed, and the resulting text is passed to the subtitle processing module;
S4: the subtitle processing module displays the processed subtitles on the live video screen, completing the operation.
With this scheme, a new method for displaying subtitles in real time during a live video broadcast is provided, and the drawbacks of traditional subtitle display methods, which are costly in money and labor and cannot adapt in real time to changes on stage, are overcome.
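The configuration performed in step S1 can be pictured as a small validated settings object. The field names, the unit (kbit/s), and the list of allowed definitions below are hypothetical; the source names only the three kinds of parameter.

```python
from dataclasses import dataclass

# Hypothetical set of supported definitions; the source does not list any.
ALLOWED_DEFINITIONS = ("480p", "720p", "1080p")


@dataclass(frozen=True)
class LiveConfig:
    """Parameters set in the system configuration module before the broadcast."""
    bitrate_kbps: int
    definition: str
    language: str

    def __post_init__(self) -> None:
        if self.bitrate_kbps <= 0:
            raise ValueError("bitrate must be positive")
        if self.definition not in ALLOWED_DEFINITIONS:
            raise ValueError(f"unsupported definition: {self.definition!r}")
        if not self.language:
            raise ValueError("language type must not be empty")


config = LiveConfig(bitrate_kbps=2500, definition="1080p", language="zh")
print(config)
```

Validating at construction time means a misconfigured broadcast fails before it goes live, and freezing the object keeps the settings stable for steps S2 to S4.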
Drawings
The invention is further illustrated with reference to the following figures and examples.
Fig. 1 is a schematic block diagram of embodiment 1 of the present invention.
Fig. 2 is a schematic block diagram according to embodiment 2 of the present invention.
Reference numerals: 1. system configuration module; 2. voice acquisition module; 3. voice conversion module; 4. subtitle processing module; 5. voice recognition module.
Detailed Description
While the embodiments of the present invention will be described and illustrated in detail with reference to the accompanying drawings, it is to be understood that the invention is not limited to the specific embodiments disclosed, but is intended to cover various modifications, equivalents, and alternatives falling within the scope of the invention as defined by the appended claims.
To aid understanding of the embodiments of the invention, specific embodiments are further explained below with reference to the drawings; these embodiments are not to be construed as limiting the invention.
Embodiment 1 of the invention is shown in Fig. 1 and includes four modules: a system configuration module 1, a voice acquisition module 2, a voice conversion module 3, and a subtitle processing module 4. The technical scheme applies audio recognition technology and big data cloud analysis, and greatly improves transmission efficiency by combining them with current 5G technology. It uses a speech recognition platform, which ensures accuracy during speech recognition. The applicant, a national-level high-tech enterprise with strong research and development capability, has developed its own live broadcast system. 5G high-efficiency transmission, accurate voice recognition, and the subtitle display module together provide strong support for the live broadcast industry in terms of efficiency, cost, and accuracy.
The audio-to-text method of Embodiment 1 is as follows:
S1: assemble the live broadcast equipment and, before use, configure live broadcast parameters such as bitrate, definition, and language type in the system configuration module 1;
S2: connect the live broadcast equipment to the audio/video cable; during the live broadcast, the audio signal is transmitted over the cable to the voice acquisition module 2 of the system, and the voice acquisition module 2 transmits the collected voice to the voice conversion module 3;
S3: the voice conversion module 3 recognizes and converts the voice into text at high speed, and the resulting text is passed to the subtitle processing module 4;
S4: the subtitle processing module 4 displays the processed subtitles on the live video screen, completing the operation.
Further, the voice acquisition module 2 processes the voice frequency to denoise, restore, and split the audio data, so that the audio signal can be converted into a text signal with punctuation marks. The subtitle processing module 4 obtains the converted text signal from the voice conversion module 3, and uses context to detect and correct wrongly written characters, sentence segmentation errors, and punctuation errors in the text signal.
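For step S4, the processed text has to be paired with timing before it can be shown on screen. The source does not specify a caption format; the sketch below uses SubRip (SRT) cues only as a familiar illustration, and the function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset as HH:MM:SS,mmm, the timestamp style used by SRT."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt_cue(index: int, start_s: float, end_s: float, text: str) -> str:
    """Build one SRT cue: a sequence number, a time range, and the caption text."""
    if end_s <= start_s:
        raise ValueError("cue must end after it starts")
    return f"{index}\n{format_timestamp(start_s)} --> {format_timestamp(end_s)}\n{text}\n"


print(to_srt_cue(1, 0.0, 2.25, "Hello"))
```

In a live setting the cue would be pushed to the video overlay as soon as it is produced, rather than written to a file.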
Embodiment 2 of the invention is shown in Fig. 2 and includes five modules: a system configuration module 1, a voice acquisition module 2, a voice conversion module 3, a subtitle processing module 4, and a voice recognition module 5. The system configuration module 1 sets parameters for the live broadcast equipment. The live broadcast equipment is electrically connected to the voice acquisition module 2; the voice acquisition module 2 is electrically connected to the voice conversion module 3 and the voice recognition module 5 respectively; the voice conversion module 3 and the voice recognition module 5 are each electrically connected to the subtitle processing module 4; and the subtitle processing module 4 is electrically connected to the live video screen. The modules are electrically connected to one another by wires, and audio is transmitted over 5G.
The voice recognition module 5 can also recognize blank audio and mark the blank audio signal; after the voice conversion module 3 receives the mark signal, it stops transmitting signals to the subtitle processing module 4. The voice recognition module 5 can also recognize speaking rate: after two segments of speech have been collected, it records the corresponding sentence-break and pause data, so that sentence breaks are distinguished from blank audio, the module adapts itself, and its recognition efficiency improves.
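The connections listed for Embodiment 2 can be written down as a small directed graph and checked for a complete route from the live broadcast equipment to the video screen. The node names and the direction assigned to each connection are assumptions, since the text states only that the modules are electrically connected.

```python
from collections import deque
from typing import Dict, List, Optional

# Connections of Embodiment 2, with an assumed signal direction.
CONNECTIONS: Dict[str, List[str]] = {
    "system_configuration": ["live_equipment"],
    "live_equipment": ["voice_acquisition"],
    "voice_acquisition": ["voice_conversion", "voice_recognition"],
    "voice_conversion": ["subtitle_processing"],
    "voice_recognition": ["subtitle_processing"],
    "subtitle_processing": ["video_screen"],
    "video_screen": [],
}


def signal_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest route from start to goal."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the route.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


print(signal_path(CONNECTIONS, "live_equipment", "video_screen"))
```

A check like this is a quick way to confirm that every module named in the embodiment actually lies on a route to the screen.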
The audio-to-text method of Embodiment 2 is as follows:
S1: assemble the live broadcast equipment and, before use, configure live broadcast parameters such as bitrate, definition, and language type in the system configuration module;
S2: connect the live broadcast equipment to the audio/video cable; during the live broadcast, the audio signal is transmitted over the cable to the voice acquisition module of the system, the data of the voice acquisition module is submitted to the voice recognition module in real time, the voice recognition module identifies the language type from the collected data, and the recognized voice is transmitted to the voice conversion module;
S3: the voice conversion module recognizes and converts the voice into text at high speed, and the resulting text is passed to the subtitle processing module;
S4: the subtitle processing module displays the processed subtitles on the live video screen, completing the operation.
This embodiment overcomes the drawbacks of traditional subtitle display methods, which are costly in money and labor and cannot adapt in real time to changes on stage.
Finally, it should be noted that the above embodiments are only specific embodiments of the invention, intended to illustrate its technical solutions rather than to limit them, and the scope of protection of the invention is not limited to them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the invention, and they shall all fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be subject to the scope of protection of the claims.

Claims (9)

1. A system for realizing real-time audio-to-text conversion in network live broadcast, characterized by comprising: a system configuration module, used for configuring live broadcast parameters such as bitrate, definition, and language type;
a voice acquisition module, connected to the live broadcast equipment by a cable and collecting the audio signals in the equipment;
a voice conversion module, which receives voice data from the voice acquisition module in real time and recognizes and converts it into text at high speed;
and a subtitle processing module, which obtains the text converted by the voice conversion module in real time and composites it onto the video screen for display.
2. The system for realizing real-time audio-to-text conversion in network live broadcast according to claim 1, characterized in that: it further comprises a voice recognition module, which obtains the data of the voice acquisition module, identifies the language type, and sends the identified language type to the voice conversion module to facilitate coding by the voice conversion module.
3. The system for realizing real-time audio-to-text conversion in network live broadcast according to claim 2, characterized in that: the voice acquisition module processes the voice frequency to denoise, restore, and split the audio data, so that the audio signal can be converted into a text signal with punctuation marks.
4. The system for realizing real-time audio-to-text conversion in network live broadcast according to claim 3, characterized in that: the subtitle processing module obtains the converted text signal from the voice conversion module, and uses context to detect and correct wrongly written characters, sentence segmentation errors, and punctuation errors in the text signal.
5. The system for realizing real-time audio-to-text conversion in network live broadcast according to claim 4, characterized in that: the voice recognition module can also recognize blank audio and mark the blank audio signal, and after the voice conversion module receives the mark signal, it stops transmitting signals to the subtitle processing module.
6. The system for realizing real-time audio-to-text conversion in network live broadcast according to claim 5, characterized in that: the voice recognition module can also recognize speaking rate, and after two segments of speech have been collected, it records the corresponding sentence-break and pause data, so that sentence breaks are distinguished from blank audio, the module adapts itself, and its recognition efficiency improves.
7. The system for realizing real-time audio-to-text conversion in network live broadcast according to claim 6, characterized in that: the voice conversion module includes a big data cloud analysis database.
8. The system for realizing real-time audio-to-text conversion in network live broadcast according to claim 7, characterized in that: the system uses 5G transmission, which improves conversion efficiency, cost, and speed.
9. A method for realizing real-time audio-to-text conversion in network live broadcast, characterized by comprising the following steps:
S1: assemble the live broadcast equipment and, before use, configure live broadcast parameters such as bitrate, definition, and language type in the system configuration module;
S2: connect the live broadcast equipment to the audio/video cable; during the live broadcast, the audio signal is transmitted over the cable to the voice acquisition module of the system, the data of the voice acquisition module is submitted to the voice recognition module in real time, the voice recognition module identifies the language type from the collected data, and the recognized voice is transmitted to the voice conversion module;
S3: the voice conversion module recognizes and converts the voice into text at high speed, and the resulting text is passed to the subtitle processing module;
S4: the subtitle processing module displays the processed subtitles on the live video screen, completing the operation.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110252429.XA CN113115103A (en) 2021-03-09 2021-03-09 System and method for realizing real-time audio-to-text conversion in network live broadcast

Publications (1)

Publication Number Publication Date
CN113115103A (en) 2021-07-13

Family

ID=76710697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110252429.XA Pending CN113115103A (en) 2021-03-09 2021-03-09 System and method for realizing real-time audio-to-text conversion in network live broadcast

Country Status (1)

Country Link
CN (1) CN113115103A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113873306A (en) * 2021-09-23 2021-12-31 深圳市多狗乐智能研发有限公司 Method for projecting real-time translation caption superposition picture to live broadcast room through hardware
CN115002502A (en) * 2022-07-29 2022-09-02 广州市千钧网络科技有限公司 Data processing method and server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105704538A (en) * 2016-03-17 2016-06-22 广东小天才科技有限公司 Method and system for generating audio and video subtitles
CN105913845A (en) * 2016-04-26 2016-08-31 惠州Tcl移动通信有限公司 Mobile terminal voice recognition and subtitle generation method and system and mobile terminal
CN106340294A (en) * 2016-09-29 2017-01-18 安徽声讯信息技术有限公司 Synchronous translation-based news live streaming subtitle on-line production system
CN108600773A (en) * 2018-04-25 2018-09-28 腾讯科技(深圳)有限公司 Caption data method for pushing, subtitle methods of exhibiting, device, equipment and medium
KR102135643B1 (en) * 2019-09-04 2020-07-20 (주) 소프트기획 Real-time intelligent shorthand service providing system using voice recognition engine

Similar Documents

Publication Publication Date Title
CN110085213B (en) Audio abnormity monitoring method, device, equipment and storage medium
CN113115103A (en) System and method for realizing real-time audio-to-text conversion in network live broadcast
US8655654B2 (en) Generating representations of group interactions
US11227620B2 (en) Information processing apparatus and information processing method
CN109089173B (en) Method and system for detecting advertisement delivery of smart television terminal
CN103607635A (en) Method, device and terminal for caption identification
CN106095903A (en) A kind of radio and television the analysis of public opinion method and system based on degree of depth learning art
CN1404688A (en) Apparatus and method of program classification using observed cues in the transcript information
CN104246874A (en) Media synchronisation system
CN103117058A (en) Multi-voice engine switch system and method based on intelligent television platform
CN102547139A (en) Method for splitting news video program, and method and system for cataloging news videos
CN110881115B (en) Strip splitting method and system for conference video
CN103123787A (en) Method for synchronizing and exchanging mobile terminal with media
CN105227966A (en) To televise control method, server and control system of televising
CN108600776B (en) System and method for safe broadcast control
CN111107284B (en) Real-time generation system and generation method for video subtitles
CN112000938A (en) Power grid dispatching identity authentication method and system based on multimode identification
CN102890931A (en) Method for increasing voice recognition rate
CN104349182B (en) The method of the intelligent terminal media play content feedback realized by sound channel
CN101771845A (en) File play handling method and device and player
CN102148939A (en) Method, device and television for real-time displaying subtitles of television program
CN112599130B (en) Intelligent conference system based on intelligent screen
CN112261331A (en) Recording and broadcasting system supporting intelligent AI teaching analysis
CN103369361A (en) Image data echo control method, server and terminal
TW201351393A (en) Playing method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210713
