CN113068058A - Real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology - Google Patents
- Publication number
- CN113068058A CN113068058A CN202110297837.7A CN202110297837A CN113068058A CN 113068058 A CN113068058 A CN 113068058A CN 202110297837 A CN202110297837 A CN 202110297837A CN 113068058 A CN113068058 A CN 113068058A
- Authority
- CN
- China
- Prior art keywords
- voice
- character
- information
- module
- voice information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
- H04N21/440236—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Databases & Information Systems (AREA)
- Quality & Reliability (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses a real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology, which comprises a voice acquisition module, a voice noise elimination module, a character conversion module, a character voice library, a character verification module, a data receiving module, a data processing module, a master control module and a subtitle playing module. The voice acquisition module comprises two voice acquisition terminals used for acquiring real-time voice information during live broadcasting; the real-time voice information is sent to the voice denoising module, which performs denoising processing on the received real-time voice information to obtain denoised voice information; the denoised voice information is sent to the character conversion module, which forwards it to the character voice library for voice-to-character processing. The invention can accurately convert voice to text and thereby provide more accurate subtitle information.
Description
Technical Field
The invention relates to the field of voice recognition, in particular to a real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technologies.
Background
Speech recognition refers to the process by which a computer, using a limited set of features or rules, operates on the language symbols used in daily life to identify characters or words. Character-voice conversion technology is a voice generation technology based on voice synthesis that can convert text in a computer into continuous natural language, or convert natural language into written characters. A subtitle on-screen live broadcasting system is needed when voice content is converted into characters and displayed as subtitles during live broadcasting.
With the existing subtitle on-screen live broadcasting system, errors easily occur when voice is converted into characters, making the subtitles inaccurate and adversely affecting the use of the system.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in the existing subtitle on-screen live broadcasting system, errors easily occur during voice-to-character conversion, leading to inaccurate subtitles and adversely affecting the use of the system. The invention therefore provides a real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology.
The invention solves the technical problems through the following technical scheme, and the invention comprises a voice acquisition module, a voice noise elimination module, a character conversion module, a character voice library, a character verification module, a data receiving module, a data processing module, a master control module and a subtitle playing module;
the voice acquisition module comprises two voice acquisition terminals, and the voice acquisition terminals are used for acquiring real-time voice information during live broadcasting;
the real-time voice information is sent to a voice denoising module, the voice denoising module performs denoising processing on the received real-time voice information, and the denoised voice information is obtained after the denoising processing;
the voice information subjected to noise elimination is sent to a character conversion module, and the character conversion module sends the obtained voice information subjected to noise elimination to a character voice library for voice-to-character conversion processing to obtain converted character information;
the character information is sent to a character verification module, and the character verification module is used for performing character verification processing on the converted character information to obtain standard character information;
the standard text information is sent to a data receiving module, and the data receiving module converts the standard text information and processes the standard text information to play text contents;
and the master control module controls the subtitle playing module to synchronously play the text content.
Preferably, the specific processing procedure of the voice acquisition module is as follows:
step one: the two voice acquisition terminals synchronously acquire voice information and mark it as M1 and M2 respectively;
step two: synchronously playing the voice information M1 and the voice information M2 at accelerated speed, extracting the segments of M1 and M2 that fall below a preset value, and marking them as Ki, i = 1…n;
step three: combining all Ki, then combining the result with the remaining parts of the voice information M1 and the voice information M2 to obtain the combined voice information M'; the voice information M' is the voice information to be denoised.
Preferably, the voice denoising module performs denoising processing as follows: the voice information requiring denoising is imported into the voice denoising module; a deep residual shrinkage network in the module automatically eliminates information irrelevant to the current task through a soft-thresholding layer with an adaptive threshold, accurately identifies strong-noise data, and eliminates the strong noise; the denoised voice information is obtained after the strong noise is eliminated.
Preferably, the specific process of the text conversion module for performing text conversion is as follows:
step one: importing the voice information subjected to noise reduction processing and marking it as P;
step two: leading the voice information P into a character voice library for matching processing;
step three: when the similarity between the voice information P and the voice characters prestored in the character voice library exceeds a preset value, the character match is successful, and the extracted character is marked as an identification character;
step four: and arranging and combining all the identification characters according to the identification time to obtain the converted character information.
Preferably, the specific processing procedure of the character verification module is as follows: the converted character information is extracted and transmitted back to the character voice library for a character-to-voice process; when the similarity between the voice information produced by the character-to-voice process and the original input voice exceeds a preset value, the character verification passes, and the verified characters are marked as standard character information.
Compared with the prior art, the invention has the following advantages: in the voice acquisition stage, this real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology processes the voice information for greater clarity, which effectively improves the quality of the acquired voice information and thereby the accuracy of the converted characters; meanwhile, in the voice-to-character stage, after a successful conversion the characters are converted back to voice for verification, further ensuring the accuracy of the character conversion. This improves the accuracy of the system and makes it more worthy of wide application.
Drawings
FIG. 1 is a system block diagram of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
As shown in fig. 1, the present embodiment provides a technical solution: a real-time subtitle on-screen live broadcast system based on voice recognition and transcription technology comprises a voice acquisition module, a voice noise elimination module, a character conversion module, a character voice library, a character verification module, a data receiving module, a data processing module, a master control module and a subtitle playing module;
the voice acquisition module comprises two voice acquisition terminals, and the voice acquisition terminals are used for acquiring real-time voice information during live broadcasting;
the real-time voice information is sent to a voice denoising module, the voice denoising module performs denoising processing on the received real-time voice information, and the denoised voice information is obtained after the denoising processing;
the voice information subjected to noise elimination is sent to a character conversion module, and the character conversion module sends the obtained voice information subjected to noise elimination to a character voice library for voice-to-character conversion processing to obtain converted character information;
the character information is sent to a character verification module, and the character verification module is used for performing character verification processing on the converted character information to obtain standard character information;
the standard text information is sent to a data receiving module, and the data receiving module converts the standard text information and processes the standard text information to play text contents;
and the master control module controls the subtitle playing module to synchronously play the text content.
The specific processing process of the voice acquisition module is as follows:
step one: the two voice acquisition terminals synchronously acquire voice information and mark it as M1 and M2 respectively;
step two: synchronously playing the voice information M1 and the voice information M2 at accelerated speed, extracting the segments of M1 and M2 that fall below a preset value, and marking them as Ki, i = 1…n;
step three: combining all Ki, then combining the result with the remaining parts of the voice information M1 and the voice information M2 to obtain the combined voice information M'; the voice information M' is the voice information to be denoised.
The voice denoising module performs denoising processing as follows: the voice information requiring denoising is imported into the voice denoising module; a deep residual shrinkage network in the module automatically eliminates information irrelevant to the current task through a soft-thresholding layer with an adaptive threshold, accurately identifies strong-noise data, and eliminates the strong noise; the denoised voice information is obtained after the strong noise is eliminated.
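The soft-thresholding operation at the heart of a deep residual shrinkage network can be shown in isolation. This is a minimal sketch of the operator only, not of the full network; the `adaptive_tau` heuristic is a hypothetical stand-in for the threshold the network would learn per channel.

```python
import numpy as np

def soft_threshold(x: np.ndarray, tau: float) -> np.ndarray:
    """Soft thresholding: shrink each value toward zero by tau and
    zero out anything whose magnitude is below tau. This is the core
    operation of the soft-thresholding layer mentioned in the text."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def adaptive_tau(features: np.ndarray, scale: float = 0.5) -> float:
    """Simplified stand-in for the adaptive threshold: in a deep
    residual shrinkage network the scale is learned by a small
    attention branch; here it is a fixed fraction of the mean
    absolute feature value."""
    return scale * float(np.abs(features).mean())
```

Small-magnitude (noise-dominated) features are driven exactly to zero while strong speech features are only shifted, which is why the layer can discard information irrelevant to the current task.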
The specific process of the character conversion module for performing character conversion is as follows:
step one: importing the voice information subjected to noise reduction processing and marking it as P;
step two: leading the voice information P into a character voice library for matching processing;
step three: when the similarity between the voice information P and the voice characters prestored in the character voice library exceeds a preset value, the character match is successful, and the extracted character is marked as an identification character;
step four: and arranging and combining all the identification characters according to the identification time to obtain the converted character information.
The specific processing procedure of the character verification module is as follows: the converted character information is extracted and transmitted back to the character voice library for a character-to-voice process; when the similarity between the voice information produced by the character-to-voice process and the original input voice exceeds a preset value, the character verification passes, and the verified characters are marked as standard character information.
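The verification round-trip reduces to a few lines. `tts` (the character-to-voice step) and `similarity` are assumed external components supplied by the character voice library; neither is defined by the patent, so their names and signatures are hypothetical.

```python
def verify(converted_text, original_audio, tts, similarity, threshold=0.8):
    """Sketch of the character verification module: synthesise the
    converted text back to voice, compare it with the original input
    voice, and accept the text as standard character information only
    when the similarity exceeds the preset value; otherwise return
    None to signal a failed verification."""
    resynth = tts(converted_text)
    if similarity(resynth, original_audio) > threshold:
        return converted_text  # standard character information
    return None
```

The design choice here is that verification is a gate, not a correction: text that fails the round-trip is rejected rather than patched, which matches the module's role of producing only "standard" character information.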
In summary, when the invention is used: the voice acquisition module comprises two voice acquisition terminals used for acquiring real-time voice information during live broadcasting; the real-time voice information is sent to the voice denoising module, which performs denoising processing on it to obtain denoised voice information; the denoised voice information is sent to the character conversion module, which forwards it to the character voice library for voice-to-character processing to obtain converted character information; the character information is sent to the character verification module, which performs character verification processing to obtain standard character information; the standard character information is sent to the data receiving module, which converts it for playback; and the master control module controls the subtitle playing module to play the text content synchronously.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (5)
1. A real-time subtitle on-screen live broadcast system based on voice recognition and transcription technology is characterized by comprising a voice acquisition module, a voice noise elimination module, a character conversion module, a character voice library, a character verification module, a data receiving module, a data processing module, a master control module and a subtitle playing module;
the voice acquisition module comprises two voice acquisition terminals, and the voice acquisition terminals are used for acquiring real-time voice information during live broadcasting;
the real-time voice information is sent to a voice denoising module, the voice denoising module performs denoising processing on the received real-time voice information, and the denoised voice information is obtained after the denoising processing;
the voice information subjected to noise elimination is sent to a character conversion module, and the character conversion module sends the obtained voice information subjected to noise elimination to a character voice library for voice-to-character conversion processing to obtain converted character information;
the character information is sent to a character verification module, and the character verification module is used for performing character verification processing on the converted character information to obtain standard character information;
the standard text information is sent to a data receiving module, and the data receiving module converts the standard text information and processes the standard text information to play text contents;
and the master control module controls the subtitle playing module to synchronously play the text content.
2. The system of claim 1, wherein the specific processing process of the voice acquisition module is as follows:
step one: the two voice acquisition terminals synchronously acquire voice information and mark it as M1 and M2 respectively;
step two: synchronously playing the voice information M1 and the voice information M2 at accelerated speed, extracting the segments of M1 and M2 that fall below a preset value, and marking them as Ki, i = 1…n;
step three: combining all Ki, then combining the result with the remaining parts of the voice information M1 and the voice information M2 to obtain the combined voice information M'; the voice information M' is the voice information to be denoised.
3. The system of claim 1, wherein the voice denoising module performs denoising processing as follows: the voice information requiring denoising is imported into the voice denoising module; a deep residual shrinkage network in the module automatically eliminates information irrelevant to the current task through a soft-thresholding layer with an adaptive threshold, accurately identifies strong-noise data, and eliminates the strong noise; the denoised voice information is obtained after the strong noise is eliminated.
4. The system of claim 1, wherein the specific process of the character conversion module for performing character conversion is as follows:
step one: importing the voice information subjected to noise reduction processing and marking it as P;
step two: leading the voice information P into a character voice library for matching processing;
step three: when the similarity between the voice information P and the voice characters prestored in the character voice library exceeds a preset value, the character match is successful, and the extracted character is marked as an identification character;
step four: and arranging and combining all the identification characters according to the identification time to obtain the converted character information.
5. The system of claim 1, wherein the specific processing procedure of the character verification module is as follows: the converted character information is extracted and transmitted back to the character voice library for a character-to-voice process; when the similarity between the voice information produced by the character-to-voice process and the original input voice exceeds a preset value, the character verification passes, and the verified characters are marked as standard character information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110297837.7A CN113068058A (en) | 2021-03-19 | 2021-03-19 | Real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110297837.7A CN113068058A (en) | 2021-03-19 | 2021-03-19 | Real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113068058A true CN113068058A (en) | 2021-07-02 |
Family
ID=76562544
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110297837.7A Pending CN113068058A (en) | 2021-03-19 | 2021-03-19 | Real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113068058A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105161104A (en) * | 2015-07-31 | 2015-12-16 | 北京云知声信息技术有限公司 | Voice processing method and device |
CN105374356A (en) * | 2014-08-29 | 2016-03-02 | 株式会社理光 | Speech recognition method, speech assessment method, speech recognition system, and speech assessment system |
CN106340294A (en) * | 2016-09-29 | 2017-01-18 | 安徽声讯信息技术有限公司 | Synchronous translation-based news live streaming subtitle on-line production system |
CN106409296A (en) * | 2016-09-14 | 2017-02-15 | 安徽声讯信息技术有限公司 | Voice rapid transcription and correction system based on multi-core processing technology |
US20170098447A1 (en) * | 2014-11-28 | 2017-04-06 | Shenzhen Skyworth-Rgb Electronic Co., Ltd. | Voice recognition method and system |
CN109741749A (en) * | 2018-04-19 | 2019-05-10 | 北京字节跳动网络技术有限公司 | A kind of method and terminal device of speech recognition |
CN110085210A (en) * | 2019-03-15 | 2019-08-02 | 平安科技(深圳)有限公司 | Interactive information test method, device, computer equipment and storage medium |
CN111883110A (en) * | 2020-07-30 | 2020-11-03 | 上海携旅信息技术有限公司 | Acoustic model training method, system, device and medium for speech recognition |
- 2021-03-19: CN CN202110297837.7A patent/CN113068058A/en active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105374356A (en) * | 2014-08-29 | 2016-03-02 | 株式会社理光 | Speech recognition method, speech assessment method, speech recognition system, and speech assessment system |
US20170098447A1 (en) * | 2014-11-28 | 2017-04-06 | Shenzhen Skyworth-Rgb Electronic Co., Ltd. | Voice recognition method and system |
CN105161104A (en) * | 2015-07-31 | 2015-12-16 | 北京云知声信息技术有限公司 | Voice processing method and device |
CN106409296A (en) * | 2016-09-14 | 2017-02-15 | 安徽声讯信息技术有限公司 | Voice rapid transcription and correction system based on multi-core processing technology |
CN106340294A (en) * | 2016-09-29 | 2017-01-18 | 安徽声讯信息技术有限公司 | Synchronous translation-based news live streaming subtitle on-line production system |
CN109741749A (en) * | 2018-04-19 | 2019-05-10 | 北京字节跳动网络技术有限公司 | A kind of method and terminal device of speech recognition |
CN110085210A (en) * | 2019-03-15 | 2019-08-02 | 平安科技(深圳)有限公司 | Interactive information test method, device, computer equipment and storage medium |
CN111883110A (en) * | 2020-07-30 | 2020-11-03 | 上海携旅信息技术有限公司 | Acoustic model training method, system, device and medium for speech recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107945792B (en) | Voice processing method and device | |
CN111968649A (en) | Subtitle correction method, subtitle display method, device, equipment and medium | |
RU2251737C2 (en) | Method for automatic recognition of language of recognized text in case of multilingual recognition | |
US10529340B2 (en) | Voiceprint registration method, server and storage medium | |
CN111681642B (en) | Speech recognition evaluation method, device, storage medium and equipment | |
WO2019218467A1 (en) | Method and apparatus for dialect recognition in voice and video calls, terminal device, and medium | |
CN111986656B (en) | Teaching video automatic caption processing method and system | |
CN109710949B (en) | Translation method and translator | |
CN108305618B (en) | Voice acquisition and search method, intelligent pen, search terminal and storage medium | |
CN107102990A (en) | The method and apparatus translated to voice | |
CN104347071B (en) | Method and system for generating reference answers of spoken language test | |
CN111402892A (en) | Conference recording template generation method based on voice recognition | |
US20240064383A1 (en) | Method and Apparatus for Generating Video Corpus, and Related Device | |
CN113035199A (en) | Audio processing method, device, equipment and readable storage medium | |
CN111613215A (en) | Voice recognition method and device | |
CN113658594A (en) | Lyric recognition method, device, equipment, storage medium and product | |
CN112270917B (en) | Speech synthesis method, device, electronic equipment and readable storage medium | |
CN112466287B (en) | Voice segmentation method, device and computer readable storage medium | |
CN113068058A (en) | Real-time subtitle on-screen live broadcasting system based on voice recognition and transcription technology | |
CN116403583A (en) | Voice data processing method and device, nonvolatile storage medium and vehicle | |
CN112233679B (en) | Artificial intelligence speech recognition system | |
CN112509567B (en) | Method, apparatus, device, storage medium and program product for processing voice data | |
CN112487804B (en) | Chinese novel speech synthesis system based on semantic context scene | |
CN110428668B (en) | Data extraction method and device, computer system and readable storage medium | |
CN114490929A (en) | Bidding information acquisition method and device, storage medium and terminal equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20210702 |