CN113207032A

CN113207032A - System and method for adding subtitles to videos recorded in an intelligent classroom

Info

Publication number: CN113207032A
Application number: CN202110477210.XA
Authority: CN
Inventors: 秦曙光; 陈家峰
Original assignee: Readboy Education Technology Co Ltd
Current assignee: Readboy Education Technology Co Ltd
Priority date: 2021-04-29
Filing date: 2021-04-29
Publication date: 2021-08-03

Abstract

The invention discloses a system and a method for adding subtitles to videos recorded in an intelligent classroom, which automatically extract the audio from a recorded video, perform voice recognition on it, and align the resulting subtitles with the video. With the system and the method, subtitle content can be added to a recorded video accurately and quickly, a large amount of manual review and translation work is avoided, subtitle generation becomes more efficient, and course quality is assured.

Description

System and method for adding subtitles to videos recorded in an intelligent classroom

Technical Field

The invention relates to the technical field of intelligent classrooms, and in particular to a system and a method for adding subtitles to videos recorded in an intelligent classroom.

Background

Currently, students in a live classroom follow the lesson by watching the PPT slides shown by the teacher and listening to the lecture. When a student does not hear clearly what the teacher has said, the student usually has to play back the live broadcast and work out the content with the help of the PPT slides, or ask classmates who are watching the same broadcast what the teacher has just explained. Both approaches cost the student additional time in class and do not let the student grasp, in real time, the information the teacher intends to convey. Recorded lessons can be played back selectively, but when the teacher's explanation in a recorded lesson is unclear, the students' enthusiasm for learning is often dampened, which reduces their desire to attend lessons and weakens the effect of live teaching.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention aims to provide a system and a method for adding subtitles to videos recorded in an intelligent classroom.

To achieve this purpose, the invention adopts the following technical solution:

a system for adding subtitles to videos recorded in an intelligent classroom, comprising:

a video recording module: used for recording the live video of an intelligent classroom to obtain a recorded video file;

an audio extraction module: used for extracting the audio from the recorded video file produced by the video recording module to obtain a recorded audio file;

a voice recognition module: used for performing voice recognition on the recorded audio file to recognize the corresponding text content, while recording the start time and the end time of each audio segment from which text content can be recognized, and establishing an association between each audio segment and its text content;

a subtitle adding module: used for displaying each piece of text content in the corresponding video segment of the recorded video file according to the start time and the end time of each audio segment, wherein the display start time and the display end time of each piece of text content correspond to the start time and the end time of the corresponding video segment, so that a video file with subtitle content added is finally obtained;

an editing module: used for allowing the user to modify the video file with subtitle content added, including changing the display start time of the text content and modifying the text content.
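As an illustration of the audio extraction module listed above, the following minimal sketch builds a command line for the FFmpeg tool. The patent does not name any extraction tool, so the use of FFmpeg, the function name `build_extract_command`, the WAV output format, and the 16 kHz mono settings are all assumptions chosen for illustration only.

```python
from typing import List


def build_extract_command(video_path: str, audio_path: str,
                          sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg argument list that extracts the audio track.

    Assumption: ffmpeg is installed; the patent does not prescribe a tool.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # recorded video file (input)
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM audio
        "-ar", str(sample_rate), # audio sample rate in Hz
        "-ac", "1",              # mono
        audio_path,              # recorded audio file (output)
    ]


command = build_extract_command("lesson.mp4", "lesson.wav")
print(" ".join(command))
```

The list can be passed to `subprocess.run(command, check=True)` to perform the extraction; the sketch only builds the command so that it can be inspected without FFmpeg being present.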

Further, in the system, the voice recognition module is configured to perform voice recognition on the recorded audio file in chronological order. When text content is recognized for the first time, the current time is recorded as the start time of an audio segment. When, from a certain moment onward, no text content can be recognized for longer than a preset duration, that moment is recorded as the end time of the audio segment. When text content is next recognized, that time is recorded as the start time of the next audio segment, and so on, so that every audio segment from which text content can be recognized is identified together with its start time and end time.
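The segmentation rule described above can be sketched as follows. The sketch assumes a hypothetical recognizer that has already produced time-ordered chunks of the form `(start, end, text)`, with an empty string where nothing was recognized; the names `segment_chunks` and `max_silence` and the chunk format are illustrative and do not come from the patent. A segment is closed at the moment text was last recognized once the following silence exceeds the preset duration.

```python
from typing import Iterable, List, Optional, Tuple

Chunk = Tuple[float, float, str]    # (start_s, end_s, recognized text or "")
Segment = Tuple[float, float, str]  # (start_s, end_s, joined text)


def segment_chunks(chunks: Iterable[Chunk], max_silence: float) -> List[Segment]:
    """Group time-ordered recognition chunks into audio segments."""
    segments: List[Segment] = []
    seg_start: Optional[float] = None  # start time of the open segment, if any
    last_voiced = 0.0                  # time at which text was last recognized
    words: List[str] = []

    for start, end, text in chunks:
        if not text:
            continue  # nothing recognized in this chunk
        if seg_start is not None and start - last_voiced > max_silence:
            # Silence exceeded the preset duration: close the open segment
            # at the moment recognition stopped.
            segments.append((seg_start, last_voiced, " ".join(words)))
            seg_start, words = None, []
        if seg_start is None:
            seg_start = start  # first recognized text opens a new segment
        words.append(text)
        last_voiced = end

    if seg_start is not None:
        # Close the segment that is still open at the end of the audio.
        segments.append((seg_start, last_voiced, " ".join(words)))
    return segments


chunks = [
    (0.0, 1.0, ""), (1.0, 2.0, "hello"), (2.0, 3.0, "class"),
    (3.0, 4.0, ""), (4.0, 5.0, ""), (5.0, 6.0, ""),
    (6.0, 7.0, "open"), (7.0, 8.0, "page ten"), (8.0, 9.0, ""),
]
print(segment_chunks(chunks, max_silence=2.0))
# [(1.0, 3.0, 'hello class'), (6.0, 8.0, 'open page ten')]
```

With a preset duration of 2 seconds, the 3-second pause between 3.0 s and 6.0 s splits the recording into two segments; a shorter pause would leave the text in a single segment.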

The invention also provides a method using the above system, comprising the following steps:

during the live broadcast of an intelligent classroom, the video recording module records the video synchronously; when the live broadcast ends, the recording ends and a recorded video file is obtained;

the audio extraction module extracts the audio from the recorded video file produced by the video recording module to obtain a recorded audio file;

the voice recognition module performs voice recognition on the recorded audio file to recognize the corresponding text content, while recording the start time and the end time of each audio segment from which text content can be recognized, and establishes an association between each audio segment and its text content;

when the user triggers a subtitle adding event, the subtitle adding module displays each piece of text content in the corresponding video segment of the recorded video file according to the start time and the end time of each audio segment, wherein the display start time and the display end time of each piece of text content correspond to the start time and the end time of the corresponding video segment, so that a video file with subtitle content added is finally obtained;

when the user finds that a piece of text content does not match the picture of the recorded video file, the display start time of that text content can be moved earlier or later through the editing module until it fully matches the picture; when the user finds an error in the text content, the text content can be corrected through the editing module.
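The last two steps, adding subtitles from the timed text and then adjusting them, can be sketched as follows. The patent does not specify a subtitle format; writing the cues as SubRip (SRT) text is an assumption made here for illustration, and the names `Cue`, `shift_start`, `edit_text`, and `to_srt` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Iterable


@dataclass(frozen=True)
class Cue:
    """One piece of text content with its display start and end time in seconds."""
    start: float
    end: float
    text: str


def shift_start(cue: Cue, delta: float) -> Cue:
    """Move the display start earlier (delta < 0) or later (delta > 0).

    The result is clamped so that it stays between 0 and the end time.
    """
    return replace(cue, start=min(max(cue.start + delta, 0.0), cue.end))


def edit_text(cue: Cue, new_text: str) -> Cue:
    """Correct the recognized text while keeping the timing unchanged."""
    return replace(cue, text=new_text)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    """Render cues as SRT text: index, time range, text, blank separator line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text}\n")
    return "\n".join(blocks)


cues = [Cue(1.0, 3.0, "hello clas"), Cue(6.0, 8.0, "open page ten")]
# The user moves the first cue half a second earlier and fixes a typo.
cues[0] = edit_text(shift_start(cues[0], -0.5), "hello class")
print(to_srt(cues))
```

Keeping the cues immutable and returning edited copies makes each user adjustment easy to undo, which suits an editing step that may be repeated until the text matches the picture.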

Further, in the method, the voice recognition module performs voice recognition on the recorded audio file in chronological order. When text content is recognized for the first time, the current time is recorded as the start time of an audio segment. When, from a certain moment onward, no text content can be recognized for longer than a preset duration, that moment is recorded as the end time of the audio segment. When text content is next recognized, that time is recorded as the start time of the next audio segment, and so on, so that every audio segment from which text content can be recognized is identified together with its start time and end time.

The beneficial effects of the invention are as follows: with the system and the method, subtitle content can be added to a recorded video accurately and quickly, a large amount of manual review and translation work is avoided, subtitle generation becomes more efficient, and course quality is assured.

Detailed Description

The invention is further described below. It should be noted that the embodiments are based on the above technical solution and provide a detailed implementation and a specific operating process, but the protection scope of the invention is not limited to these embodiments.

Example 1

This embodiment provides a system for adding subtitles to videos recorded in an intelligent classroom, which comprises:

Further, the voice recognition module is configured to perform voice recognition on the recorded audio file in chronological order. When text content is recognized for the first time, the current time is recorded as the start time of an audio segment. When, from a certain moment onward, no text content can be recognized for longer than a preset duration, that moment is recorded as the end time of the audio segment. When text content is next recognized, that time is recorded as the start time of the next audio segment, and so on, so that every audio segment from which text content can be recognized is identified together with its start time and end time.

Example 2

This embodiment provides a method using the system described in Example 1, comprising the following steps:

In the method, the voice recognition module performs voice recognition on the recorded audio file in chronological order. When text content is recognized for the first time, the current time is recorded as the start time of an audio segment. When, from a certain moment onward, no text content can be recognized for longer than a preset duration, that moment is recorded as the end time of the audio segment. When text content is next recognized, that time is recorded as the start time of the next audio segment, and so on, so that every audio segment from which text content can be recognized is identified together with its start time and end time.

Those skilled in the art can make various corresponding changes and modifications based on the above technical solutions and concepts, and all such changes and modifications shall fall within the protection scope of the invention.

Claims

1. A system for adding subtitles to videos recorded in an intelligent classroom, characterized by comprising:

2. The system according to claim 1, wherein the voice recognition module is configured to perform voice recognition on the recorded audio file in chronological order; to record the current time as the start time of an audio segment when text content is recognized for the first time; to record, when no text content can be recognized for longer than a preset duration from a certain moment onward, that moment as the end time of the audio segment; and to record the time at which text content is next recognized as the start time of the next audio segment, and so on, thereby identifying every audio segment from which text content can be recognized and obtaining the start time and the end time of each audio segment.

3. A method using the system of any one of claims 1-2, characterized in that the specific process is:

4. The method according to claim 3, wherein the voice recognition module performs voice recognition on the recorded audio file in chronological order; records the current time as the start time of an audio segment when text content is recognized for the first time; records, when no text content can be recognized for longer than a preset duration from a certain moment onward, that moment as the end time of the audio segment; and records the time at which text content is next recognized as the start time of the next audio segment, and so on, thereby identifying every audio segment from which text content can be recognized and obtaining the start time and the end time of each audio segment.