CN113194356A - Video subtitle adding method and system - Google Patents

Video subtitle adding method and system

Info

Publication number
CN113194356A
CN113194356A (application CN202110310014.3A)
Authority
CN
China
Prior art keywords
file
video
audio
video object
calling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110310014.3A
Other languages
Chinese (zh)
Inventor
武永鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202110310014.3A priority Critical patent/CN113194356A/en
Publication of CN113194356A publication Critical patent/CN113194356A/en
Pending legal-status Critical Current


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N 21/439 - Processing of audio elementary streams
    • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44016 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 21/47 - End-user applications
    • H04N 21/488 - Data services, e.g. news ticker
    • H04N 21/4884 - Data services, e.g. news ticker, for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)

Abstract

A video subtitle adding method and system. A video object to be subtitled is acquired and its state attribute is judged; the audio file contained in the video object is obtained, the ffmpy3 library is imported, and the storage directories of the video object and the audio file are obtained; the video object is converted into an audio file using the OS module; FFmpeg assigns audio attributes to the converted audio file; an interface is called in the form of a voice URL, and an API (application programming interface) is called to transcribe the audio file; after transcription is finished, the transcription result is obtained and the transcribed text file is output; the text file is combined with the video object to add the video subtitles; and the subtitled video object is exported. The invention uses artificial intelligence to add subtitles to films; automatic subtitle addition improves the viewing experience and makes foreign-language videos easier to watch, greatly reduces the manpower and material resources needed for manual translation, and lowers translation cost.

Description

Video subtitle adding method and system
Technical Field
The invention relates to the technical field of video processing, in particular to a method and a system for adding video subtitles.
Background
With the development of computer networks, video has become increasingly popular, whether for watching films online or offline or for communicating through video, and it plays an ever more important role in people's lives. Video is also a means of communication between countries in today's society, but the wide variety of languages and the fact that most people cannot master several of them hinder further understanding of foreign-language content.
At present, people mostly watch films with dedicated viewing software. The films are either manually translated and subtitled, carry only foreign-language subtitles, or are foreign-language films without subtitles at all, which hinders viewing. Likewise, when people browse foreign-language websites or download films without subtitles, they cannot understand the content, although subtitles would let them follow a film even without knowing the language. However, adding subtitles requires manual translation: a translator must be proficient in both the source language and the target language, that is, must have mastered at least two languages, and manual translation is costly and consumes time and energy. In summary, a technical solution for adding video subtitles is needed.
Disclosure of Invention
Therefore, the invention provides a video subtitle adding method and system that can add subtitles to videos automatically and translate videos without subtitles, solving the problem that adding subtitles to such videos by manual translation alone is time-consuming and costly.
In order to achieve the above purpose, the invention provides the following technical solution. In a first aspect, a video subtitle adding method is provided, which includes the following steps:
acquiring a video object to be subtitled and judging the state attribute of the video object;
acquiring an audio file contained in the video object: reading the storage directories of the video object and the audio file and judging whether an audio directory exists, a) if the audio directory exists, storing the audio file in the corresponding audio directory, b) if it does not exist, creating the audio directory and storing the audio file in it; importing the ffmpy3 library to obtain the storage directories of the video object and the audio file; converting the video object into an audio file using the OS module; and assigning audio attributes to the converted audio file with FFmpeg;
calling an interface in the form of a voice URL and calling an API (application programming interface) to transcribe the audio file;
after transcription is finished, obtaining the transcription result and outputting the transcribed text file;
combining the text file with the video object to add video subtitles;
and exporting the video object with the added video subtitles.
As a preferred scheme of the video subtitle adding method, the state attribute of the video object includes a local attribute and an online attribute, the local attribute indicates that the video object is from a local video file, and the online attribute indicates that the video object is from a foreign language website.
As a preferred scheme of the video subtitle adding method, the video object is converted into an audio file in WAV format using the OS module, as sketched below.
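For illustration only, a minimal sketch of this conversion is given below, assuming that "processing of the OS module" means shelling out to the FFmpeg command line via os.system; the file paths and the 16 kHz mono PCM options are assumptions for the example, not requirements of this scheme.

    import os

    # Assumed example: call FFmpeg through the OS module to turn a video file
    # into a 16 kHz mono WAV file suitable for later speech transcription.
    os.makedirs("audio", exist_ok=True)
    os.system('ffmpeg -y -i "movie.mp4" -vn -acodec pcm_s16le -ar 16000 -ac 1 "audio/movie.wav"')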
As a preferred embodiment of the video subtitle adding method, the step of transcribing the audio file includes:
(a1) Preprocessing
Calling the preprocessing interface and uploading the basic information, slice information and configurable parameters of the audio file to be transcribed;
(a2) File slice uploading
If the preprocessing succeeds, calling the file upload interface and uploading the audio slices in sequence according to the slice information;
(a3) Merging files
After all file slices have been uploaded successfully, calling the merge-file interface and notifying the server to perform the file merging and transcription operations;
(a4) Querying processing progress
After the caller sends the file merging request, the server lists the task in its plan, and it is queried whether the processing progress of the task has reached a preset value;
(a5) Obtaining the result
Calling the result-obtaining interface to obtain the transcription result; alternatively, the server calls back actively and sends the transcription result to the configured callback address once the transcription is completed.
As a preferred scheme of the video subtitle adding method, in step (a1), when the preprocessing interface is called successfully, a task ID is returned; the task ID is the unique identifier of the transcription task and is used as a required parameter of the subsequent interfaces.
As a preferred embodiment of the video subtitle adding method, in step (a2), if uploading an audio slice is abnormal, the slice whose upload failed is retried and uploaded again.
As a preferred scheme of the video subtitle adding method, in step (a3), the merge-file interface does not return a transcription result; it only notifies the server to list the task in the transcription plan, and the transcription result is obtained through the getResult interface.
As a preferred scheme of the video subtitle adding method, the text file contains the foreign-language text and its translation; the text file is combined with the video object to add bilingual subtitles, and the video object is exported after the bilingual subtitles have been added.
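As an illustration of what such a bilingual text file might look like, the following sketch writes a single subtitle cue in SRT form, pairing a foreign-language line with its translation; the SRT format, the file name and the timing values are assumptions for the example, since this scheme does not prescribe a particular text-file format.

    # Assumed example: one bilingual cue (foreign line plus translation) in SRT form.
    def srt_timestamp(seconds: float) -> str:
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3600000)
        m, rem = divmod(rem, 60000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    def bilingual_cue(index: int, start: float, end: float,
                      foreign: str, translation: str) -> str:
        return (f"{index}\n"
                f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
                f"{foreign}\n{translation}\n\n")

    with open("movie_bilingual.srt", "w", encoding="utf-8") as f:
        f.write(bilingual_cue(1, 0.0, 2.5, "Hello, world.", "你好，世界。"))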
As a preferred embodiment of the video subtitle adding method, the step of adding the video bilingual subtitle by combining the text file and the video object includes:
(b1) acquiring a file path of the video object and acquiring a file path of the bilingual subtitle;
(b2) importing a corresponding text file comprising bilingual subtitles to the video object;
(b3) acquiring the height and width of a video in the video object;
(b4) calling ImageMagick to complete the addition of the bilingual subtitles.
In a second aspect, a video subtitle adding system is provided, which adopts the video subtitle adding method of the first aspect or any possible implementation thereof, and which includes:
the video object loading module is used for acquiring a video object to be subjected to subtitle addition and judging the state attribute of the video object;
an audio file loading module, configured to obtain an audio file included in the video object: reading the storage directories of the video object and the audio file, and judging whether an audio directory exists, a) when the audio directory exists, storing the audio file into the corresponding audio directory; b) when the audio directory does not exist, creating the audio directory, and storing the audio file into the created audio directory;
the storage directory extraction module is used for importing an ffmpy3 library to obtain storage directories of the video objects and the audio files;
the OS module processing module is used for converting the video object into an audio file by utilizing the processing of the OS module;
the audio attribute processing module is used for endowing the converted audio file with audio attributes by adopting an FFmpeg algorithm;
the audio transcription module is used for calling an interface in a voice Url form and calling an API (application program interface) to transcribe the audio file;
the text file output module is used for acquiring a transcription result and outputting a transcribed text file after the transcription is finished;
the subtitle adding module is used for combining the text file and the video object to add the video subtitle;
and the video export module is used for exporting the video object added with the video subtitle.
As a preferred scheme of the video subtitle adding system, the audio transcription module includes:
the preprocessing submodule is used for calling a preprocessing interface and uploading basic information, fragment information and configurable parameters of the audio file to be transcribed;
the file fragment uploading sub-module is used for calling a file uploading interface when the preprocessing is successful and sequentially uploading audio fragments according to the fragment information;
the combined file sub-module is used for calling a combined file interface after all the file slices are successfully uploaded, and informing a server side of carrying out file combination and transcription operation;
the query processing progress submodule is used for listing the tasks into a plan by the server side after the calling party sends a file merging request, and querying whether the task processing progress state reaches a preset value or not;
and the result obtaining submodule is used for calling the result obtaining interface to obtain a transcription result, the server side actively calls back, and the transcription result is actively sent to the configured callback address after the transcription is finished.
As a preferred scheme of the video subtitle adding system, the subtitle adding module includes:
the file path extraction submodule is used for acquiring a file path of the video object and acquiring a file path of the bilingual subtitle;
the text file importing submodule is used for importing a corresponding text file comprising bilingual subtitles to the video object;
the video information acquisition submodule is used for acquiring the height and the width of the video in the video object;
and the bilingual subtitle adding submodule is used for calling ImageMagick to complete the addition of the bilingual subtitles.
The invention has the following advantages. A video object to be subtitled is acquired and its state attribute is judged; the audio file contained in the video object is obtained: the storage directories of the video object and the audio file are read and it is judged whether an audio directory exists, a) if it exists, the audio file is stored in the corresponding audio directory, b) if not, the audio directory is created and the audio file is stored in it; the ffmpy3 library is imported to obtain the storage directories of the video object and the audio file; the video object is converted into an audio file using the OS module; FFmpeg assigns audio attributes to the converted audio file; an interface is called in the form of a voice URL and an API is called to transcribe the audio file; after transcription, the transcription result is obtained and the transcribed text file is output; the text file is combined with the video object to add the video subtitles; and the subtitled video object is exported. The method and system use artificial intelligence to add subtitles to films and can also add subtitles automatically to web videos (including films and foreign news). They can add not only Chinese subtitles to foreign-language videos but subtitles in any language as required, for example French subtitles on a Mandarin film to help with learning French. Automatic subtitle addition improves the viewing experience and makes foreign-language videos easier to watch, and it greatly reduces the manpower and material resources required for manual translation, lowering translation cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes and the like shown in this specification are provided only to complement the content disclosed in the specification so that those skilled in the art can understand and read it; they do not limit the conditions under which the invention can be implemented and therefore carry no essential technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the effect and purpose of the invention shall still fall within the scope of the invention.
Fig. 1 is a schematic diagram illustrating a video subtitle adding method according to an embodiment of the present invention;
fig. 2 is a schematic processing flow diagram of a video subtitle adding method according to an embodiment of the present invention;
fig. 3 is an audio file transcription algorithm diagram in a video subtitle adding process according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a video subtitle adding system according to an embodiment of the present invention.
Detailed Description
The present invention is described by way of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In the embodiments of the invention, the English abbreviations used are explained as follows:
ffmpy3: a Python wrapper for FFmpeg, derived from the ffmpy project. It compiles an FFmpeg command line from the provided parameters and their respective options and executes it in a Python subprocess, following the same conventions as the FFmpeg command line. It can read any number of input "files" (regular files, pipes, network streams, capture devices, etc.) and write any number of output "files", and it supports FFmpeg's pipe protocol.
OS module: a module of the Python standard library for accessing operating-system functionality; the interfaces it provides allow cross-platform access to the operating system.
WAV: a sound file format; a standard digital audio format developed by Microsoft specifically for Windows, which can record all kinds of mono or stereo sound information while keeping the sound free of distortion.
FFmpeg: a set of open-source computer programs that can record and convert digital audio and video and turn them into streams. It provides a complete solution for recording, converting and streaming audio and video and includes an audio/video codec library.
URL: Uniform Resource Locator, a notation used by web service programs on the Internet to specify the location of information.
API: application programming interface, a set of predefined interfaces (such as functions or HTTP interfaces) or the conventions for connecting the different components of a software system.
ImageMagick: image-processing software that can edit and display most of today's popular image formats, including JPEG, TIFF, PNM, PNG, GIF and Photo CD. ImageMagick uses multiple computation threads to improve performance and can read, process and write images at mega-, giga- or tera-pixel sizes.
Example 1
Referring to fig. 1, 2 and 3, there is provided a video subtitle adding method including the steps of:
S1, acquiring a video object to be subtitled and judging the state attribute of the video object;
S2, acquiring the audio file contained in the video object:
S21, reading the storage directories of the video object and the audio file and judging whether an audio directory exists, a) if the audio directory exists, storing the audio file in the corresponding audio directory, b) if it does not exist, creating the audio directory and storing the audio file in it;
S22, importing the ffmpy3 library to obtain the storage directories of the video object and the audio file;
S23, converting the video object into an audio file in WAV format using the OS module;
S24, assigning audio attributes to the converted audio file with FFmpeg (a sketch of S21 to S24 is given after this list);
S3, calling an interface in the form of a voice URL and calling an API (application programming interface) to transcribe the audio file;
S4, after transcription is finished, acquiring the transcription result and outputting the transcribed text file;
S5, combining the text file with the video object to add video subtitles;
S6, exporting the video object with the added video subtitles.
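For illustration, the following Python sketch shows one possible implementation of steps S21 to S24 with the os module and the ffmpy3 wrapper, assuming ffmpy3 exposes the same synchronous FFmpeg(...).run() interface as ffmpy; the audio directory name, the FFmpeg options (16 kHz mono PCM) and the helper name extract_audio are assumptions for the example and are not prescribed by this embodiment.

    import os

    import ffmpy3  # Python wrapper around the FFmpeg command line


    def extract_audio(video_path: str, audio_dir: str = "audio") -> str:
        """Convert a video file into a WAV audio file stored under audio_dir."""
        # S21: make sure the audio directory exists, creating it if necessary.
        if not os.path.isdir(audio_dir):
            os.makedirs(audio_dir)

        # S22: derive the audio file's storage path from the video file's path.
        base = os.path.splitext(os.path.basename(video_path))[0]
        audio_path = os.path.join(audio_dir, base + ".wav")

        # S23/S24: let FFmpeg (via ffmpy3) perform the conversion; "-vn" drops the
        # video stream, and the codec/rate/channel options are an assumed set of
        # audio attributes (16 kHz mono PCM) suited to speech transcription.
        ff = ffmpy3.FFmpeg(
            global_options="-y",
            inputs={video_path: None},
            outputs={audio_path: "-vn -acodec pcm_s16le -ar 16000 -ac 1"},
        )
        ff.run()
        return audio_path


    if __name__ == "__main__":
        print(extract_audio("movie.mp4"))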
In this embodiment, the step of transcribing the audio file includes:
(a1) Preprocessing
Calling the preprocessing interface and uploading the basic information, slice information and configurable parameters of the audio file to be transcribed;
(a2) File slice uploading
If the preprocessing succeeds, calling the file upload interface and uploading the audio slices in sequence according to the slice information;
(a3) Merging files
After all file slices have been uploaded successfully, calling the merge-file interface and notifying the server to perform the file merging and transcription operations;
(a4) Querying processing progress
After the caller sends the file merging request, the server lists the task in its plan, and it is queried whether the processing progress of the task has reached a preset value;
(a5) Obtaining the result
Calling the result-obtaining interface to obtain the transcription result; alternatively, the server calls back actively and sends the transcription result to the configured callback address once the transcription is completed.
Specifically, in step (a1), after the preprocessing interface is called successfully, a task ID is returned; the task ID is the unique identifier of the transcription task and is used as a required parameter of the subsequent interfaces. In step (a2), if uploading an audio slice is abnormal, the slice whose upload failed is retried and uploaded again. In step (a3), the merge-file interface does not return a transcription result; it only notifies the server to list the task in the transcription plan, and the transcription result is obtained through the getResult interface.
With reference again to fig. 3, in which:
pre-treatment/preparation
File fragment uploading/uploading
Merge file/merge
Query processing progress/getProgress
Obtaining a result/getResult;
after the caller sends out the file merging request, the server has already listed the task in the plan. Before obtaining the result, when the task is 9, the user can call the result obtaining interface to obtain the transcription result, and the process is invisible to the user and is only the processing process of the background program.
In this embodiment, the text file contains the foreign-language text and its translation; the text file is combined with the video object to add bilingual subtitles, and the video object is exported after the bilingual subtitles have been added.
Specifically, combining the text file with the video object to add the bilingual subtitles includes the following steps (see the sketch after this list):
(b1) acquiring the file path of the video object and acquiring the file path of the bilingual subtitles;
(b2) importing the corresponding text file containing the bilingual subtitles into the video object;
(b3) acquiring the height and width of the video in the video object;
(b4) calling ImageMagick to complete the addition of the bilingual subtitles.
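A minimal sketch of steps (b1) to (b4) follows, assuming the bilingual subtitles are held in an SRT file and that the compositing is done with the moviepy library, whose TextClip uses ImageMagick to render text; moviepy, pysrt and all file names are assumptions added for the example, since this embodiment itself only names ImageMagick.

    import pysrt
    from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip


    def add_bilingual_subtitles(video_path: str, srt_path: str, out_path: str) -> None:
        # (b1)/(b2): load the video object and the bilingual subtitle file.
        video = VideoFileClip(video_path)
        subs = pysrt.open(srt_path)

        # (b3): the video's width and height drive subtitle size and placement.
        width, height = video.w, video.h

        clips = [video]
        for item in subs:
            start = item.start.ordinal / 1000.0    # milliseconds -> seconds
            end = item.end.ordinal / 1000.0
            txt = TextClip(item.text,              # ImageMagick renders this text
                           fontsize=int(height * 0.05),
                           color="white",
                           size=(int(width * 0.9), None),
                           method="caption")
            clips.append(txt.set_position(("center", "bottom"))
                            .set_start(start)
                            .set_duration(end - start))

        # (b4): composite the subtitle clips over the video and export the result.
        CompositeVideoClip(clips).write_videofile(out_path)


    if __name__ == "__main__":
        add_bilingual_subtitles("movie.mp4", "movie_bilingual.srt", "movie_subtitled.mp4")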
In this embodiment, the status attribute of the video object includes a local attribute and an online attribute, where the local attribute indicates that the video object is from a local video file, and the online attribute indicates that the video object is from a foreign language website.
In the embodiment of the present invention, the targeted video objects preferably fall into two cases:
The first is a downloaded video file, that is, a foreign-language video without any subtitles. The video is imported into the software, which automatically obtains the video file and converts it into an audio file; in this step the video is converted into audio, the text is then obtained through further processing, and the language is automatically recognized and converted into a text file. When converting the audio into text, an interface of mature translation software, such as Baidu Translate, is usually used: the audio file is uploaded to the interface and processed accordingly, finally yielding the text document translated from the audio. The last step combines the text document with the video and exports a video file containing bilingual subtitles.
The other is online video, for example when browsing the Rotten Tomatoes website, where the videos in the web page are all played online rather than cached as local files. Caching is therefore needed: the files to be watched are cached in cloud space rather than on the user's device, and subtitles are added to the online video. In this process the converted text file is only segmented and attached to the corresponding time periods of the online video, and no video file is returned. There is of course some buffering time for the translation of the video, and on the basis of the present scheme those skilled in the art can strive to let the user see the online video with bilingual subtitles in the shortest possible time.
The technical solution of the embodiment of the invention is not limited to local video files and online videos. When a user converses with a foreign friend online using communication software, the spoken audio can be automatically translated into the foreign language and bilingual subtitles can be added at a suitable position in the chat video, realizing online communication: the foreign friend hears the foreign language while the user hears the translated Chinese, so that besides subtitle addition, online conversion between languages is also covered.
The embodiment of the invention also has another application scenario, for example a foreign user browsing Douyin (TikTok). When such a user watches videos in Chinese, the technical solution of the invention can add subtitles to the short videos, making them easy to follow; Douyin then does not need separate domestic and overseas versions, and the problem of the language barrier is solved.
The technical solution of the embodiment of the invention can also be applied to adding video subtitles in instant-messaging tools such as QQ and WeChat.
Example 2
In a second aspect, there is provided a video subtitle adding system using the method of embodiment 1 or any possible implementation manner thereof, the video subtitle adding system including:
the video object loading module 1 is used for acquiring a video object to be subjected to subtitle addition and judging the state attribute of the video object;
an audio file loading module 2, configured to obtain an audio file included in the video object: reading the storage directories of the video object and the audio file, and judging whether an audio directory exists, a) when the audio directory exists, storing the audio file into the corresponding audio directory; b) when the audio directory does not exist, creating the audio directory, and storing the audio file into the created audio directory;
the storage directory extraction module 3 is configured to import an ffmpy3 library, and obtain storage directories of the video object and the audio file;
the OS module processing module 4 is used for converting the video object into an audio file by utilizing the processing of the OS module;
an audio attribute processing module 5, configured to assign an audio attribute to the converted audio file by using an FFmpeg algorithm;
the audio transcription module 6 is used for calling an interface in a voice Url form and calling an API (application programming interface) to transcribe the audio file;
the text file output module 7 is used for acquiring a transcription result and outputting a transcribed text file after the transcription is finished;
a subtitle adding module 8, configured to combine the text file with the video object to perform video subtitle addition;
and the video export module 9 is used for exporting the video object added with the video subtitle.
In this embodiment, the audio transcription module 6 includes:
the preprocessing submodule 61 is used for calling a preprocessing interface and uploading basic information, fragment information and configurable parameters of the audio file to be transcribed;
the file fragment uploading submodule 62 is used for calling a file uploading interface when the preprocessing is successful and uploading audio fragments in sequence according to the fragment information;
the merged file submodule 63 is configured to, after all the file slices are successfully uploaded, call a merged file interface, and notify the server to perform file merging and transcription;
the query processing progress submodule 64 is used for listing the tasks into a plan by the server side after the calling party sends a file merging request, and querying whether the task processing progress state reaches a preset value;
and the result obtaining submodule 65 is configured to call the result-obtaining interface to obtain the transcription result; the server can also call back actively and send the transcription result to the configured callback address after the transcription is completed.
In this embodiment, the subtitle adding module 8 includes:
a file path extraction sub-module 81, configured to obtain a file path of the video object and obtain a file path of the bilingual subtitle;
a text file importing sub-module 82, configured to import a corresponding text file including bilingual subtitles to the video object;
a video information obtaining sub-module 83, configured to obtain the height and width of the video in the video object;
and the bilingual subtitle adding sub-module 84 is used for calling ImageMagick to complete the addition of the bilingual subtitles.
It should be noted that the information interaction and execution processes between the modules/units of the video subtitle adding system are based on the same concept as the method embodiment of the present application; their technical effect is therefore the same as that of the method embodiment, and for the specific details reference can be made to the description of the foregoing method embodiment.
In summary, the method acquires a video object to be subtitled and judges its state attribute; obtains the audio file contained in the video object: the storage directories of the video object and the audio file are read and it is judged whether an audio directory exists, a) if it exists, the audio file is stored in the corresponding audio directory, b) if not, the audio directory is created and the audio file is stored in it; imports the ffmpy3 library to obtain the storage directories of the video object and the audio file; converts the video object into an audio file using the OS module; assigns audio attributes to the converted audio file with FFmpeg; calls an interface in the form of a voice URL and calls an API to transcribe the audio file; after transcription, obtains the transcription result and outputs the transcribed text file; combines the text file with the video object to add the video subtitles; and exports the subtitled video object. The method and system use artificial intelligence to add subtitles to films and can also add subtitles automatically to web videos (including films and foreign news). They can add not only Chinese subtitles to foreign-language videos but subtitles in any language as required, for example French subtitles on a Mandarin film to help with learning French. Automatic subtitle addition improves the viewing experience and makes foreign-language videos easier to watch, and it greatly reduces the manpower and material resources required for manual translation, lowering translation cost.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, a method or a program product. Thus, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "module" or "platform".
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A video subtitle adding method is characterized by comprising the following steps:
acquiring a video object to be subjected to subtitle addition, and judging the state attribute of the video object;
acquiring an audio file contained in the video object: reading the storage directories of the video object and the audio file and judging whether an audio directory exists, a) if the audio directory exists, storing the audio file in the corresponding audio directory, b) if it does not exist, creating the audio directory and storing the audio file in it; importing the ffmpy3 library to obtain the storage directories of the video object and the audio file; converting the video object into an audio file using the OS module; assigning audio attributes to the converted audio file with FFmpeg;
calling an interface in the form of a voice URL and calling an API (application programming interface) to transcribe the audio file;
after transcription is finished, obtaining the transcription result and outputting the transcribed text file;
combining the text file with the video object to add video subtitles;
and exporting the video object added with the video subtitle.
2. The method according to claim 1, wherein the status attribute of the video object comprises a local attribute and an online attribute, the local attribute indicating that the video object is from a local video file, and the online attribute indicating that the video object is from a foreign language website.
3. The method of claim 2, wherein the video object is converted into an audio file in WAV format by an OS module.
4. The method of claim 1, wherein the step of transcribing the audio file comprises:
(a1) Preprocessing
Calling the preprocessing interface and uploading the basic information, slice information and configurable parameters of the audio file to be transcribed;
(a2) File slice uploading
If the preprocessing succeeds, calling the file upload interface and uploading the audio slices in sequence according to the slice information;
(a3) Merging files
After all file slices have been uploaded successfully, calling the merge-file interface and notifying the server to perform the file merging and transcription operations;
(a4) Querying processing progress
After the caller sends the file merging request, the server lists the task in its plan, and it is queried whether the processing progress of the task has reached a preset value;
(a5) Obtaining the result
Calling the result-obtaining interface to obtain the transcription result; alternatively, the server calls back actively and sends the transcription result to the configured callback address once the transcription is completed.
5. The method according to claim 4, wherein in step (a1), when the preprocessing interface is called successfully, a task ID is returned, the task ID being the unique identifier of the transcription task and being used as a required parameter of the subsequent interfaces;
in step (a2), if uploading an audio slice is abnormal, the slice whose upload failed is retried and uploaded again;
in step (a3), the merge-file interface does not return a transcription result; it only notifies the server to list the task in the transcription plan, and the transcription result is obtained through the getResult interface.
6. The method according to claim 1, wherein the text file contains the foreign-language text and its translation, and the text file is combined with the video object to add bilingual subtitles; the video object is exported after the bilingual subtitles have been added.
7. The method of claim 6, wherein the step of adding the video bilingual subtitles by combining the text file and the video object comprises:
(b1) acquiring a file path of the video object and acquiring a file path of the bilingual subtitle;
(b2) importing a corresponding text file comprising bilingual subtitles to the video object;
(b3) acquiring the height and width of a video in the video object;
(b4) calling ImageMagick to complete the addition of the bilingual subtitles.
8. A video subtitle adding system using the video subtitle adding method according to any one of claims 1 to 7, the video subtitle adding system comprising:
the video object loading module is used for acquiring a video object to be subjected to subtitle addition and judging the state attribute of the video object;
an audio file loading module, configured to obtain an audio file included in the video object: reading the storage directories of the video object and the audio file, and judging whether an audio directory exists, a) when the audio directory exists, storing the audio file into the corresponding audio directory; b) when the audio directory does not exist, creating the audio directory, and storing the audio file into the created audio directory;
the storage directory extraction module is used for importing an ffmpy3 library to obtain storage directories of the video objects and the audio files;
the OS module processing module is used for converting the video object into an audio file by utilizing the processing of the OS module;
the audio attribute processing module is used for endowing the converted audio file with audio attributes by adopting an FFmpeg algorithm;
the audio transcription module is used for calling an interface in a voice Url form and calling an API (application program interface) to transcribe the audio file;
the text file output module is used for acquiring a transcription result and outputting a transcribed text file after the transcription is finished;
the subtitle adding module is used for combining the text file and the video object to add the video subtitle;
and the video export module is used for exporting the video object added with the video subtitle.
9. The video caption addition system of claim 8, wherein the audio transcription module comprises:
the preprocessing submodule is used for calling a preprocessing interface and uploading basic information, fragment information and configurable parameters of the audio file to be transcribed;
the file fragment uploading sub-module is used for calling a file uploading interface when the preprocessing is successful and sequentially uploading audio fragments according to the fragment information;
the combined file sub-module is used for calling a combined file interface after all the file slices are successfully uploaded, and informing a server side of carrying out file combination and transcription operation;
the query processing progress submodule is used for listing the tasks into a plan by the server side after the calling party sends a file merging request, and querying whether the task processing progress state reaches a preset value or not;
and the result obtaining submodule is used for calling the result obtaining interface to obtain a transcription result, the server side actively calls back, and the transcription result is actively sent to the configured callback address after the transcription is finished.
10. The video subtitle adding system according to claim 8, wherein the subtitle adding module comprises:
the file path extraction submodule is used for acquiring a file path of the video object and acquiring a file path of the bilingual subtitle;
the text file importing submodule is used for importing a corresponding text file comprising bilingual subtitles to the video object;
the video information acquisition submodule is used for acquiring the height and the width of the video in the video object;
and the bilingual subtitle adding submodule is used for calling ImageMagick to complete the addition of the bilingual subtitles.
CN202110310014.3A 2021-03-23 2021-03-23 Video subtitle adding method and system Pending CN113194356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110310014.3A CN113194356A (en) 2021-03-23 2021-03-23 Video subtitle adding method and system


Publications (1)

Publication Number Publication Date
CN113194356A true CN113194356A (en) 2021-07-30

Family

ID=76973723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110310014.3A Pending CN113194356A (en) 2021-03-23 2021-03-23 Video subtitle adding method and system

Country Status (1)

Country Link
CN (1) CN113194356A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900718A (en) * 2022-07-12 2022-08-12 深圳市华曦达科技股份有限公司 Multi-region perception automatic multi-subtitle realization method, device and system


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120316860A1 (en) * 2011-06-08 2012-12-13 Microsoft Corporation Dynamic video caption translation player
CN106340291A (en) * 2016-09-27 2017-01-18 广东小天才科技有限公司 Bilingual subtitle production method and system
CN106851401A (en) * 2017-03-20 2017-06-13 惠州Tcl移动通信有限公司 A kind of method and system of automatic addition captions
CN109495792A (en) * 2018-11-30 2019-03-19 北京字节跳动网络技术有限公司 A kind of subtitle adding method, device, electronic equipment and the readable medium of video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIDASHENT: "Python: automatically converting video audio into a timed SRT subtitle file via speech recognition", HTTPS://BLOG.CSDN.NET/LIDASHENT/ARTICLE/DETAILS/113987349 *
李超, 邱钊, 黄向生, 胡建政, 蔡金晔: "Design of an intelligent voice customer service system based on a knowledge graph", Computer Science and Application *


Similar Documents

Publication Publication Date Title
US7415537B1 (en) Conversational portal for providing conversational browsing and multimedia broadcast on demand
US9852762B2 (en) User interface for video preview creation
US8732775B2 (en) Systems and methods of processing closed captioning for video on demand content
US20200045123A1 (en) Method and system for a uniform resource identifier (uri) broker
US11514948B1 (en) Model-based dubbing to translate spoken audio in a video
US9373359B2 (en) Systems and methods for rendering text onto moving image content
US20150089076A1 (en) Method of streaming media to heterogeneous client devices
WO2020211731A1 (en) Video playing method and related device
US20100281042A1 (en) Method and System for Transforming and Delivering Video File Content for Mobile Devices
CN105681912A (en) Video playing method and device
CN105828096B (en) Method and device for processing media stream file
WO2009149354A2 (en) Systems and methods for creating and sharing a presentation
US20120151080A1 (en) Media Repackaging Systems and Software for Adaptive Streaming Solutions, Methods of Production and Uses Thereof
JP2023515392A (en) Information processing method, system, device, electronic device and storage medium
CN113194356A (en) Video subtitle adding method and system
CN105592081A (en) Method for converting videos between terminal and server
CN106664299A (en) Media presentation guide method based on hyper text transport protocol media stream and related devic
WO2007068197A1 (en) A method and system for content directional transmission and distributed access in the telecommunication transmission terminal
US20150215671A1 (en) Video sharing mechanism where in the filters can be changed after the video is shared with a filter
US20220263882A1 (en) A service worker and the related method
CN110544475B (en) Method for implementing multi-voice assistant
Black et al. A compendium of robust data structures
CN113779018A (en) Data processing method and device
US20230239328A1 (en) Computer implemented method for processing streaming requests and responses
KR102659938B1 (en) Method and apparatus for dynamic adaptive streaming over HTTP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210730)