CN113194356A - Video subtitle adding method and system - Google Patents

Video subtitle adding method and system

Info

Publication number
CN113194356A
CN113194356A (application CN202110310014.3A)
Authority
CN
China
Prior art keywords
file
video
audio
video object
calling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110310014.3A
Other languages
Chinese (zh)
Inventor
武永鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN202110310014.3A priority Critical patent/CN113194356A/en
Publication of CN113194356A publication Critical patent/CN113194356A/en
Pending legal-status Critical Current


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; operations thereof
    • H04N 21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; client middleware
    • H04N 21/439 - Processing of audio elementary streams
    • H04N 21/44 - Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/44008 - Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N 21/44016 - Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • H04N 21/47 - End-user applications
    • H04N 21/488 - Data services, e.g. news ticker
    • H04N 21/4884 - Data services, e.g. news ticker, for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Circuits (AREA)

Abstract

A video subtitle adding method and system. A video object to be subtitled is acquired and its state attribute is judged; the audio file contained in the video object is obtained, the ffmpy3 library is imported, and the storage directories of the video object and the audio file are obtained; the video object is converted into an audio file using the OS module; FFmpeg assigns audio attributes to the converted audio file; an interface is called in the form of a voice URL, and an API (application programming interface) is called to transcribe the audio file; after transcription is finished, the transcription result is obtained and the transcribed text file is output; the text file is combined with the video object to add the video subtitles; and the subtitled video object is exported. The invention uses artificial intelligence to add subtitles to films; automatic subtitle addition improves the viewing experience and makes foreign-language videos easier to watch, greatly reduces the manpower and material resources needed for manual translation, and lowers translation cost.

Description

Video subtitle adding method and system
Technical Field
The invention relates to the technical field of video processing, in particular to a method and a system for adding video subtitles.
Background
With the development of computer networks, video has become increasingly popular, whether for watching films online or offline or for communicating through video, and it plays an ever more important role in people's lives. Video is also a means of communication between countries in today's society, but the wide variety of languages and the fact that most people cannot master several of them hinder further understanding of foreign-language content.
At present, people mostly watch films with dedicated viewing software. The films are either manually translated and subtitled, carry only foreign-language subtitles, or are foreign-language films without subtitles at all, which hinders viewing. Likewise, when people browse foreign-language websites or download films without subtitles, they cannot understand the content, although subtitles would let them follow a film even without knowing the language. However, adding subtitles requires manual translation: a translator must be proficient in both the source language and the target language, that is, must have mastered at least two languages, and manual translation is costly and consumes time and energy. In summary, a technical solution for adding video subtitles is needed.
Disclosure of Invention
Therefore, the invention provides a video subtitle adding method and system that can add subtitles to videos automatically and translate videos without subtitles, solving the problem that adding subtitles to such videos by manual translation alone is time-consuming and costly.
In order to achieve the above purpose, the invention provides the following technical solution. In a first aspect, a video subtitle adding method is provided, which includes the following steps:
acquiring a video object to be subtitled and judging the state attribute of the video object;
acquiring an audio file contained in the video object: reading the storage directories of the video object and the audio file and judging whether an audio directory exists, a) if the audio directory exists, storing the audio file in the corresponding audio directory, b) if it does not exist, creating the audio directory and storing the audio file in it; importing the ffmpy3 library to obtain the storage directories of the video object and the audio file; converting the video object into an audio file using the OS module; and assigning audio attributes to the converted audio file with FFmpeg;
calling an interface in the form of a voice URL and calling an API (application programming interface) to transcribe the audio file;
after transcription is finished, obtaining the transcription result and outputting the transcribed text file;
combining the text file with the video object to add video subtitles;
and exporting the video object with the added video subtitles.
As a preferred scheme of the video subtitle adding method, the state attribute of the video object includes a local attribute and an online attribute, the local attribute indicates that the video object is from a local video file, and the online attribute indicates that the video object is from a foreign language website.
As a preferred scheme of the video subtitle adding method, the video object is converted into an audio file in WAV format using the OS module, as sketched below.
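For illustration only, a minimal sketch of this conversion is given below, assuming that "processing of the OS module" means shelling out to the FFmpeg command line via os.system; the file paths and the 16 kHz mono PCM options are assumptions for the example, not requirements of this scheme.

    import os

    # Assumed example: call FFmpeg through the OS module to turn a video file
    # into a 16 kHz mono WAV file suitable for later speech transcription.
    os.makedirs("audio", exist_ok=True)
    os.system('ffmpeg -y -i "movie.mp4" -vn -acodec pcm_s16le -ar 16000 -ac 1 "audio/movie.wav"')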
As a preferred embodiment of the video subtitle adding method, the step of transcribing the audio file includes:
(a1) Preprocessing
Calling the preprocessing interface and uploading the basic information, slice information and configurable parameters of the audio file to be transcribed;
(a2) File slice uploading
If the preprocessing succeeds, calling the file upload interface and uploading the audio slices in sequence according to the slice information;
(a3) Merging files
After all file slices have been uploaded successfully, calling the merge-file interface and notifying the server to perform the file merging and transcription operations;
(a4) Querying processing progress
After the caller sends the file merging request, the server lists the task in its plan, and it is queried whether the processing progress of the task has reached a preset value;
(a5) Obtaining the result
Calling the result-obtaining interface to obtain the transcription result; alternatively, the server calls back actively and sends the transcription result to the configured callback address once the transcription is completed.
As a preferred scheme of the video subtitle adding method, in step (a1), when the preprocessing interface is called successfully, a task ID is returned; the task ID is the unique identifier of the transcription task and is used as a required parameter of the subsequent interfaces.
As a preferred embodiment of the video subtitle adding method, in step (a2), if uploading an audio slice is abnormal, the slice whose upload failed is retried and uploaded again.
As a preferred scheme of the video subtitle adding method, in step (a3), the merge-file interface does not return a transcription result; it only notifies the server to list the task in the transcription plan, and the transcription result is obtained through the getResult interface.
As a preferred scheme of the video subtitle adding method, the text file contains the foreign-language text and its translation; the text file is combined with the video object to add bilingual subtitles, and the video object is exported after the bilingual subtitles have been added.
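As an illustration of what such a bilingual text file might look like, the following sketch writes a single subtitle cue in SRT form, pairing a foreign-language line with its translation; the SRT format, the file name and the timing values are assumptions for the example, since this scheme does not prescribe a particular text-file format.

    # Assumed example: one bilingual cue (foreign line plus translation) in SRT form.
    def srt_timestamp(seconds: float) -> str:
        ms = int(round(seconds * 1000))
        h, rem = divmod(ms, 3600000)
        m, rem = divmod(rem, 60000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    def bilingual_cue(index: int, start: float, end: float,
                      foreign: str, translation: str) -> str:
        return (f"{index}\n"
                f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
                f"{foreign}\n{translation}\n\n")

    with open("movie_bilingual.srt", "w", encoding="utf-8") as f:
        f.write(bilingual_cue(1, 0.0, 2.5, "Hello, world.", "你好，世界。"))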
As a preferred embodiment of the video subtitle adding method, the step of adding the video bilingual subtitle by combining the text file and the video object includes:
(b1) acquiring a file path of the video object and acquiring a file path of the bilingual subtitle;
(b2) importing a corresponding text file comprising bilingual subtitles to the video object;
(b3) acquiring the height and width of a video in the video object;
(b4) calling ImageMagick to complete the addition of the bilingual subtitles.
In a second aspect, a video subtitle adding system is provided, which adopts the video subtitle adding method of the first aspect or any possible implementation thereof, and which includes:
the video object loading module is used for acquiring a video object to be subjected to subtitle addition and judging the state attribute of the video object;
an audio file loading module, configured to obtain an audio file included in the video object: reading the storage directories of the video object and the audio file, and judging whether an audio directory exists, a) when the audio directory exists, storing the audio file into the corresponding audio directory; b) when the audio directory does not exist, creating the audio directory, and storing the audio file into the created audio directory;
the storage directory extraction module is used for importing an ffmpy3 library to obtain storage directories of the video objects and the audio files;
the OS module processing module is used for converting the video object into an audio file by utilizing the processing of the OS module;
the audio attribute processing module is used for endowing the converted audio file with audio attributes by adopting an FFmpeg algorithm;
the audio transcription module is used for calling an interface in a voice Url form and calling an API (application program interface) to transcribe the audio file;
the text file output module is used for acquiring a transcription result and outputting a transcribed text file after the transcription is finished;
the subtitle adding module is used for combining the text file and the video object to add the video subtitle;
and the video export module is used for exporting the video object added with the video subtitle.
As a preferred scheme of the video subtitle adding system, the audio transcription module includes:
the preprocessing submodule is used for calling a preprocessing interface and uploading basic information, fragment information and configurable parameters of the audio file to be transcribed;
the file fragment uploading sub-module is used for calling a file uploading interface when the preprocessing is successful and sequentially uploading audio fragments according to the fragment information;
the combined file sub-module is used for calling a combined file interface after all the file slices are successfully uploaded, and informing a server side of carrying out file combination and transcription operation;
the query processing progress submodule is used for listing the tasks into a plan by the server side after the calling party sends a file merging request, and querying whether the task processing progress state reaches a preset value or not;
and the result obtaining submodule is used for calling the result obtaining interface to obtain a transcription result, the server side actively calls back, and the transcription result is actively sent to the configured callback address after the transcription is finished.
As a preferred scheme of the video subtitle adding system, the subtitle adding module includes:
the file path extraction submodule is used for acquiring a file path of the video object and acquiring a file path of the bilingual subtitle;
the text file importing submodule is used for importing a corresponding text file comprising bilingual subtitles to the video object;
the video information acquisition submodule is used for acquiring the height and the width of the video in the video object;
and the bilingual subtitle adding submodule is used for calling ImageMagick to complete the addition of the bilingual subtitles.
The invention has the following advantages. A video object to be subtitled is acquired and its state attribute is judged; the audio file contained in the video object is obtained: the storage directories of the video object and the audio file are read and it is judged whether an audio directory exists, a) if it exists, the audio file is stored in the corresponding audio directory, b) if not, the audio directory is created and the audio file is stored in it; the ffmpy3 library is imported to obtain the storage directories of the video object and the audio file; the video object is converted into an audio file using the OS module; FFmpeg assigns audio attributes to the converted audio file; an interface is called in the form of a voice URL and an API is called to transcribe the audio file; after transcription, the transcription result is obtained and the transcribed text file is output; the text file is combined with the video object to add the video subtitles; and the subtitled video object is exported. The method and system use artificial intelligence to add subtitles to films and can also add subtitles automatically to web videos (including films and foreign news). They can add not only Chinese subtitles to foreign-language videos but subtitles in any language as required, for example French subtitles on a Mandarin film to help with learning French. Automatic subtitle addition improves the viewing experience and makes foreign-language videos easier to watch, and it greatly reduces the manpower and material resources required for manual translation, lowering translation cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes and the like shown in this specification are provided only to complement the content disclosed in the specification so that those skilled in the art can understand and read it; they do not limit the conditions under which the invention can be implemented and therefore carry no essential technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the effect and purpose of the invention shall still fall within the scope of the invention.
Fig. 1 is a schematic diagram illustrating a video subtitle adding method according to an embodiment of the present invention;
fig. 2 is a schematic processing flow diagram of a video subtitle adding method according to an embodiment of the present invention;
fig. 3 is an audio file transcription algorithm diagram in a video subtitle adding process according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a video subtitle adding system according to an embodiment of the present invention.
Detailed Description
The present invention is described by way of particular embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
In the embodiments of the invention, the English abbreviations used are explained as follows:
ffmpy3: a Python wrapper for FFmpeg, derived from the ffmpy project. It compiles an FFmpeg command line from the provided parameters and their respective options and executes it in a Python subprocess, following the same conventions as the FFmpeg command line. It can read any number of input "files" (regular files, pipes, network streams, capture devices, etc.) and write any number of output "files", and it supports FFmpeg's pipe protocol.
OS module: a module of the Python standard library for accessing operating-system functionality; the interfaces it provides allow cross-platform access to the operating system.
WAV: a sound file format; a standard digital audio format developed by Microsoft specifically for Windows, which can record all kinds of mono or stereo sound information while keeping the sound free of distortion.
FFmpeg: a set of open-source computer programs that can record and convert digital audio and video and turn them into streams. It provides a complete solution for recording, converting and streaming audio and video and includes an audio/video codec library.
URL: Uniform Resource Locator, a notation used by web service programs on the Internet to specify the location of information.
API: application programming interface, a set of predefined interfaces (such as functions or HTTP interfaces) or the conventions for connecting the different components of a software system.
ImageMagick: image-processing software that can edit and display most of today's popular image formats, including JPEG, TIFF, PNM, PNG, GIF and Photo CD. ImageMagick uses multiple computation threads to improve performance and can read, process and write images at mega-, giga- or tera-pixel sizes.
Example 1
Referring to fig. 1, 2 and 3, there is provided a video subtitle adding method including the steps of:
S1, acquiring a video object to be subtitled and judging the state attribute of the video object;
S2, acquiring the audio file contained in the video object:
S21, reading the storage directories of the video object and the audio file and judging whether an audio directory exists, a) if the audio directory exists, storing the audio file in the corresponding audio directory, b) if it does not exist, creating the audio directory and storing the audio file in it;
S22, importing the ffmpy3 library to obtain the storage directories of the video object and the audio file;
S23, converting the video object into an audio file in WAV format using the OS module;
S24, assigning audio attributes to the converted audio file with FFmpeg (a sketch of S21 to S24 is given after this list);
S3, calling an interface in the form of a voice URL and calling an API (application programming interface) to transcribe the audio file;
S4, after transcription is finished, acquiring the transcription result and outputting the transcribed text file;
S5, combining the text file with the video object to add video subtitles;
S6, exporting the video object with the added video subtitles.
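For illustration, the following Python sketch shows one possible implementation of steps S21 to S24 with the os module and the ffmpy3 wrapper, assuming ffmpy3 exposes the same synchronous FFmpeg(...).run() interface as ffmpy; the audio directory name, the FFmpeg options (16 kHz mono PCM) and the helper name extract_audio are assumptions for the example and are not prescribed by this embodiment.

    import os

    import ffmpy3  # Python wrapper around the FFmpeg command line


    def extract_audio(video_path: str, audio_dir: str = "audio") -> str:
        """Convert a video file into a WAV audio file stored under audio_dir."""
        # S21: make sure the audio directory exists, creating it if necessary.
        if not os.path.isdir(audio_dir):
            os.makedirs(audio_dir)

        # S22: derive the audio file's storage path from the video file's path.
        base = os.path.splitext(os.path.basename(video_path))[0]
        audio_path = os.path.join(audio_dir, base + ".wav")

        # S23/S24: let FFmpeg (via ffmpy3) perform the conversion; "-vn" drops the
        # video stream, and the codec/rate/channel options are an assumed set of
        # audio attributes (16 kHz mono PCM) suited to speech transcription.
        ff = ffmpy3.FFmpeg(
            global_options="-y",
            inputs={video_path: None},
            outputs={audio_path: "-vn -acodec pcm_s16le -ar 16000 -ac 1"},
        )
        ff.run()
        return audio_path


    if __name__ == "__main__":
        print(extract_audio("movie.mp4"))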
In this embodiment, the step of transcribing the audio file includes:
(a1) Preprocessing
Calling the preprocessing interface and uploading the basic information, slice information and configurable parameters of the audio file to be transcribed;
(a2) File slice uploading
If the preprocessing succeeds, calling the file upload interface and uploading the audio slices in sequence according to the slice information;
(a3) Merging files
After all file slices have been uploaded successfully, calling the merge-file interface and notifying the server to perform the file merging and transcription operations;
(a4) Querying processing progress
After the caller sends the file merging request, the server lists the task in its plan, and it is queried whether the processing progress of the task has reached a preset value;
(a5) Obtaining the result
Calling the result-obtaining interface to obtain the transcription result; alternatively, the server calls back actively and sends the transcription result to the configured callback address once the transcription is completed.
Specifically, in step (a1), after the preprocessing interface is called successfully, a task ID is returned; the task ID is the unique identifier of the transcription task and is used as a required parameter of the subsequent interfaces. In step (a2), if uploading an audio slice is abnormal, the slice whose upload failed is retried and uploaded again. In step (a3), the merge-file interface does not return a transcription result; it only notifies the server to list the task in the transcription plan, and the transcription result is obtained through the getResult interface.
With reference again to fig. 3, in which:
pre-treatment/preparation
File fragment uploading/uploading
Merge file/merge
Query processing progress/getProgress
Obtaining a result/getResult;
after the caller sends out the file merging request, the server has already listed the task in the plan. Before obtaining the result, when the task is 9, the user can call the result obtaining interface to obtain the transcription result, and the process is invisible to the user and is only the processing process of the background program.
In this embodiment, the text file contains the foreign-language text and its translation; the text file is combined with the video object to add bilingual subtitles, and the video object is exported after the bilingual subtitles have been added.
Specifically, combining the text file with the video object to add the bilingual subtitles includes the following steps (see the sketch after this list):
(b1) acquiring the file path of the video object and acquiring the file path of the bilingual subtitles;
(b2) importing the corresponding text file containing the bilingual subtitles into the video object;
(b3) acquiring the height and width of the video in the video object;
(b4) calling ImageMagick to complete the addition of the bilingual subtitles.
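A minimal sketch of steps (b1) to (b4) follows, assuming the bilingual subtitles are held in an SRT file and that the compositing is done with the moviepy library, whose TextClip uses ImageMagick to render text; moviepy, pysrt and all file names are assumptions added for the example, since this embodiment itself only names ImageMagick.

    import pysrt
    from moviepy.editor import CompositeVideoClip, TextClip, VideoFileClip


    def add_bilingual_subtitles(video_path: str, srt_path: str, out_path: str) -> None:
        # (b1)/(b2): load the video object and the bilingual subtitle file.
        video = VideoFileClip(video_path)
        subs = pysrt.open(srt_path)

        # (b3): the video's width and height drive subtitle size and placement.
        width, height = video.w, video.h

        clips = [video]
        for item in subs:
            start = item.start.ordinal / 1000.0    # milliseconds -> seconds
            end = item.end.ordinal / 1000.0
            txt = TextClip(item.text,              # ImageMagick renders this text
                           fontsize=int(height * 0.05),
                           color="white",
                           size=(int(width * 0.9), None),
                           method="caption")
            clips.append(txt.set_position(("center", "bottom"))
                            .set_start(start)
                            .set_duration(end - start))

        # (b4): composite the subtitle clips over the video and export the result.
        CompositeVideoClip(clips).write_videofile(out_path)


    if __name__ == "__main__":
        add_bilingual_subtitles("movie.mp4", "movie_bilingual.srt", "movie_subtitled.mp4")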
In this embodiment, the status attribute of the video object includes a local attribute and an online attribute, where the local attribute indicates that the video object is from a local video file, and the online attribute indicates that the video object is from a foreign language website.
In the embodiment of the present invention, the targeted video objects preferably fall into two cases:
The first is a downloaded video file, that is, a foreign-language video without any subtitles. The video is imported into the software, which automatically obtains the video file and converts it into an audio file; in this step the video is converted into audio, the text is then obtained through further processing, and the language is automatically recognized and converted into a text file. When converting the audio into text, an interface of mature translation software, such as Baidu Translate, is usually used: the audio file is uploaded to the interface and processed accordingly, finally yielding the text document translated from the audio. The last step combines the text document with the video and exports a video file containing bilingual subtitles.
The other is online video, for example when browsing the Rotten Tomatoes website, where the videos in the web page are all played online rather than cached as local files. Caching is therefore needed: the files to be watched are cached in cloud space rather than on the user's device, and subtitles are added to the online video. In this process the converted text file is only segmented and attached to the corresponding time periods of the online video, and no video file is returned. There is of course some buffering time for the translation of the video, and on the basis of the present scheme those skilled in the art can strive to let the user see the online video with bilingual subtitles in the shortest possible time.
The technical solution of the embodiment of the invention is not limited to local video files and online videos. When a user converses with a foreign friend online using communication software, the spoken audio can be automatically translated into the foreign language and bilingual subtitles can be added at a suitable position in the chat video, realizing online communication: the foreign friend hears the foreign language while the user hears the translated Chinese, so that besides subtitle addition, online conversion between languages is also covered.
The embodiment of the invention also has another application scenario, for example a foreign user browsing Douyin (TikTok). When such a user watches videos in Chinese, the technical solution of the invention can add subtitles to the short videos, making them easy to follow; Douyin then does not need separate domestic and overseas versions, and the problem of the language barrier is solved.
The technical solution of the embodiment of the invention can also be applied to adding video subtitles in instant-messaging tools such as QQ and WeChat.
Example 2
In a second aspect, there is provided a video subtitle adding system using the method of embodiment 1 or any possible implementation manner thereof, the video subtitle adding system including:
the video object loading module 1 is used for acquiring a video object to be subjected to subtitle addition and judging the state attribute of the video object;
an audio file loading module 2, configured to obtain an audio file included in the video object: reading the storage directories of the video object and the audio file, and judging whether an audio directory exists, a) when the audio directory exists, storing the audio file into the corresponding audio directory; b) when the audio directory does not exist, creating the audio directory, and storing the audio file into the created audio directory;
the storage directory extraction module 3 is configured to import an ffmpy3 library, and obtain storage directories of the video object and the audio file;
the OS module processing module 4 is used for converting the video object into an audio file by utilizing the processing of the OS module;
an audio attribute processing module 5, configured to assign an audio attribute to the converted audio file by using an FFmpeg algorithm;
the audio transcription module 6 is used for calling an interface in a voice Url form and calling an API (application programming interface) to transcribe the audio file;
the text file output module 7 is used for acquiring a transcription result and outputting a transcribed text file after the transcription is finished;
a subtitle adding module 8, configured to combine the text file with the video object to perform video subtitle addition;
and the video export module 9 is used for exporting the video object added with the video subtitle.
In this embodiment, the audio transcription module 6 includes:
the preprocessing submodule 61 is used for calling a preprocessing interface and uploading basic information, fragment information and configurable parameters of the audio file to be transcribed;
the file fragment uploading submodule 62 is used for calling a file uploading interface when the preprocessing is successful and uploading audio fragments in sequence according to the fragment information;
the merged file submodule 63 is configured to, after all the file slices are successfully uploaded, call a merged file interface, and notify the server to perform file merging and transcription;
the query processing progress submodule 64 is used for listing the tasks into a plan by the server side after the calling party sends a file merging request, and querying whether the task processing progress state reaches a preset value;
and the result obtaining submodule 65 is configured to call the result-obtaining interface to obtain the transcription result; the server can also call back actively and send the transcription result to the configured callback address after the transcription is completed.
In this embodiment, the subtitle adding module 8 includes:
a file path extraction sub-module 81, configured to obtain a file path of the video object and obtain a file path of the bilingual subtitle;
a text file importing sub-module 82, configured to import a corresponding text file including bilingual subtitles to the video object;
a video information obtaining sub-module 83, configured to obtain the height and width of the video in the video object;
and the bilingual subtitle adding sub-module 84 is used for calling ImageMagick to complete the addition of the bilingual subtitles.
It should be noted that the information interaction and execution processes between the modules/units of the video subtitle adding system are based on the same concept as the method embodiment of the present application; their technical effect is therefore the same as that of the method embodiment, and for the specific details reference can be made to the description of the foregoing method embodiment.
In summary, the method acquires a video object to be subtitled and judges its state attribute; obtains the audio file contained in the video object: the storage directories of the video object and the audio file are read and it is judged whether an audio directory exists, a) if it exists, the audio file is stored in the corresponding audio directory, b) if not, the audio directory is created and the audio file is stored in it; imports the ffmpy3 library to obtain the storage directories of the video object and the audio file; converts the video object into an audio file using the OS module; assigns audio attributes to the converted audio file with FFmpeg; calls an interface in the form of a voice URL and calls an API to transcribe the audio file; after transcription, obtains the transcription result and outputs the transcribed text file; combines the text file with the video object to add the video subtitles; and exports the subtitled video object. The method and system use artificial intelligence to add subtitles to films and can also add subtitles automatically to web videos (including films and foreign news). They can add not only Chinese subtitles to foreign-language videos but subtitles in any language as required, for example French subtitles on a Mandarin film to help with learning French. Automatic subtitle addition improves the viewing experience and makes foreign-language videos easier to watch, and it greatly reduces the manpower and material resources required for manual translation, lowering translation cost.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, a method or a program product. Thus, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "module" or "platform".
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (10)

1. A video subtitle adding method is characterized by comprising the following steps:
acquiring a video object to be subjected to subtitle addition, and judging the state attribute of the video object;
acquiring an audio file contained in the video object: reading the storage directories of the video object and the audio file and judging whether an audio directory exists, a) if the audio directory exists, storing the audio file in the corresponding audio directory, b) if it does not exist, creating the audio directory and storing the audio file in it; importing the ffmpy3 library to obtain the storage directories of the video object and the audio file; converting the video object into an audio file using the OS module; assigning audio attributes to the converted audio file with FFmpeg;
calling an interface in the form of a voice URL and calling an API (application programming interface) to transcribe the audio file;
after transcription is finished, obtaining the transcription result and outputting the transcribed text file;
combining the text file with the video object to add video subtitles;
and exporting the video object added with the video subtitle.
2. The method according to claim 1, wherein the status attribute of the video object comprises a local attribute and an online attribute, the local attribute indicating that the video object is from a local video file, and the online attribute indicating that the video object is from a foreign language website.
3. The method of claim 2, wherein the video object is converted into an audio file in WAV format by an OS module.
4. The method of claim 1, wherein the step of transcribing the audio file comprises:
(a1) Preprocessing
Calling the preprocessing interface and uploading the basic information, slice information and configurable parameters of the audio file to be transcribed;
(a2) File slice uploading
If the preprocessing succeeds, calling the file upload interface and uploading the audio slices in sequence according to the slice information;
(a3) Merging files
After all file slices have been uploaded successfully, calling the merge-file interface and notifying the server to perform the file merging and transcription operations;
(a4) Querying processing progress
After the caller sends the file merging request, the server lists the task in its plan, and it is queried whether the processing progress of the task has reached a preset value;
(a5) Obtaining the result
Calling the result-obtaining interface to obtain the transcription result; alternatively, the server calls back actively and sends the transcription result to the configured callback address once the transcription is completed.
5. The method according to claim 4, wherein in step (a1), when the preprocessing interface is called successfully, a task ID is returned, the task ID being the unique identifier of the transcription task and being used as a required parameter of the subsequent interfaces;
in step (a2), if uploading an audio slice is abnormal, the slice whose upload failed is retried and uploaded again;
in step (a3), the merge-file interface does not return a transcription result; it only notifies the server to list the task in the transcription plan, and the transcription result is obtained through the getResult interface.
6. The method according to claim 1, wherein the text file contains the foreign-language text and its translation, and the text file is combined with the video object to add bilingual subtitles; the video object is exported after the bilingual subtitles have been added.
7. The method of claim 6, wherein the step of adding the video bilingual subtitles by combining the text file and the video object comprises:
(b1) acquiring a file path of the video object and acquiring a file path of the bilingual subtitle;
(b2) importing a corresponding text file comprising bilingual subtitles to the video object;
(b3) acquiring the height and width of a video in the video object;
(b4) calling ImageMagick to complete the addition of the bilingual subtitles.
8. A video subtitle adding system using the video subtitle adding method according to any one of claims 1 to 7, the video subtitle adding system comprising:
the video object loading module is used for acquiring a video object to be subjected to subtitle addition and judging the state attribute of the video object;
an audio file loading module, configured to obtain an audio file included in the video object: reading the storage directories of the video object and the audio file, and judging whether an audio directory exists, a) when the audio directory exists, storing the audio file into the corresponding audio directory; b) when the audio directory does not exist, creating the audio directory, and storing the audio file into the created audio directory;
the storage directory extraction module is used for importing an ffmpy3 library to obtain storage directories of the video objects and the audio files;
the OS module processing module is used for converting the video object into an audio file by utilizing the processing of the OS module;
the audio attribute processing module is used for endowing the converted audio file with audio attributes by adopting an FFmpeg algorithm;
the audio transcription module is used for calling an interface in a voice Url form and calling an API (application program interface) to transcribe the audio file;
the text file output module is used for acquiring a transcription result and outputting a transcribed text file after the transcription is finished;
the subtitle adding module is used for combining the text file and the video object to add the video subtitle;
and the video export module is used for exporting the video object added with the video subtitle.
9. The video caption addition system of claim 8, wherein the audio transcription module comprises:
the preprocessing submodule is used for calling a preprocessing interface and uploading basic information, fragment information and configurable parameters of the audio file to be transcribed;
the file fragment uploading sub-module is used for calling a file uploading interface when the preprocessing is successful and sequentially uploading audio fragments according to the fragment information;
the combined file sub-module is used for calling a combined file interface after all the file slices are successfully uploaded, and informing a server side of carrying out file combination and transcription operation;
the query processing progress submodule is used for listing the tasks into a plan by the server side after the calling party sends a file merging request, and querying whether the task processing progress state reaches a preset value or not;
and the result obtaining submodule is used for calling the result obtaining interface to obtain a transcription result, the server side actively calls back, and the transcription result is actively sent to the configured callback address after the transcription is finished.
10. The video subtitle adding system according to claim 8, wherein the subtitle adding module comprises:
the file path extraction submodule is used for acquiring a file path of the video object and acquiring a file path of the bilingual subtitle;
the text file importing submodule is used for importing a corresponding text file comprising bilingual subtitles to the video object;
the video information acquisition submodule is used for acquiring the height and the width of the video in the video object;
and the bilingual subtitle adding submodule is used for calling ImageMagick to complete the addition of the bilingual subtitles.
CN202110310014.3A 2021-03-23 2021-03-23 Video subtitle adding method and system Pending CN113194356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110310014.3A CN113194356A (en) 2021-03-23 2021-03-23 Video subtitle adding method and system


Publications (1)

Publication Number Publication Date
CN113194356A true CN113194356A (en) 2021-07-30

Family

ID=76973723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110310014.3A Pending CN113194356A (en) 2021-03-23 2021-03-23 Video subtitle adding method and system

Country Status (1)

Country Link
CN (1) CN113194356A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900718A (en) * 2022-07-12 2022-08-12 深圳市华曦达科技股份有限公司 Multi-region perception automatic multi-subtitle realization method, device and system


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120316860A1 (en) * 2011-06-08 2012-12-13 Microsoft Corporation Dynamic video caption translation player
CN106340291A (en) * 2016-09-27 2017-01-18 广东小天才科技有限公司 Bilingual subtitle production method and system
CN106851401A (en) * 2017-03-20 2017-06-13 惠州Tcl移动通信有限公司 A kind of method and system of automatic addition captions
CN109495792A (en) * 2018-11-30 2019-03-19 北京字节跳动网络技术有限公司 A kind of subtitle adding method, device, electronic equipment and the readable medium of video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIDASHENT: "Python: automatically converting video audio into a timed SRT subtitle file via speech recognition", HTTPS://BLOG.CSDN.NET/LIDASHENT/ARTICLE/DETAILS/113987349 *
李超, 邱钊, 黄向生, 胡建政, 蔡金晔: "Design of an intelligent voice customer service system based on a knowledge graph", Computer Science and Application *


Similar Documents

Publication Publication Date Title
US7415537B1 (en) Conversational portal for providing conversational browsing and multimedia broadcast on demand
US9852762B2 (en) User interface for video preview creation
US8732775B2 (en) Systems and methods of processing closed captioning for video on demand content
US20200045123A1 (en) Method and system for a uniform resource identifier (uri) broker
US11514948B1 (en) Model-based dubbing to translate spoken audio in a video
US9373359B2 (en) Systems and methods for rendering text onto moving image content
US20150089076A1 (en) Method of streaming media to heterogeneous client devices
WO2020211731A1 (en) Video playing method and related device
US20100281042A1 (en) Method and System for Transforming and Delivering Video File Content for Mobile Devices
CN105681912A (en) Video playing method and device
CN105828096B (en) Method and device for processing media stream file
WO2009149354A2 (en) Systems and methods for creating and sharing a presentation
US20120151080A1 (en) Media Repackaging Systems and Software for Adaptive Streaming Solutions, Methods of Production and Uses Thereof
JP2023515392A (en) Information processing method, system, device, electronic device and storage medium
CN113194356A (en) Video subtitle adding method and system
CN105592081A (en) Method for converting videos between terminal and server
CN106664299A (en) Media presentation guide method based on hyper text transport protocol media stream and related devic
WO2007068197A1 (en) A method and system for content directional transmission and distributed access in the telecommunication transmission terminal
US20150215671A1 (en) Video sharing mechanism where in the filters can be changed after the video is shared with a filter
US20220263882A1 (en) A service worker and the related method
CN110544475B (en) Method for implementing multi-voice assistant
Black et al. A compendium of robust data structures
CN113779018A (en) Data processing method and device
US20230239328A1 (en) Computer implemented method for processing streaming requests and responses
KR102659938B1 (en) Method and apparatus for dynamic adaptive streaming over HTTP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210730)