CN110035326A - Subtitle generation method, subtitle-based video retrieval method, apparatus, and electronic device - Google Patents
- Publication number
- CN110035326A (application CN201910272387.9A)
- Authority
- CN
- China
- Prior art keywords
- video
- data
- subtitle
- target video
- language
- Prior art date
- 2019-04-04
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7844—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/278—Subtitling
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Studio Circuits (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
Embodiments of the present invention disclose a subtitle generation method, a subtitle-based video retrieval method, an apparatus, and an electronic device. In one embodiment, the method includes: extracting audio data from the video data of a target video; performing speech recognition on the audio data and generating caption data from the recognition result; and combining the caption data with the video data of the target video to produce video data of the target video that includes subtitles. The subtitles corresponding to the audio of the target video are thus generated directly in a video editor and combined with the video data, which on the one hand reduces the cost of producing subtitles for a video and on the other hand increases the speed at which video subtitles are generated.
Description
Technical field
The present invention relates to the field of multimedia technology, and in particular to a subtitle generation method, a subtitle-based video retrieval method, an apparatus, and an electronic device.
Background art
A subtitle is explanatory text shown in the playback interface while a video plays; it may include dialogue, narration, or other information, and it helps viewers follow the content of a program. Subtitles are usually produced in post-production, after the video program itself is finished.
Current subtitle-adding workflows rely on third-party speech-to-text software: after recording is complete, the software transcribes the audio content of the video into text, the text is then pasted into a video editor, and the editor places it frame by frame to achieve a continuous subtitle display. This way of adding subtitles makes subtitling costly.
Summary of the invention
Embodiments of the present invention provide a subtitle generation method, a subtitle-based video retrieval method, an apparatus, and an electronic device that generate caption data directly in a video editor from the audio data of a video, thereby reducing the cost of subtitle production.
In a first aspect, an embodiment of the invention provides a subtitle generation method comprising: extracting audio data from the video data of a target video; performing speech recognition on the audio data to generate caption data; and combining the caption data with the video data of the target video to generate video data of the target video that includes subtitles.
Optionally, before combining the caption data with the video data of the target video to generate the video data of the target video that includes subtitles, the method further includes obtaining time synchronization information generated while the target video was shot; combining the caption data with the video data then comprises combining, based on the time synchronization information, the caption data corresponding to the audio data with the video data of the target video to generate the target video including subtitles.
Optionally, combining the caption data with the video data of the target video based on the time synchronization information comprises: determining at least one audio data frame included in the audio data; for each audio data frame, determining the start time point and end time point of that frame and determining the video image key frames corresponding to that start and end time point; and combining, based on the start and end time points, the caption data corresponding to the audio data frame with the video image key frames.
Optionally, described that speech recognition is carried out to the audio data, generate caption data, comprising: according to speech recognition
As a result, generating the consistent first language caption data of category of language corresponding with the voice;It generates and the first language word
At least one corresponding second language caption data of curtain data, the affiliated category of language of second language and the first language institute
It is different to belong to category of language;And it is described by caption data in conjunction with the video data of the target video, generate comprising subtitle
The video data of target video, comprising: by the first language caption data, at least one described second language caption data with
The video data of the target video combines, and generates the video data of the target video comprising subtitle.
Optionally, the method further includes receiving a subtitle setting parameter input by a user; combining the caption data with the video data then comprises combining the caption data, with the subtitle setting parameter applied, with the video data of the target video to generate video data of the target video that includes subtitles.
Optionally, before extracting audio data from the video data of the target video, the method further includes receiving a subtitle generation instruction input by a user; the audio data corresponding to the target video is then extracted according to that subtitle generation instruction.
In a second aspect, an embodiment of the invention provides a subtitle-based video retrieval method comprising: receiving a video search keyword input by a user; matching the video search keyword against a preset database, in which multiple videos and the caption data corresponding to each video are stored in association in advance, and determining from the matching result the search target video corresponding to the keyword; and sending the search target video to the user's terminal device. The caption data corresponding to any video in the preset database is generated by a subtitle generation method of the first aspect.
In a third aspect, an embodiment of the invention provides a subtitle generation apparatus comprising: an extraction unit for extracting audio data from the video data of a target video; a first generation unit for performing speech recognition on the audio data to generate caption data; and a second generation unit for combining the caption data with the video data of the target video to generate video data of the target video that includes subtitles.
Optionally, the apparatus further includes a synchronization-information acquiring unit for obtaining time synchronization information generated while the target video was shot; the second generation unit is further configured to combine the caption data with the video data of the target video based on the time synchronization information to generate the target video including subtitles.
Optionally, the second generation unit is further configured to: determine at least one audio data frame included in the audio data; for each audio data frame, determine the start time point and end time point of that frame and the video image key frames corresponding to them; and combine, based on the start and end time points, the caption data corresponding to the audio data frame with the video image key frames.
Optionally, the first generation unit is further configured to: generate, from the speech recognition result, first-language caption data whose language matches the language of the speech, and generate by a preset method at least one item of second-language caption data corresponding to the first-language caption data, the second language being different from the first language; the second generation unit is further configured to combine the first-language caption data and the at least one item of second-language caption data with the video data of the target video to generate video data of the target video that includes subtitles.
Optionally, the apparatus further includes a first receiving unit for receiving a subtitle setting parameter input by a user; the second generation unit is further configured to combine the caption data, with the subtitle setting parameter applied, with the video data of the target video to generate video data of the target video that includes subtitles.
Optionally, the apparatus further includes a second receiving unit for receiving, before the extraction unit extracts the audio data of the target video, a subtitle generation instruction input by a user; the extraction unit is further configured to extract the audio data from the video data of the target video according to that subtitle generation instruction.
In a fourth aspect, an embodiment of the invention provides a subtitle-based video retrieval apparatus comprising: a receiving unit for receiving a video search keyword input by a user; a determination unit for matching the video search keyword against a preset database, in which multiple videos and the caption data corresponding to each video are saved in advance, and determining from the matching result the search target video corresponding to the keyword; and a transmission unit for sending the search target video to the user's terminal device. The caption data corresponding to any video in the preset database is generated by a subtitle generation apparatus of the third aspect.
In a fifth aspect, an embodiment of the invention provides an electronic device comprising one or more processors and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the steps of any of the subtitle generation methods above.
In a sixth aspect, an embodiment of the invention provides an electronic device comprising one or more processors and a storage apparatus storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the subtitle-based video retrieval method above.
In a seventh aspect, an embodiment of the invention provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the steps of any of the subtitle generation methods above are implemented.
In an eighth aspect, an embodiment of the invention provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the steps of the subtitle-based video retrieval method above are implemented.
The subtitle generation method, subtitle-based video retrieval method, apparatus, and electronic device provided by embodiments of the present invention extract the audio data of a target video, perform speech recognition on the audio data to generate caption data, and finally combine the caption data with the video data of the target video to generate video data of the target video that includes subtitles. This scheme generates the subtitles corresponding to the audio of the target video directly in a video editor and combines them with the video data, which on the one hand reduces the cost of producing subtitles for a video and on the other hand increases the speed at which video subtitles are generated.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of one embodiment of a subtitle generation method according to the present invention;
Fig. 2 is a flowchart of another embodiment of a subtitle generation method according to the present invention;
Fig. 3 is a flowchart of yet another embodiment of a subtitle generation method according to the present invention;
Fig. 4 is a flowchart of one embodiment of a subtitle-based video retrieval method according to the present invention;
Fig. 5 is a structural schematic diagram of one embodiment of a subtitle generation apparatus according to the present invention;
Fig. 6 is a structural schematic diagram of one embodiment of a subtitle-based video retrieval apparatus according to the present invention;
Fig. 7 shows an exemplary system architecture to which the subtitle generation method of an embodiment of the present invention can be applied;
Fig. 8 is a schematic diagram of the basic structure of an electronic device provided according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these should be regarded as merely exemplary. Those of ordinary skill in the art will therefore appreciate that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention.
It should be noted that, provided they do not conflict, the embodiments of the present invention and the features in the embodiments may be combined with each other.
Referring to Fig. 1, which illustrates the flow of one embodiment of a subtitle generation method according to the present invention. As shown in Fig. 1, the subtitle generation method comprises the following steps:
Step 101: audio data is extracted from the video data of the target video.
In this embodiment, the target video may be a video captured in real time, a locally stored pre-recorded video, or a video obtained from another electronic device. The video data of the target video may include a video stream composed of multiple image frames and an audio stream; that is, the video data corresponding to the target video encapsulates a video stream and an audio stream according to a preset container format.
In some application scenarios, the audio data can be extracted from the video data of the target video in which the video stream and audio stream are encapsulated; that is, the video data of the target video is demultiplexed and the audio stream is separated from it. The video stream and the audio stream are multiplexed on the same timeline.
In other application scenarios, the audio content of the video can be recorded while the video is pre-played, so as to obtain the audio data corresponding to the target video.
The audio data of the target video can be extracted as a whole. Alternatively, the video data of the target video can be decomposed into multiple partial video data segments, and the audio data corresponding to each segment can be extracted concurrently using multiple threads.
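For illustration only (this is not part of the patent text): whole-file demultiplexing of the audio stream could look like the following Python sketch. The use of the `ffmpeg` command-line tool and the file names are assumptions, and stream copying assumes the source audio codec can be written to the chosen output file.

```python
import subprocess

def extract_audio(video_path: str, audio_path: str) -> None:
    """Demultiplex the audio stream from a video container (assumes ffmpeg is installed)."""
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", video_path,   # input: target video with muxed video + audio streams
         "-vn",              # drop the video stream
         "-acodec", "copy",  # copy the audio stream without re-encoding
         audio_path],
        check=True,
    )

# Example: extract the audio track of a hypothetical target video (AAC audio assumed).
extract_audio("target_video.mp4", "target_audio.aac")
```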
Step 102: speech recognition is performed on the audio data to generate caption data.
In this embodiment, speech recognition can be performed on the audio data, and caption data can be generated from the speech recognition result. Specifically, the text content corresponding to the audio data is first obtained through speech recognition, and the text is then compiled into subtitles, yielding the caption data of the target video.
In some optional implementations of this embodiment, a preset speech library can be used to perform speech recognition on the audio data and obtain the corresponding text content. In some application scenarios, the preset speech library is stored locally in advance, and the local speech library is used to perform speech recognition on the audio data. In other application scenarios, the preset speech library is stored on a remote electronic device; in these scenarios, a communication connection with that electronic device is established over a wired or wireless link, the remote speech library is accessed, and speech recognition is performed on the audio data using the remote library.
In some application scenarios, the speech library may include multiple texts and at least one standard pronunciation corresponding to each text.
In other application scenarios, the speech library is a database in which speech, text, and semantics are matched to one another; for an input utterance, the library can find, according to the specific context and semantics, a fluent sentence composed of the corresponding text that matches the input speech.
In other optional implementations of this embodiment, performing speech recognition on the audio data and generating caption data in step 102 may include the following steps:
First, voice information is determined from the audio data.
Second, the audio data is decomposed into multiple speech data frames. Specifically, the voice information may include multiple speech data frames, each of which is a series of closely related words or sentences. In practice, the spectrum of the voice information can be obtained, and the start position and end position of each speech segment can be determined from that spectrum, so that the voice information is decomposed into multiple speech segments; a simplified segmentation sketch follows.
Then, the speech data in each speech segment is recognized using a speech recognition method, generating the text content corresponding to each speech segment.
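The patent determines segment boundaries from the spectrum of the voice information; as a simplified stand-in, the sketch below segments by short-time energy instead (a different, plainly cruder technique). The threshold value is an arbitrary assumption and the samples are assumed normalized to [-1, 1].

```python
import numpy as np

def split_speech_segments(samples: np.ndarray, rate: int,
                          frame_ms: int = 30, threshold: float = 0.01):
    """Energy-based segmentation: contiguous runs of high-energy frames become segments."""
    frame_len = rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    segments, start = [], None
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len].astype(np.float64)
        active = float(np.mean(frame ** 2)) > threshold   # crude voice-activity test
        if active and start is None:
            start = i * frame_ms / 1000.0                 # segment begins
        elif not active and start is not None:
            segments.append((start, i * frame_ms / 1000.0))  # segment ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_ms / 1000.0))
    return segments  # list of (start_seconds, end_seconds)
```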
Existing speech recognition technology can be used here. Speech recognition, also known as automatic speech recognition (ASR), aims to convert the lexical content of human speech into computer-readable input such as key presses, binary codes, or character strings, using approaches such as neural networks and adaptive methods. Through speech recognition, the lexical information in the speech segments can be identified and converted into text, yielding the text content of the audio data.
Compiling the text content into subtitles includes concatenating the text content corresponding to each speech segment, in the order of the segments, into the subtitles of the target video, as illustrated by the sketch below.
Step 103: the caption data is combined with the video data of the target video to generate video data of the target video that includes subtitles.
The video stream and audio stream encapsulated in the video data of the target video can be multiplexed on the same timeline, and the caption data can be combined with the video data of the target video according to that timeline, generating video data of the target video that includes subtitles.
In some application scenarios, the caption data can be embedded directly into the video data of the target video, producing video data of the target video that contains subtitles.
In other application scenarios, the caption data can be written to a separate file, and the subtitle data file and the video data can be packaged together into video data of the target video that includes subtitles; a sketch of this packaging follows.
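For illustration only: packaging a separate subtitle file into the container as a subtitle track could look like this. The ffmpeg invocation, the subtitle codec choice, and the file names are assumptions, not tooling prescribed by the patent.

```python
import subprocess

def mux_subtitles(video_path: str, subtitle_path: str, output_path: str) -> None:
    """Package an external subtitle file with the video into one container."""
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", video_path,     # original target video (video + audio streams)
         "-i", subtitle_path,  # generated caption file, e.g. SubRip (.srt)
         "-map", "0",          # keep all streams of the original video
         "-map", "1",          # add the subtitle stream
         "-c", "copy",         # do not re-encode audio/video
         "-c:s", "mov_text",   # subtitle codec suitable for MP4 containers
         output_path],
        check=True,
    )

mux_subtitles("target_video.mp4", "captions.srt", "target_video_subtitled.mp4")
```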
In some optional implementations of this embodiment, before step 103 the subtitle generation method further includes obtaining time synchronization information generated while the target video was shot. The time synchronization information here can be the timestamp corresponding to each video image frame and the timestamp corresponding to each audio frame; a timestamp usually includes a start time and an end time.
In these optional implementations, combining the caption data with the video data of the target video in step 103 to generate the video data of the target video that includes subtitles comprises: combining, based on the time synchronization information, the caption data corresponding to the audio data with the video data of the target video to generate the target video including subtitles. Combining the caption data with the video data based on the time synchronization information ensures that, when the target video is played, its video data and caption data are played simultaneously.
In practice, combining the caption data corresponding to the audio data with the video data of the target video based on the time synchronization information may include the following steps:
First, at least one audio data frame included in the audio data is determined; the multiple audio data frames obtained in step 102 can be used.
Second, for each audio data frame, the start time point and end time point of the frame are determined, along with the video image key frames corresponding to that start and end time point.
Finally, based on the start time point and end time point, the caption data corresponding to the audio data frame is combined with the video image key frames.
That is, the caption data corresponding to each audio data frame is combined, according to the frame's start and end time points, with the video image frames corresponding to that audio data frame, which keeps the video stream and the subtitle stream of the target video synchronized. When the target video is played, its video data and caption data can therefore be played simultaneously; the sketch below illustrates one way to serialize such timing.
It should be noted that the method provided in this embodiment can be executed by a terminal device. In practice, it can be executed by a video editor installed on the terminal device, by a video capture tool installed on the terminal device, or by a video distribution tool installed on the terminal device.
The method provided by the above embodiment of the present invention extracts audio data from the video data of a target video, then performs speech recognition on the audio data to generate caption data, and finally combines the caption data with the video data of the target video to generate video data of the target video that includes subtitles. This scheme generates the subtitles corresponding to the audio of the target video in a video editor and combines them with the video data, which on the one hand reduces the cost of producing subtitles for a video and on the other hand increases the speed at which video subtitles are generated.
Referring further to Fig. 2, which illustrates the flow of another embodiment of the subtitle generation method. As shown in Fig. 2, the method comprises the following steps:
Step 201: audio data is extracted from the video data of the target video.
Step 201 is identical to step 101 in the embodiment shown in Fig. 1 and is not repeated here.
Step 202: speech recognition is performed on the audio data, and first-language caption data whose language matches the language of the speech data is generated from the speech recognition result.
In this embodiment, the detailed steps for performing speech recognition on the audio data can refer to step 102 shown in Fig. 1 and are not repeated here.
In this embodiment, the speech library used for recognition may include multiple types of language, such as Chinese, English, French, and Japanese. When speech recognition is performed, first-language caption data whose language matches the language of the speech data can be generated. The first language here can be any of the languages above, e.g., Chinese, English, French, or Japanese.
Step 203: at least one item of second-language caption data corresponding to the first-language caption data is generated.
In some application scenarios, existing real-time translation methods can be used to generate the at least one item of second-language caption data corresponding to the first-language caption data. It should be noted that real-time translation of one language into another is currently a widely researched and applied, well-known technique and is not described further here; a sketch follows.
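Purely as an illustration, with `translate` standing in for any real-time translation service (a hypothetical function, not a named API), second-language caption tracks can reuse the first track's timing:

```python
from typing import Callable, Dict, List, Tuple

Caption = Tuple[float, float, str]  # (start_seconds, end_seconds, text)

def make_multilingual_captions(first_lang_captions: List[Caption],
                               target_langs: List[str],
                               translate: Callable[[str, str], str]
                               ) -> Dict[str, List[Caption]]:
    """Produce second-language caption tracks that reuse the first track's timing."""
    tracks = {}
    for lang in target_langs:
        tracks[lang] = [(start, end, translate(text, lang))  # same cue timing, translated text
                        for (start, end, text) in first_lang_captions]
    return tracks
```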
Step 204: the first-language caption data and the at least one item of second-language caption data are combined with the video data of the target video to generate video data of the target video that includes subtitles.
In this embodiment, step 204 can be the same as or similar to step 103 in the embodiment shown in Fig. 1 and is not repeated here.
Furthermore, the display modes of the first-language caption data and the at least one item of second-language caption data can be set individually, for example the position at which each is displayed, so that when the user watches the target video, the caption data of each language is displayed according to these settings.
As can be seen from Fig. 2, compared with the embodiment corresponding to Fig. 1, the flow of the subtitle generation method in this embodiment highlights the steps of generating first-language caption data from the speech recognition result and generating second-language caption data from the first-language caption data. This extends the available subtitle types and helps improve the user experience.
Referring further to Fig. 3, which illustrates the flow of yet another embodiment of the subtitle generation method. As shown in Fig. 3, the method comprises the following steps:
Step 301: audio data is extracted from the video data of the target video.
Step 301 is identical to step 101 in the embodiment shown in Fig. 1 and is not repeated here.
Step 302: speech recognition is performed on the audio data to generate caption data.
Step 302 can be the same as or similar to step 102 in the embodiment shown in Fig. 1 and is not repeated here.
Step 303: a subtitle setting parameter input by the user is received.
In this embodiment, the user can configure the parameters of the subtitles, such as the font size, the subtitle display position, the background color of the subtitle region, the font style, and the font color.
The user can be prompted to configure the subtitle parameters in the video editing interface of the target video and can input the subtitle setting parameters there according to the prompt, either as text or by voice.
In this embodiment, step 303 and step 302 can be performed in either order.
Step 304: the caption data with the subtitle setting parameter applied is combined with the video data of the target video to generate video data of the target video that includes subtitles.
In this embodiment, the video editing tool can apply the subtitle setting parameter input by the user to the caption data, for example by writing the subtitle setting parameter into the header of the subtitle data file, as sketched below. The specific way in which the caption data with the subtitle setting parameter applied is combined with the video data of the target video can refer to step 103 of the embodiment shown in Fig. 1 and is not repeated here.
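One concrete (illustrative, not prescribed) way to carry style parameters in a subtitle file header is the Advanced SubStation Alpha (.ass) format, whose `[V4+ Styles]` section declares font, color, and position defaults. The template below uses a simplified subset of the full style field list, and all parameter values are made up:

```python
ASS_HEADER_TEMPLATE = """[Script Info]
ScriptType: v4.00+
PlayResX: 1920
PlayResY: 1080

[V4+ Styles]
Format: Name, Fontname, Fontsize, PrimaryColour, BackColour, Alignment
Style: Default,{font},{size},{colour},{back},{alignment}

[Events]
Format: Layer, Start, End, Style, Text
"""

def build_ass_header(settings: dict) -> str:
    """Embed user subtitle settings (font, size, colors, position) in the file header."""
    return ASS_HEADER_TEMPLATE.format(
        font=settings.get("font", "Arial"),
        size=settings.get("size", 48),
        colour=settings.get("colour", "&H00FFFFFF"),  # white text (BGR hex, ASS convention)
        back=settings.get("back", "&H80000000"),      # translucent background
        alignment=settings.get("alignment", 2),       # 2 = bottom-center
    )

print(build_ass_header({"size": 36, "alignment": 8}))  # e.g. smaller font, top-center
```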
As can be seen from Fig. 3, compared with the embodiment corresponding to Fig. 1, the flow of the subtitle generation method in this embodiment highlights the steps of receiving a subtitle setting parameter input by the user and combining the caption data with that parameter applied with the video data of the target video. The above method can generate personalized caption data, realizing diversified caption data.
In some optional implementations of the embodiments of the subtitle generation method of this application, before step 101 of the embodiment shown in Fig. 1, before step 201 of the embodiment shown in Fig. 2, and before step 301 of the embodiment shown in Fig. 3, the subtitle generation method may further include receiving a subtitle generation instruction input by the user.
In these optional implementations, the user can input the subtitle generation instruction in the video editing interface while editing the target video. Specifically, when the target video is opened for editing in the video editing interface, the user can be asked whether to generate subtitles; alternatively, the prompt can be shown during the recording of the target video, or before the target video is sent to another electronic device.
The user can input the subtitle generation instruction according to the prompt. The subtitle generation instruction here can, for example, be produced by the user's selection of an on-screen option indicating whether to generate subtitles. The user can also choose, according to the prompt, not to generate subtitles.
When the user has input an instruction to generate subtitles for the target video, extracting audio data from the video data of the target video in step 101 of the embodiment shown in Fig. 1, step 201 of the embodiment shown in Fig. 2, and step 301 of the embodiment shown in Fig. 3 may include: extracting the audio data from the video data of the target video according to the user's subtitle generation instruction.
Referring to Fig. 4, which illustrates the flow of one embodiment of a subtitle-based video retrieval method according to the present invention. As shown in Fig. 4, the subtitle-based video retrieval method comprises the following steps:
Step 401: a video search keyword input by a user is received.
In this embodiment, the user can input a video search keyword, word, or sentence on the terminal device being used, for example in the search interface of an application. The video search keyword can be input as text or as speech. When the user inputs the keyword by voice, the terminal device can perform speech recognition locally to identify the video search keyword corresponding to the user's speech; alternatively, the terminal device can send the user's speech over the network to the server, which performs speech recognition on it to identify the video search keyword input by the user.
The terminal device can send the video search keyword to the server, and the server receives the video search keyword input by the user.
Step 402: the video search keyword is matched against a preset database, and the search target video corresponding to the keyword is determined from the matching result.
In this embodiment, the caption data corresponding to any video in the preset database can be generated by the subtitle generation methods illustrated in the embodiments shown in Fig. 1, Fig. 2, or Fig. 3.
After the server receives the video search keyword sent by the terminal device in step 401, it can match the keyword against the preset database. Specifically, the keyword can be compared with the caption data of the multiple videos stored in advance in the database, and a video whose subtitles contain the keyword is taken as a search target video; further, a video whose title contains the keyword can be taken as a preferred search target video. It is understood that there may be at least one search target video; a matching sketch follows.
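An illustrative sketch only (record fields and ranking policy are assumptions, not the patent's algorithm): match the keyword against the stored caption text, preferring videos whose title also contains the keyword:

```python
from typing import Dict, List

def search_by_captions(keyword: str, videos: List[Dict]) -> List[Dict]:
    """videos: records like {"id": ..., "title": ..., "caption_text": ...}."""
    kw = keyword.lower()
    # A video is a search target if its subtitles contain the keyword.
    hits = [v for v in videos if kw in v["caption_text"].lower()]
    # Prefer videos whose title also contains the keyword.
    hits.sort(key=lambda v: kw in v["title"].lower(), reverse=True)
    return hits

catalog = [{"id": 1, "title": "Cooking basics", "caption_text": "Today we make noodles."}]
print(search_by_captions("noodles", catalog))  # matches via caption text
```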
Step 403: the search target video is sent to the user's terminal device.
In this embodiment, at least one search target video can be sent to the user's terminal device. Specifically, the summary information and link information corresponding to the search target video can be sent to the terminal device so that the terminal device can display the search target video; the summary information can also include image information of the search target video, and so on.
The subtitle-based video retrieval method provided in this embodiment determines the search target video by matching the keyword input by the user against the caption data of stored videos. Compared with existing methods that determine the search target video by matching the keyword input by the user against information corresponding to video image key frames, this video retrieval method can reduce the cost of video search and improve its efficiency.
Referring further to Fig. 5, as an implementation of the methods shown in the figures above, the present invention provides an embodiment of a subtitle generation apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 1, and the apparatus can be applied to various electronic devices.
As shown in Fig. 5, the subtitle generation apparatus of this embodiment includes an extraction unit 501, a first generation unit 502, and a second generation unit 503. The extraction unit 501 extracts audio data from the video data of a target video; the first generation unit 502 performs speech recognition on the audio data to generate caption data; and the second generation unit 503 combines the caption data with the video data of the target video to generate video data of the target video that includes subtitles.
In this embodiment, the specific processing of the extraction unit 501, the first generation unit 502, and the second generation unit 503 of the subtitle generation apparatus, and the technical effects they bring, can refer to the descriptions of step 101, step 102, and step 103 in the embodiment corresponding to Fig. 1, respectively, and are not repeated here.
In some optional implementations of this embodiment, the subtitle generation apparatus further includes a synchronization-information acquiring unit (not shown in the figure) for obtaining time synchronization information generated while the target video was shot; the second generation unit 503 is further configured to combine the caption data with the video data of the target video based on the time synchronization information to generate the target video including subtitles.
In some optional implementations of this embodiment, the second generation unit 503 is further configured to: determine at least one audio data frame included in the audio data; for each audio data frame, determine the start time point and end time point of that frame and the video image key frames corresponding to them; and combine, based on the start and end time points, the caption data corresponding to the audio data frame with the video image key frames.
In some optional implementations of this embodiment, the first generation unit 502 is further configured to: generate, from the speech recognition result, first-language caption data whose language matches the language of the speech, and generate at least one item of second-language caption data corresponding to the first-language caption data, the second language being different from the first language; the second generation unit 503 is further configured to combine the first-language caption data and the at least one item of second-language caption data with the video data of the target video to generate video data of the target video that includes subtitles.
In some optional implementations of this embodiment, the subtitle generation apparatus further includes a first receiving unit (not shown in the figure) for receiving a subtitle setting parameter input by the user; the second generation unit 503 is further configured to combine the caption data with the subtitle setting parameter applied with the video data of the target video to generate video data of the target video that includes subtitles.
In some optional implementations of this embodiment, the second generation unit is further configured to generate a subtitle file from the caption data corresponding to the audio data and the time synchronization information, and to package the subtitle file with the video data file based on the time synchronization information, generating video data of the target video that includes subtitles.
In some optional implementations of this embodiment, the subtitle generation apparatus further includes a second receiving unit for receiving, before the extraction unit extracts the audio data of the target video, a subtitle generation instruction input by the user; the extraction unit is further configured to extract the audio data corresponding to the target video according to that subtitle generation instruction.
Referring further to Fig. 6, as an implementation of the methods shown in the figures above, the present invention provides an embodiment of a subtitle-based video retrieval apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 4, and the apparatus can be applied to various electronic devices.
As shown in Fig. 6, the subtitle-based video retrieval apparatus of this embodiment includes a receiving unit 601, a determination unit 602, and a transmission unit 603. The receiving unit 601 receives a video search keyword input by a user; the determination unit 602 matches the video search keyword against a preset database, in which multiple videos and the caption data corresponding to each video are saved in advance, and determines from the matching result the search target video corresponding to the keyword; and the transmission unit 603 sends the search target video to the user's terminal device. The caption data corresponding to any video in the preset database is generated by the subtitle generation apparatus shown in Fig. 5.
Referring to Fig. 7, which shows an exemplary system architecture to which the subtitle generation method of one embodiment of the present invention can be applied.
As shown in Fig. 7, the system architecture may include terminal devices 701, 702, and 703, a network 704, and a server 705. The network 704 provides the medium of the communication links between the terminal devices 701, 702, 703 and the server 705, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
The terminal devices 701, 702, 703 can interact with the server 705 through the network 704 to receive or send messages and the like. Various client applications, such as video editing applications and video playback applications, can be installed on the terminal devices 701, 702, 703.
The terminal devices 701, 702, 703 can be hardware or software. When they are hardware, they can be various electronic devices that have a display screen and support video browsing, including but not limited to smartphones, tablet computers, laptop portable computers, and desktop computers. When they are software, they can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module; no specific limitation is made here. The target video can be a video shot by the terminal device, or a video shot by another video capture device and sent to the terminal device over a communication connection.
The server 705 can provide various services, such as receiving the video data of the target video including subtitles sent by the terminal devices 701, 702, 703, and pushing the target video to the user's terminal device according to the search keyword input by the user.
It should be noted that the subtitle generation method provided by the embodiments of the present invention is generally executed by the terminal devices 701, 702, 703; correspondingly, the subtitle generation apparatus is generally provided in the terminal devices 701, 702, 703. The method may in particular be executed by a video editing application installed on the terminal devices 701, 702, 703.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 7 are merely schematic; any number of terminal devices, networks, and servers can be provided according to implementation needs.
Referring next to Fig. 8, which shows a schematic diagram of the basic structure of an electronic device suitable for implementing an embodiment of the present invention. The electronic device shown in Fig. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 8, the electronic device may include one or more processors 801 and a storage apparatus 802. The storage apparatus 802 is used to store one or more programs, which can be executed by the one or more processors 801; when the one or more programs are executed by the one or more processors, the one or more processors implement the functions defined in the method of the present invention described above.
The modules involved in the embodiments of the present invention can be implemented in software or in hardware. The described modules can also be provided in a processor, which may for example be described as: a processor including an extraction unit, a first generation unit, and a second generation unit. The names of these modules do not, under certain circumstances, limit the modules themselves; for example, the extraction unit may also be described as "a unit that extracts audio data from the video data of a target video".
In another aspect, the present invention also provides a computer-readable medium, which can be included in the device described in the above embodiments or can exist alone without being assembled into that device. The computer-readable medium of the present invention can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium can include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: extract audio data from the video data of a target video; perform speech recognition on the audio data to generate caption data; and combine the caption data with the video data of the target video to generate video data of the target video that includes subtitles.
The specific embodiments above do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations, and substitutions can occur. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (18)
1. A subtitle generation method, characterized by comprising:
extracting audio data from the video data of a target video;
performing speech recognition on the audio data to generate caption data; and
combining the caption data with the video data of the target video to generate video data of the target video that includes subtitles.
2. The method according to claim 1, characterized in that, before the combining of the caption data with the video data of the target video to generate the video data of the target video that includes subtitles, the method further comprises:
obtaining time synchronization information generated while the target video was shot; and
the combining of the caption data with the video data of the target video to generate the video data of the target video that includes subtitles comprises:
combining, based on the time synchronization information, the caption data corresponding to the audio data with the video data of the target video to generate the target video including subtitles.
3. The method according to claim 2, characterized in that the combining, based on the time synchronization information, of the caption data with the video data of the target video comprises:
determining at least one audio data frame included in the audio data;
for each audio data frame, determining a start time point and an end time point of the audio data frame, and determining video image key frames corresponding to the start time point and the end time point of the audio data frame; and
combining, based on the start time point and the end time point, the caption data corresponding to the audio data frame with the video image key frames.
4. The method according to any one of claims 1-3, characterized in that performing speech recognition on the audio data to generate caption data comprises:
generating, from the speech recognition result, first-language caption data whose language matches the language of the speech; and
generating at least one item of second-language caption data corresponding to the first-language caption data, the second language being different from the first language; and
the combining of the caption data with the video data of the target video to generate the video data of the target video that includes subtitles comprises:
combining the first-language caption data and the at least one item of second-language caption data with the video data of the target video to generate the video data of the target video that includes subtitles.
5. The method according to any one of claims 1-3, characterized in that the method further comprises:
receiving a subtitle setting parameter input by a user; and
the combining of the caption data with the video data of the target video to generate the video data of the target video that includes subtitles comprises:
combining the caption data with the subtitle setting parameter applied with the video data of the target video to generate the video data of the target video that includes subtitles.
6. The method according to any one of claims 1-3, characterized in that, before extracting audio data from the video data of the target video, the method further comprises:
receiving a subtitle generation instruction input by a user; and
the extracting of the audio data corresponding to the target video comprises:
extracting the audio data corresponding to the target video according to the subtitle generation instruction.
7. A subtitle-based video retrieval method, characterized by comprising:
receiving a video search keyword input by a user;
matching the video search keyword against a preset database, and determining, from the matching result, a search target video corresponding to the video search keyword, wherein multiple videos and the caption data corresponding to each video are stored in association in advance in the preset database; and
sending the search target video to the user's terminal device; wherein the caption data corresponding to any video in the preset database is generated by the method provided in any one of claims 1-6.
8. A subtitle generation apparatus, characterized by comprising:
an extraction unit for extracting audio data from the video data of a target video;
a first generation unit for performing speech recognition on the audio data to generate caption data; and
a second generation unit for combining the caption data with the video data of the target video to generate video data of the target video that includes subtitles.
9. The apparatus according to claim 8, characterized in that the apparatus further comprises a synchronization-information acquiring unit for:
obtaining time synchronization information generated while the target video was shot; and
the second generation unit is further configured to:
combine, based on the time synchronization information, the caption data with the video data of the target video to generate the target video including subtitles.
10. The apparatus according to claim 9, characterized in that the second generation unit is further configured to:
determine at least one audio data frame included in the audio data;
for each audio data frame, determine a start time point and an end time point of the audio data frame, and determine video image key frames corresponding to the start time point and the end time point of the audio data frame; and
combine, based on the start time point and the end time point, the caption data corresponding to the audio data frame with the video image key frames.
11. The apparatus according to any one of claims 8-10, characterized in that the first generation unit is further configured to:
generate, from the speech recognition result, first-language caption data whose language matches the language of the speech; and
generate, according to a preset method, at least one item of second-language caption data corresponding to the first-language caption data, the second language being different from the first language; and
the second generation unit is further configured to:
combine the first-language caption data and the at least one item of second-language caption data with the video data of the target video to generate the video data of the target video that includes subtitles.
12. The apparatus according to any one of claims 8-10, characterized in that the apparatus further comprises a first receiving unit for:
receiving a subtitle setting parameter input by a user; and
the second generation unit is further configured to:
combine the caption data with the subtitle setting parameter applied with the video data of the target video to generate the video data of the target video that includes subtitles.
13. The apparatus according to any one of claims 8-10, characterized in that the apparatus further comprises a second receiving unit for:
receiving, before the extraction unit extracts the audio data of the target video, a subtitle generation instruction input by a user; and
the extraction unit is further configured to:
extract the audio data from the video data of the target video according to the subtitle generation instruction.
14. A subtitle-based video retrieval apparatus, comprising:
a receiving unit, configured to receive a video search keyword input by a user;
a determination unit, configured to match the video search keyword against a preset database and to determine, according to the matching result, a search target video corresponding to the video search keyword, wherein a plurality of videos and the caption data corresponding to each video are pre-stored in the preset database; and
a transmission unit, configured to send the search target video to a terminal device of the user; wherein the caption data corresponding to any video in the preset database is generated by the apparatus according to any one of claims 8-13.
15. An electronic device, comprising:
one or more processors; and
a storage device configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-6.
16. An electronic device, comprising:
one or more processors; and
a storage device configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to claim 7.
17. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-6.
18. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to claim 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910272387.9A CN110035326A (en) | 2019-04-04 | 2019-04-04 | Subtitle generation, the video retrieval method based on subtitle, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110035326A (en) | 2019-07-19 |
Family
ID=67237520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910272387.9A Pending CN110035326A (en) | 2019-04-04 | 2019-04-04 | Subtitle generation, the video retrieval method based on subtitle, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110035326A (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005115607A (en) * | 2003-10-07 | 2005-04-28 | Matsushita Electric Ind Co Ltd | Video retrieving device |
CN103984772A (en) * | 2014-06-04 | 2014-08-13 | 百度在线网络技术(北京)有限公司 | Method and device for generating text retrieval subtitle library and video retrieval method and device |
CN106162293A (en) * | 2015-04-22 | 2016-11-23 | 无锡天脉聚源传媒科技有限公司 | A kind of video sound and the method and device of image synchronization |
CN106792071A (en) * | 2016-12-19 | 2017-05-31 | 北京小米移动软件有限公司 | Method for processing caption and device |
CN108401192A (en) * | 2018-04-25 | 2018-08-14 | 腾讯科技(深圳)有限公司 | Video stream processing method, device, computer equipment and storage medium |
CN108600773A (en) * | 2018-04-25 | 2018-09-28 | 腾讯科技(深圳)有限公司 | Caption data method for pushing, subtitle methods of exhibiting, device, equipment and medium |
CN109246472A (en) * | 2018-08-01 | 2019-01-18 | 平安科技(深圳)有限公司 | Video broadcasting method, device, terminal device and storage medium |
CN109213974A (en) * | 2018-08-22 | 2019-01-15 | 北京慕华信息科技有限公司 | A kind of electronic document conversion method and device |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3817395A1 (en) * | 2019-10-30 | 2021-05-05 | Beijing Xiaomi Mobile Software Co., Ltd. | Video recording method and apparatus, device, and readable storage medium |
CN110740275A (en) * | 2019-10-30 | 2020-01-31 | 中央电视台 | nonlinear editing systems |
WO2021120190A1 (en) * | 2019-12-20 | 2021-06-24 | 深圳市欢太科技有限公司 | Data processing method and apparatus, electronic device, and storage medium |
CN112309391A (en) * | 2020-03-06 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Method and apparatus for outputting information |
CN112309391B (en) * | 2020-03-06 | 2024-07-12 | 北京字节跳动网络技术有限公司 | Method and device for outputting information |
CN111683266A (en) * | 2020-05-06 | 2020-09-18 | 厦门盈趣科技股份有限公司 | Method and terminal for configuring subtitles through simultaneous translation of videos |
CN112055261A (en) * | 2020-07-14 | 2020-12-08 | 北京百度网讯科技有限公司 | Subtitle display method and device, electronic equipment and storage medium |
CN111986656A (en) * | 2020-08-31 | 2020-11-24 | 上海松鼠课堂人工智能科技有限公司 | Teaching video automatic caption processing method and system |
CN111949805B (en) * | 2020-09-23 | 2024-09-20 | 深圳前海知行科技有限公司 | Subtitle generation method, device, equipment and storage medium based on artificial intelligence |
CN111949805A (en) * | 2020-09-23 | 2020-11-17 | 深圳前海知行科技有限公司 | Subtitle generating method, device and equipment based on artificial intelligence and storage medium |
CN112163102A (en) * | 2020-09-29 | 2021-01-01 | 北京字跳网络技术有限公司 | Search content matching method and device, electronic equipment and storage medium |
CN112163102B (en) * | 2020-09-29 | 2023-03-17 | 北京字跳网络技术有限公司 | Search content matching method and device, electronic equipment and storage medium |
CN112511910A (en) * | 2020-11-23 | 2021-03-16 | 浪潮天元通信信息系统有限公司 | Real-time subtitle processing method and device |
CN112929758A (en) * | 2020-12-31 | 2021-06-08 | 广州朗国电子科技有限公司 | Multimedia content subtitle generating method, equipment and storage medium |
CN112995749B (en) * | 2021-02-07 | 2023-05-26 | 北京字节跳动网络技术有限公司 | Video subtitle processing method, device, equipment and storage medium |
CN112995749A (en) * | 2021-02-07 | 2021-06-18 | 北京字节跳动网络技术有限公司 | Method, device and equipment for processing video subtitles and storage medium |
CN112684967A (en) * | 2021-03-11 | 2021-04-20 | 荣耀终端有限公司 | Method for displaying subtitles and electronic equipment |
CN113345439A (en) * | 2021-05-28 | 2021-09-03 | 北京达佳互联信息技术有限公司 | Subtitle generating method, device, electronic equipment and storage medium |
CN113345439B (en) * | 2021-05-28 | 2024-04-30 | 北京达佳互联信息技术有限公司 | Subtitle generation method, subtitle generation device, electronic equipment and storage medium |
CN113490057A (en) * | 2021-06-30 | 2021-10-08 | 海信电子科技(武汉)有限公司 | Display device and media asset recommendation method |
CN113490057B (en) * | 2021-06-30 | 2023-03-24 | 海信电子科技(武汉)有限公司 | Display device and media asset recommendation method |
CN113490058A (en) * | 2021-08-20 | 2021-10-08 | 云知声(上海)智能科技有限公司 | Intelligent subtitle matching system applied to later stage of movie and television |
CN115034233A (en) * | 2022-06-16 | 2022-09-09 | 安徽听见科技有限公司 | Translation method, translation device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110035326A (en) | Subtitle generation, the video retrieval method based on subtitle, device and electronic equipment | |
CN108401192B (en) | Video stream processing method and device, computer equipment and storage medium | |
CN109246472A (en) | Video broadcasting method, device, terminal device and storage medium | |
US11252444B2 (en) | Video stream processing method, computer device, and storage medium | |
US11917344B2 (en) | Interactive information processing method, device and medium | |
US10034028B2 (en) | Caption and/or metadata synchronization for replay of previously or simultaneously recorded live programs | |
US8966360B2 (en) | Transcript editor | |
CN111955013B (en) | Method and system for facilitating interactions during real-time streaming events | |
CN110781328A (en) | Video generation method, system, device and storage medium based on voice recognition | |
BR112016006860B1 (en) | APPARATUS AND METHOD FOR CREATING A SINGLE DATA FLOW OF COMBINED INFORMATION FOR RENDERING ON A CUSTOMER COMPUTING DEVICE | |
CN103491429A (en) | Audio processing method and audio processing equipment | |
CN103414948A (en) | Method and device for playing video | |
CN110691271A (en) | News video generation method, system, device and storage medium | |
JP2022160519A (en) | Media environment-driven content distribution platform | |
CN114040255A (en) | Live caption generating method, system, equipment and storage medium | |
US8913869B2 (en) | Video playback apparatus and video playback method | |
JP2021090172A (en) | Caption data generation device, content distribution system, video reproduction device, program, and caption data generation method | |
US20230300429A1 (en) | Multimedia content sharing method and apparatus, device, and medium | |
KR20130023461A (en) | Caption management method and caption search method | |
US8896708B2 (en) | Systems and methods for determining, storing, and using metadata for video media content | |
CN114341866A (en) | Simultaneous interpretation method, device, server and storage medium | |
CN113709521B (en) | System for automatically matching background according to video content | |
KR101749420B1 (en) | Apparatus and method for extracting representation image of video contents using closed caption | |
CN113891108A (en) | Subtitle optimization method and device, electronic equipment and storage medium | |
CN114501160A (en) | Method for generating subtitles and intelligent subtitle system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190719