CN109040779B - Caption content generation method, device, computer equipment and storage medium - Google Patents

Caption content generation method, device, computer equipment and storage medium

Info

Publication number
CN109040779B
CN109040779B (application CN201810777015.7A)
Authority
CN
China
Prior art keywords
video
caption content
task
subtitle processing
video clip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810777015.7A
Other languages
Chinese (zh)
Other versions
CN109040779A (en)
Inventor
Ruan Zhiqiang (阮志强)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810777015.7A
Publication of CN109040779A
Application granted
Publication of CN109040779B
Legal status: Active

Classifications

    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424: Splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/254: Management at additional data server, e.g. shopping server, rights management server
    • H04N21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. into time segments
    • G06N3/045: Neural network architectures; combinations of networks
    • G06N3/08: Neural network learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

This application relates to a caption content generation method, apparatus, computer device, and storage medium. The method includes: obtaining an original video for which caption content is to be generated; dividing the original video into video clips; pushing each video clip in parallel to subtitle processing devices that are in an idle state, where a video clip instructs the corresponding subtitle processing device to generate first caption content corresponding to that video clip; and removing redundant caption content from the first caption content fed back by each subtitle processing device, then combining the caption content remaining after the removal to generate second caption content corresponding to the original video. The scheme of this application improves caption content generation efficiency.

Description

Caption content generation method, device, computer equipment and storage medium
Technical field
The present invention relates to the field of computer technology, and more particularly to a caption content generation method, apparatus, computer device, and storage medium.
Background art
With the rapid development of science and technology, video, being intuitive and capable of conveying richer and more detailed information, plays an increasingly important role in people's lives and work.
To convey information more intuitively and effectively, videos are often provided with caption content. In conventional methods, generating the caption content of a complete video requires processing each video frame sequentially, one by one; only after all video frames have been processed can the caption content of the complete video finally be output. As a result, the efficiency of generating caption content is relatively low.
Summary of the invention
Accordingly, it is necessary to provide a caption content generation method, apparatus, computer device, and storage medium that address the relatively low caption content generation efficiency of conventional methods.
A caption content generation method, the method comprising:
obtaining an original video for which caption content is to be generated;
dividing the original video into video clips;
pushing each video clip in parallel to subtitle processing devices that are in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip; and
removing redundant caption content from the first caption content fed back by each subtitle processing device, and, from the caption content remaining after the redundant caption content is removed, combining and generating second caption content corresponding to the original video.
A caption content generation apparatus, the apparatus comprising:
a segmentation module, configured to obtain an original video for which caption content is to be generated and to divide the original video into video clips;
a pushing module, configured to push each video clip in parallel to subtitle processing devices that are in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip; and
a de-redundancy module, configured to remove redundant caption content from the first caption content fed back by each subtitle processing device, and, from the caption content remaining after the redundant caption content is removed, to combine and generate second caption content corresponding to the original video.
A computer device, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
obtaining an original video for which caption content is to be generated;
dividing the original video into video clips;
pushing each video clip in parallel to subtitle processing devices that are in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip; and
removing redundant caption content from the first caption content fed back by each subtitle processing device, and, from the caption content remaining after the redundant caption content is removed, combining and generating second caption content corresponding to the original video.
A storage medium storing a computer program that, when executed by a processor, causes the processor to perform the following steps:
obtaining an original video for which caption content is to be generated;
dividing the original video into video clips;
pushing each video clip in parallel to subtitle processing devices that are in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip; and
removing redundant caption content from the first caption content fed back by each subtitle processing device, and, from the caption content remaining after the redundant caption content is removed, combining and generating second caption content corresponding to the original video.
In the above caption content generation method, apparatus, computer device, and storage medium, the original video is divided into video clips, and each video clip is pushed in parallel to subtitle processing devices that are in an idle state. In this way, the subtitle processing devices can process the video clips in parallel and each generate first caption content corresponding to its video clip. Redundant caption content is removed from the first caption contents, and the caption content remaining after the removal is combined to generate second caption content corresponding to the original video. Compared with the conventional approach, in which the caption content of a video can be generated only after all video frames have been processed sequentially one by one, the scheme of this application generates caption content through parallel processing and thus improves caption content generation efficiency.
Brief description of the drawings
Fig. 1 is a diagram of an application scenario of a caption content generation method in one embodiment;
Fig. 2 is a schematic flowchart of a caption content generation method in one embodiment;
Fig. 3 is a diagram of an application environment of a caption content generation method in another embodiment;
Fig. 4 is a schematic diagram of the principle of generating first caption content in one embodiment;
Fig. 5 is a schematic diagram of the principle of a caption content generation method in one embodiment;
Fig. 6 is a block diagram of a caption content generation apparatus in one embodiment;
Fig. 7 is a block diagram of a caption content generation apparatus in another embodiment;
Fig. 8 is a block diagram of a caption content generation apparatus in yet another embodiment;
Fig. 9 is a schematic diagram of the internal structure of a computer device in one embodiment;
Fig. 10 is a schematic diagram of the internal structure of a computer device in another embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
Fig. 1 is a diagram of an application scenario of a caption content generation method in one embodiment. Referring to Fig. 1, the application scenario includes a terminal 110, a slice server 120, and a cluster of multiple subtitle processing devices 130, all connected by a network; the slice server 120 establishes connections with the terminal 110 and the subtitle processing devices 130 through the network. The slice server 120 is a server with a video slicing function. It can be understood that the other functions of the slice server 120 are not limited here; for example, the slice server 120 may also have the function of removing redundant caption content. A subtitle processing device 130 is a device with a caption content generation function.
The terminal 110 may be a smart television, a desktop computer, or a mobile terminal, and the mobile terminal may include at least one of a mobile phone, a tablet computer, a laptop, a personal digital assistant, a wearable device, and the like. The slice server 120 may be implemented as an independent server or as a server cluster composed of multiple physical servers.
A user can upload a shot or edited original video to the slice server 120 through the terminal 110. The slice server 120 can divide the original video into video clips and push each video clip in parallel to the subtitle processing devices 130 that are in an idle state; a video clip instructs the corresponding subtitle processing device 130 to generate first caption content corresponding to the video clip. The subtitle processing device 130 can feed the generated first caption content back to the slice server 120. The slice server 120 can remove redundant caption content from the first caption contents that are fed back and, from the caption content remaining after the removal, combine and generate second caption content corresponding to the original video. It can be understood that the slice server 120 may further send the generated caption content to the terminal 110, so that the terminal 110 can display the corresponding caption content when playing the original video.
It can be understood that, in other embodiments, the terminal 110 is not required to send the original video to the slice server 120; the slice server 120 may also directly obtain a stored original video.
In one embodiment, the subtitle processing devices 130 include subtitle processing servers and/or user terminals with a caption content generation function. A user terminal with a caption content generation function refers to a terminal, used by a user, that has the function of generating caption content. Here, "user" is a general term and is not limited to the user who uploads the original video. Likewise, a user terminal is not limited to the terminal that uploads the original video. In one embodiment, a user terminal refers to a terminal on which an application with a video editing function is installed and used, or a terminal that accesses a web page with a video editing function.
It can be understood that a user terminal with a caption content generation function may be a smart television, a desktop computer, or a mobile terminal. A subtitle processing server may be implemented as an independent server or as a server cluster composed of multiple physical servers.
It should be noted that "first" and "second" in the embodiments of this application are used only for distinction and are not used to limit size, order, subordination, or the like.
Fig. 2 is a schematic flowchart of a caption content generation method in one embodiment. This embodiment is described mainly by applying the caption content generation method to a computer device, which may be the slice server 120 in Fig. 1. Referring to Fig. 2, the method specifically includes the following steps:
S202: obtain an original video for which caption content is to be generated.
It can be understood that the obtained original video does not yet have caption content.
The computer device can receive an original video, for which caption content is to be generated, uploaded by a terminal, or obtain an original video stored locally or in a database.
In one embodiment, the computer device can obtain the original video uploaded by a terminal through an application or a web page. The application may be an application with a video editing function.
S204: divide the original video into video clips.
It can be understood that the combined content of the video clips constitutes the complete video content of the original video.
In one embodiment, the computer device can evenly divide the original video into multiple video clips of uniform size. In other embodiments, the computer device can also divide the original video into video clips at random; that is, the video clips are not required to be of uniform size.
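As an illustration of the division step, the following is a minimal Python sketch of the even-division variant; the fixed clip length, the use of the ffmpeg command line, and all file names are assumptions made for illustration, not details specified in this application:

    import subprocess

    def split_video(path: str, total_seconds: float, clip_seconds: float = 60.0):
        """Evenly divide the original video into clips of roughly equal length.

        Returns the clip file names in order; the last clip absorbs the remainder.
        """
        clips = []
        start = 0.0
        index = 0
        while start < total_seconds:
            duration = min(clip_seconds, total_seconds - start)
            out = f"clip_{index:04d}.mp4"
            # Stream-copy the [start, start + duration) range into its own file.
            subprocess.run(
                ["ffmpeg", "-y", "-ss", str(start), "-i", path,
                 "-t", str(duration), "-c", "copy", out],
                check=True,
            )
            clips.append(out)
            start += duration
            index += 1
        return clips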
S206: push each video clip in parallel to the subtitle processing devices that are in an idle state; a video clip is used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip.
It can be understood that a subtitle processing device in an idle state has the system resources needed to complete the subtitle processing procedure; that is, when a subtitle processing device has the system resources required to complete the subtitle processing procedure, the subtitle processing device is in an idle state. A subtitle processing device in an idle state refers to a subtitle processing device that is currently idle. "In parallel" means that multiple video clips can be pushed independently and asynchronously to different idle subtitle processing devices, with the video clips not affecting one another.
It should be noted that the subtitle processing devices may include subtitle processing servers and/or user terminals with a caption content generation function.
In one embodiment, each subtitle processing device in the cluster can detect its own state and report its status information to the computer device. When the computer device determines, from the reported status information, which subtitle processing devices are in an idle state, it can push the video clips in parallel to the subtitle processing devices determined to be idle.
In one embodiment, the reported status information may indicate either an idle state or a busy state; that is, a subtitle processing device reports its status information to the computer device whether or not it is idle. In another embodiment, the reported status information may only be information indicating an idle state; that is, a subtitle processing device reports status information to the computer device only when it detects that it is idle, and does not report status information when it detects that it is busy.
In other embodiments, the computer device can also actively monitor the states of the subtitle processing devices and, upon detecting that a subtitle processing device is in an idle state, push video clips in parallel to the idle subtitle processing devices.
It can be understood that the computer device can randomly select video clips from the set of video clips obtained by the division and push the selected video clips in parallel to the idle subtitle processing devices. The computer device can also select video clips from the set sequentially, according to the order of their positions in the original video, and push the selected video clips in parallel to the idle subtitle processing devices.
For example, suppose the original video is divided, in order, into video clip 1, video clip 2, video clip 3, ..., video clip N. When subtitle processing device A and subtitle processing device B are idle, the computer device can randomly select video clip 2 and video clip 5 and push them in parallel to subtitle processing device A and subtitle processing device B, so that each of them processes one video clip. The computer device can also, according to the order of the clips' positions in the original video, sequentially select video clip 1 and video clip 2 and push them in parallel to subtitle processing device A and subtitle processing device B.
It can be understood that one or more subtitle processing devices in the cluster may be idle. When multiple subtitle processing devices are idle, the computer device can push the video clips in parallel to the respective idle subtitle processing devices, and each subtitle processing device can perform subtitle generation processing on the received video clip to generate first caption content corresponding to that video clip. The subtitle processing device can feed the generated first caption content back to the computer device.
In one embodiment, the subtitle processing device can input the video clip into a neural network model for subtitle generation processing, so as to generate the first caption content corresponding to the video clip. The neural network model is a machine learning model trained in advance to generate caption content corresponding to a video clip.
The first caption content is the textual form of the sound in the video clip. For example, when a person speaks in the video clip, the first caption content can be displayed as text in the playback picture for the user to read, thereby conveying more, and more accurate, information.
S208: remove redundant caption content from the first caption content fed back by each subtitle processing device, and, from the caption content remaining after the redundant caption content is removed, combine and generate second caption content corresponding to the original video.
Here, redundant caption content means superfluous, duplicated caption content.
Specifically, the computer device can receive the first caption content fed back by each subtitle processing device and perform de-redundancy processing on all of the received first caption contents to remove the redundant caption content. The computer device can then, from the caption content remaining after the removal, combine and generate second caption content corresponding to the original video. It can be understood that the second caption content is the caption content that will ultimately be displayed when the original video is played.
In one embodiment, the computer device can sort the first caption contents according to the order of the corresponding video clips' positions in the original video, remove the redundant caption content from adjacent first caption contents after sorting, and combine the remaining caption content to generate the second caption content corresponding to the original video.
In one embodiment, the computer device can directly compare the content of adjacent first caption contents and remove the redundant parts according to the comparison result. In another embodiment, the computer device can also input each first caption content into a neural network model for de-redundancy processing and output the second caption content corresponding to the original video.
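As an illustration of the direct content-comparison variant, the following minimal Python sketch merges per-clip captions sorted by clip position; it assumes redundancy appears as an overlap between the tail of one clip's caption and the head of the next, which is an illustrative assumption rather than a definition from this application:

    def merge_adjacent(captions):
        """Merge per-clip captions (sorted by clip position) into one caption,
        dropping the duplicated words where adjacent clips overlap."""
        merged = []
        for words in captions:  # each entry: list of words for one clip
            overlap = 0
            # Longest suffix of `merged` that equals a prefix of `words`.
            for k in range(min(len(merged), len(words)), 0, -1):
                if merged[-k:] == words[:k]:
                    overlap = k
                    break
            merged.extend(words[overlap:])
        return merged

    # A toy run: the shared "sits down" is kept only once.
    first = "a man walks in and sits down".split()
    second = "sits down and starts to read".split()
    print(" ".join(merge_adjacent([first, second])))
    # -> a man walks in and sits down and starts to read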
In the above caption content generation method, the original video is divided into video clips, and the video clips are pushed in parallel to the idle subtitle processing devices, so that the subtitle processing devices can process the video clips in parallel and each generate first caption content corresponding to its video clip. The redundant caption content is removed from the first caption contents, and the caption content remaining after the removal is combined to generate second caption content corresponding to the original video. Compared with the conventional approach, in which the caption content of a video can be generated only after all video frames have been processed sequentially one by one, the scheme of this application generates caption content through parallel processing and thus improves caption content generation efficiency.
In one embodiment, pushing each video clip in parallel to the idle subtitle processing devices includes: adding each video clip to a task queue, with the video clips corresponding one-to-one to the video tasks in the task queue; and selecting video tasks from the task queue in order and distributing the selected video tasks in parallel to the idle subtitle processing devices, a video task being used to instruct a subtitle processing device to generate first caption content corresponding to the video clip to which the video task corresponds.
Specifically, the computer device can add each video clip obtained by the division to a task queue; each video clip then corresponds to one video task in the task queue. It can be understood that the video tasks are ordered according to the order in which they were added to the task queue. The computer device can select video tasks from the task queue in order and distribute the selected video tasks in parallel to the idle subtitle processing devices. After receiving a video task, a subtitle processing device can perform subtitle generation processing on the video clip to which the video task corresponds, so as to generate the corresponding first caption content.
For example, suppose the task queue holds 20 video tasks and there are currently 4 idle subtitle processing devices. The computer device can select the first 4 video tasks in order and distribute them in parallel to these 4 idle subtitle processing devices. If 2 more subtitle processing devices later become idle, the computer device can distribute the 5th and 6th video tasks in parallel to those 2 idle subtitle processing devices.
It can be understood that the order in which video tasks are added to the task queue is consistent with the order of the video clips' positions in the original video. For example, the video task corresponding to the first video clip of the original video is the first video task in the task queue, and the video task corresponding to the second video clip divided from the original video is the second video task in the task queue.
In the above embodiment, each video clip is added to a task queue, with the video clips corresponding one-to-one to the video tasks in the task queue; video tasks are selected from the task queue in order and distributed in parallel to the idle subtitle processing devices to generate the corresponding first caption content. While the video clips are processed in parallel, orderly processing is maintained, which improves the accuracy and efficiency of the processing.
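A minimal Python sketch of this queue-based dispatch, using the standard library's queue and thread pool, is given below; the device objects and the generate_caption stand-in are hypothetical placeholders, not an interface defined by this application:

    import queue
    from concurrent.futures import ThreadPoolExecutor

    def dispatch(clips, idle_devices):
        """Add clips to a task queue (one video task per clip, in clip order),
        then hand tasks in order to whichever device is idle."""
        tasks = queue.Queue()
        for i, clip in enumerate(clips):
            tasks.put((i, clip))          # task order == clip order in the video

        results = {}                      # clip index -> first caption content

        def worker(device):
            while True:
                try:
                    i, clip = tasks.get_nowait()
                except queue.Empty:
                    return
                # Stand-in for the device generating first caption content.
                results[i] = device.generate_caption(clip)

        with ThreadPoolExecutor(max_workers=len(idle_devices)) as pool:
            for device in idle_devices:
                pool.submit(worker, device)
        # Feedback assembled in clip order for the later de-redundancy step.
        return [results[i] for i in sorted(results)]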
In one embodiment, the subtitle processing devices include subtitle processing servers, and the method further includes: when there is no subtitle processing server in an idle state, distributing the video tasks in the task queue in parallel to user terminals that are in an idle state, a video task being used to instruct the user terminal to generate first caption content corresponding to the video clip to which the video task corresponds.
It can be understood that the user terminals here have the caption content generation function. Likewise, a user terminal in an idle state has the system resources needed to complete the subtitle processing procedure; when a user terminal has the system resources required to complete the subtitle processing procedure, the user terminal is in an idle state.
In one embodiment, a user terminal can detect its own state and report its status information to the computer device, and the computer device determines from the reported status information which user terminals are idle. In another embodiment, the computer device can also actively monitor the states of the user terminals to determine which user terminals are idle.
Specifically, the computer device can detect whether there is a subtitle processing server in an idle state; when the computer device determines that there is no idle subtitle processing server, it can distribute the video tasks in the task queue in parallel to the idle user terminals. A user terminal can then perform subtitle generation processing on the video clip to which a video task corresponds, so as to generate the first caption content corresponding to that video clip.
It should be noted that a user terminal performs subtitle generation processing here on the premise of a lawful process, without covertly misappropriating the user terminal's bandwidth or resources.
Fig. 3 is a diagram of an application environment of a caption content generation method in another embodiment. Referring to Fig. 3, the application environment includes a terminal 110, a slice server 120, a subtitle processing server 130a, and a user terminal 130b with a caption content generation function; it can be understood that the subtitle processing server 130a and the user terminal 130b with the caption content generation function can be collectively referred to as subtitle processing devices 130. The terminal 110 uploads an original video to the slice server 120, and the slice server 120 divides the original video into video clips and adds them to a task queue. When there is an idle subtitle processing server 130a, the slice server 120 can select video tasks from the task queue in order and distribute them in parallel to the idle subtitle processing servers 130a, so that the subtitle processing server 130a generates the first caption content corresponding to the video clip to which the video task corresponds. When there is no idle subtitle processing server 130a, the slice server 120 can distribute the video tasks in the task queue in parallel to the idle user terminals 130b. It can be understood that the computer device in the embodiments of this application may be the slice server 120 in Fig. 3.
In one embodiment, the user terminal can input the video clip into a neural network model for subtitle generation processing, so as to generate the first caption content corresponding to the video clip. The neural network model is a machine learning model trained in advance to generate caption content corresponding to a video clip.
In one embodiment, when the computer device detects that no status information indicating an idle state has been received from a subtitle processing server for more than a preset duration, it can determine that there is no subtitle processing server in an idle state. It can be understood that "more than a preset duration" means that more than the preset duration has elapsed since the last time status information indicating an idle state was received from a subtitle processing server.
In another embodiment, the method further includes: adding a timestamp to each video task, the timestamp being used to record the time at which the corresponding video task was added; and, when the gap between the addition time recorded by the timestamp and the current time is greater than a preset threshold, determining that there is no subtitle processing server in an idle state.
Specifically, when adding a video clip to the task queue, the computer device can add a timestamp to the video task corresponding to that video clip. The timestamp is used to record the addition time of the corresponding video task, where the addition time is the time at which the video task was added. The computer device can compare the addition time recorded by the timestamp with the current time; when the gap between the addition time and the current time is greater than a preset threshold, it indicates that the subtitle processing servers are busy and the task has not been processed in time, so the computer device can determine that there is no subtitle processing server in an idle state.
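One way to realize this timestamp check is sketched below in Python; the threshold value and the two dispatch helpers are assumptions made for illustration:

    import time

    STALE_SECONDS = 30.0   # assumed preset threshold

    def make_task(index, clip):
        # Each video task carries a timestamp recording its addition time.
        return {"index": index, "clip": clip, "added_at": time.time()}

    def route(task, dispatch_to_server, dispatch_to_user_terminal):
        """If a task has waited longer than the threshold, treat all subtitle
        processing servers as busy and fall back to idle user terminals."""
        waited = time.time() - task["added_at"]
        if waited > STALE_SECONDS:
            dispatch_to_user_terminal(task)   # no idle subtitle processing server
        else:
            dispatch_to_server(task)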
In the above embodiment, when there is no idle subtitle processing server, the video tasks in the task queue are distributed in parallel to the idle user terminals to generate the corresponding first caption content, which makes reasonable use of resources and thereby improves caption content generation efficiency.
In one embodiment, the video clip is also used to instruct the corresponding subtitle processing device to convert the video clip into a color image video frame sequence and an optical flow image video frame sequence, extract first image features from the color image video frames of the color image video frame sequence and second image features from the optical flow image video frames of the optical flow image video frame sequence, determine object shape features from the first image features and object motion features from the second image features, and analyze the object shape features and object motion features to obtain the first caption content corresponding to the video clip.
Here, a color image is an image in which each pixel is composed of R (red), G (green), and B (blue) components; RGB represents the colors of the red, green, and blue channels. A color image video frame means that each frame of the sequence is a color image. It can be understood that a video clip is, in principle, a series of video frames, so the color image video frame sequence into which the video clip is converted includes a series of color image video frames.
Optical flow is a pattern of motion: the apparent motion of an object, relative to the background, as seen from a viewpoint by an observer (such as an eye or a camera). An optical flow image is an image composed of optical flow data, which captures the motion of objects in the image. An optical flow image video frame means that each frame of the sequence is an optical flow image, and the optical flow image video frame sequence includes a series of optical flow image video frames. Object shape features are data characterizing an object's form; object motion features are data characterizing an object's motion behavior.
Specifically, after the computer device sends the video clip to the subtitle processing device, the subtitle processing device can convert the video clip into a color image video frame sequence and an optical flow image video frame sequence.
In one embodiment, a first convolutional neural network model and a second convolutional neural network model are trained in advance on the subtitle processing device, with the first convolutional neural network model processing the color image video frame sequence and the second convolutional neural network model processing the optical flow image video frame sequence. The subtitle processing device can input the color image video frame sequence into the first convolutional neural network model to output the first image features extracted from each color image video frame, and input the optical flow image video frame sequence into the second convolutional neural network model to output the second image features extracted from each optical flow image video frame. It can be understood that the first image features are features extracted in terms of object form, while the second image features are features extracted in terms of object motion.
A convolutional neural network (CNN) is a feedforward artificial neural network whose artificial neurons respond to surrounding units, making it suitable for image processing. A feedforward neural network is a kind of artificial neural network in which each layer contains several neurons; starting from the input layer, each neuron receives input from the previous layer and outputs to the next layer, up to the output layer, with no interconnection among neurons of the same layer.
A first recurrent neural network model corresponding to the color image video frame sequence and a second recurrent neural network model corresponding to the optical flow image video frame sequence are also trained in advance on the subtitle processing device. The subtitle processing device can input the first image features into the first recurrent neural network model, which outputs object shape features from the first image features, and input the second image features into the second recurrent neural network model, which outputs object motion features from the second image features. It can be understood that the first recurrent neural network model can combine, along the time dimension, the correlations among the first image features of multiple video frames to determine more accurate object shape features, and the second recurrent neural network model can likewise combine the correlations among the second image features of multiple video frames to determine more accurate object motion features. In one embodiment, the recurrent neural network model may be an LSTM (Long Short-Term Memory) network model.
It can be understood that the subtitle processing device can analyze the object shape features and the object motion features to obtain the first caption content corresponding to the video clip.
In the above embodiment, object shape features and object motion features are determined from the respective image features of the color image video frame sequence and the optical flow image video frame sequence, and the first caption content corresponding to the video clip is obtained from these two kinds of features, which improves the accuracy of generating the first caption content.
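The two-stream pipeline described above can be pictured with the following PyTorch sketch; the toy CNN backbones, layer sizes, and the choice of LSTM (named in this application as one option for the recurrent model) are illustrative assumptions, not the trained models of this application:

    import torch
    import torch.nn as nn

    class StreamEncoder(nn.Module):
        """Toy per-frame CNN followed by an LSTM over the frame sequence."""
        def __init__(self, in_channels, feat_dim=128):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
            self.rnn = nn.LSTM(feat_dim, feat_dim, batch_first=True)

        def forward(self, frames):                 # frames: (B, T, C, H, W)
            b, t = frames.shape[:2]
            feats = self.cnn(frames.flatten(0, 1)) # per-frame image features
            feats = feats.view(b, t, -1)
            out, _ = self.rnn(feats)               # temporal correlation across frames
            return feats, out                      # (image features, shape/motion features)

    rgb_stream = StreamEncoder(in_channels=3)   # color image video frames
    flow_stream = StreamEncoder(in_channels=2)  # optical flow (x, y) video frames

    rgb = torch.randn(1, 8, 3, 64, 64)          # 8 color frames
    flow = torch.randn(1, 8, 2, 64, 64)         # 8 optical flow frames
    rgb_feat, shape_feat = rgb_stream(rgb)      # first image features -> object shape
    flow_feat, motion_feat = flow_stream(flow)  # second image features -> object motion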
In one embodiment, the video clip is also used to instruct the corresponding subtitle processing device to: fuse the first image feature and second image feature that correspond to the same video frame of the video clip to obtain a fusion feature, and determine a comprehensive object motion feature from the fusion feature; input the object shape features, object motion features, and comprehensive object motion features into their corresponding caption generation network models to output corresponding score vectors; weight and sum the score vectors corresponding to the same video frame of the video clip according to their corresponding voting weights, and output the word with the highest score in the summed result as the word output for that video frame; and obtain the first caption content corresponding to the video clip by combining the words output for the video frames.
Here, a caption generation network model is a neural network model trained in advance to generate caption content.
Specifically, after extracting the first image feature and second image feature of each video frame through the corresponding convolutional neural networks, the subtitle processing device can fuse the first image feature and second image feature that correspond to the same video frame of the video clip to obtain a fusion feature, and determine the comprehensive object motion feature from the fusion feature. In one embodiment, a third recurrent neural network is trained in advance on the subtitle processing device; the subtitle processing device can input the fusion feature into the third recurrent neural network, which outputs the comprehensive object motion feature from the fusion feature. It can be understood that the third recurrent neural network model can combine, along the time dimension, the correlations among the fusion features of multiple video frames to determine a more accurate comprehensive object motion feature.
It can be understood that the color image video frame sequence and the optical flow image video frame sequence are both derived from the video frames of the video clip, so every color image video frame and every optical flow image video frame corresponds to a video frame of the video clip. Accordingly, the first image feature and second image feature extracted from the color image video frame and optical flow image video frame that correspond to the same video frame of the video clip also correspond to that video frame, as does the fusion feature obtained by fusing them. In turn, the object shape feature, object motion feature, and comprehensive object motion feature determined from the first image feature, second image feature, and fusion feature corresponding to the same video frame also correspond to that video frame.
For example, suppose video frame A of the video clip is converted into color image video frame a1 and optical flow image video frame a2. Then the object shape feature determined from the first image feature extracted from color image video frame a1, the object motion feature determined from the second image feature extracted from optical flow image video frame a2, and the comprehensive object motion feature determined from the fusion of the first and second image features all correspond to video frame A of the video clip.
A first caption generation network model for processing object shape features, a second caption generation network model for processing object motion features, and a third caption generation network model for processing comprehensive object motion features are trained in advance on the subtitle processing device. The subtitle processing device can input the object shape features into the first caption generation network model, the object motion features into the second caption generation network model, and the comprehensive object motion features into the third caption generation network model.
The computer device can encode each object shape feature through the first caption generation network model and perform the corresponding decoding at the output layer to obtain a score for each word in a preset dictionary; from these word scores, a first score vector is generated. Likewise, the computer device can encode each object motion feature through the second caption generation network model and decode at the output layer to generate a second score vector, and encode each comprehensive object motion feature through the third caption generation network model and decode at the output layer to generate a third score vector. A voting weight is set in advance for each caption generation network model. The computer device can weight and sum the first, second, and third score vectors that correspond to the same video frame of the video clip according to their voting weights, and output the word with the highest score in the summed result as the word output for that video frame. After the word output for each video frame of the video clip is determined in this way, the words output for the video frames can be combined to obtain the first caption content corresponding to the video clip.
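The weighted voting over the three score vectors reduces to the small computation sketched below in Python; the vocabulary, the voting weights, and the score values are made-up numbers for illustration:

    import numpy as np

    vocab = ["a", "man", "runs", "sits"]          # assumed preset dictionary
    weights = [0.4, 0.3, 0.3]                     # assumed per-model voting weights

    # Score vectors for ONE video frame from the shape, motion, and fused models.
    shape_scores  = np.array([0.1, 0.2, 0.6, 0.1])
    motion_scores = np.array([0.2, 0.1, 0.5, 0.2])
    fused_scores  = np.array([0.1, 0.1, 0.7, 0.1])

    summed = sum(w * s for w, s in
                 zip(weights, [shape_scores, motion_scores, fused_scores]))
    word = vocab[int(np.argmax(summed))]          # word with the top summed score
    print(word)                                   # -> "runs"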
Fig. 4 is a schematic diagram of the principle of generating first caption content in one embodiment. Referring to Fig. 4, the RGB image video frame sequence (that is, the color image video frame sequence) passes through the multi-layer convolution of the first convolutional neural network model 402A to extract the first image features, and the optical flow image video frame sequence passes through the multi-layer convolution of the second convolutional neural network model 402B to extract the second image features. The computer device can input the extracted first image features into the first recurrent neural network, which combines the correlations among frames along the time dimension to generate the object shape features, and input the extracted second image features into the second recurrent neural network, which likewise combines the correlations among frames to generate the object motion features. The computer device can fuse the first and second image features corresponding to the same video frame of the video clip to obtain a fusion feature, and input the fusion feature into the third recurrent neural network to output the comprehensive object motion feature. The computer device can input the object shape features, object motion features, and comprehensive object motion features into the first, second, and third caption generation network models respectively to obtain the first, second, and third score vectors. The three score vectors corresponding to the same video frame of the video clip are weighted and summed according to their voting weights; the word with the highest score in the summed result is output as the word for that video frame, and the words output for the frames are combined to obtain the caption content.
In the above embodiment, the word output for each video frame is determined by weighting the image features of the color image video frame sequence, the image features of the optical flow image video frame sequence, and their fusion features, which greatly improves the accuracy of the determined words. In turn, the first caption content obtained by combining the words output for the video frames is also more accurate.
In one embodiment, step S208 includes: obtaining the first caption content fed back by each subtitle processing device; splicing the first caption contents according to the order of the corresponding video clips' positions in the original video; converting each word in the spliced caption content into a word vector to obtain a word vector sequence; performing de-redundancy encoding on the word vector sequence to obtain a semantic vector; and decoding the semantic vector to generate the second caption content corresponding to the original video.
In one embodiment, a text simplification network model is trained in advance on the computer device. The text simplification network model is a neural network model for removing redundant content from text. The computer device can splice the first caption contents fed back by the subtitle processing devices according to the order of the corresponding video clips' positions in the original video, and input the spliced caption content into the text simplification network model. The text simplification network model can convert each word in the spliced caption content into a word vector to obtain a word vector sequence, and perform de-redundancy encoding on the word vector sequence to obtain a semantic vector. It can be understood that the encoded semantic vector is a semantic vector after de-redundancy processing, and the semantic vector resides in the hidden layer of the text simplification network model. The computer device can decode the semantic vector through the text simplification network model to generate the second caption content corresponding to the original video.
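The encode-to-semantic-vector, then decode structure is a standard sequence-to-sequence design; the following PyTorch sketch, with GRU layers and made-up dimensions, is an illustrative stand-in for the text simplification network model, not its actual architecture:

    import torch
    import torch.nn as nn

    class TextSimplifier(nn.Module):
        """Encode spliced caption word vectors into one semantic vector
        (the de-redundancy encoding), then decode it into a shorter caption."""
        def __init__(self, vocab_size, emb=64, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb)
            self.encoder = nn.GRU(emb, hidden, batch_first=True)
            self.decoder = nn.GRU(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, spliced_ids, target_ids):
            # Word vector sequence -> semantic vector (encoder's final hidden state).
            _, semantic = self.encoder(self.embed(spliced_ids))
            # Decode the semantic vector into the de-redundant caption tokens.
            dec_out, _ = self.decoder(self.embed(target_ids), semantic)
            return self.out(dec_out)        # per-step scores over the vocabulary

    model = TextSimplifier(vocab_size=1000)
    spliced = torch.randint(0, 1000, (1, 40))   # spliced first caption contents
    target = torch.randint(0, 1000, (1, 25))    # shorter second caption (teacher forcing)
    logits = model(spliced, target)             # (1, 25, 1000)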
Fig. 5 is a schematic diagram of the principle of a caption content generation method in one embodiment. Referring to Fig. 5, a user uploads an original video to the server through an application (APP); the slice server slices the original video, and the video clips obtained by the division are added to a task queue. If the subtitle processing server cluster has idle subtitle processing servers, the video clips in the task queue are distributed to them in parallel; if the subtitle processing server cluster has no idle subtitle processing servers, it is further determined whether there are idle user terminals, and if so, each idle user terminal is notified to process the video clips in the task queue in parallel. All first caption contents generated by the processing are aggregated and input into the text simplification network model for de-redundancy processing, and the caption content remaining after the de-redundancy processing is returned, as the final caption content corresponding to the original video, to the application that uploaded the original video.
In the above embodiment, the first caption contents are spliced according to the order of the corresponding video clips' positions in the original video, and de-redundancy encoding is performed on the spliced caption content, which improves the accuracy of the de-redundancy; decoding the de-redundant semantic vector to generate the second caption content corresponding to the original video then improves the accuracy of the caption content.
As shown in Fig. 6, in one embodiment, a caption content generation apparatus 600 is provided. The apparatus 600 includes a segmentation module 602, a pushing module 604, and a de-redundancy module 606, in which:
the segmentation module 602 is configured to obtain an original video for which caption content is to be generated, and to divide the original video into video clips;
the pushing module 604 is configured to push each video clip in parallel to the subtitle processing devices that are in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip; and
the de-redundancy module 606 is configured to remove redundant caption content from the first caption content fed back by each subtitle processing device, and, from the caption content remaining after the redundant caption content is removed, to combine and generate second caption content corresponding to the original video.
As shown in Fig. 7, in one embodiment, the pushing module 604 includes:
a task adding module 604a, configured to add each video clip to a task queue, with the video clips corresponding one-to-one to the video tasks in the task queue; and
a task allocating module 604b, configured to select video tasks from the task queue in order and distribute the selected video tasks in parallel to the idle subtitle processing devices, a video task being used to instruct a subtitle processing device to generate first caption content corresponding to the video clip to which the video task corresponds.
In one embodiment, the subtitle processing devices include subtitle processing servers, and the task allocating module 604b is further configured to, when there is no subtitle processing server in an idle state, distribute the video tasks in the task queue in parallel to user terminals that are in an idle state, a video task being used to instruct the user terminal to generate first caption content corresponding to the video clip to which the video task corresponds.
In one embodiment, the task allocating module 604b is further configured to add a timestamp to each video task, the timestamp being used to record the addition time of the corresponding video task, and, when the gap between the addition time recorded by the timestamp and the current time is greater than a preset threshold, to determine that there is no subtitle processing server in an idle state.
In one embodiment, the video clip is further used to instruct the corresponding subtitle processing device to convert the video clip into a color image video frame sequence and an optical flow image video frame sequence, extract first image features from each color image video frame of the color image video frame sequence and second image features from each optical flow image video frame of the optical flow image video frame sequence, determine object morphological features from the first image features and object motion features from the second image features, and analyse the object morphological features and the object motion features to obtain the first caption content corresponding to the video clip.
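A hedged sketch of this two-stream step using OpenCV follows; OpenCV and the Farneback flow algorithm are assumptions, since the patent names no specific flow algorithm or feature extractor. Colour histograms stand in for the first image features and flow-magnitude statistics for the second image features, where in practice learned features would be used:

```python
import cv2
import numpy as np

def two_stream_features(video_path):
    cap = cv2.VideoCapture(video_path)
    color_feats, flow_feats = [], []
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Optical flow image for this frame pair (the "second" stream).
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        # Stand-in "first image feature": colour histogram of the frame.
        color_feats.append(cv2.calcHist([frame], [0, 1, 2], None,
                                        [8, 8, 8], [0, 256] * 3).flatten())
        # Stand-in "second image feature": flow magnitude statistics.
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        flow_feats.append(np.array([mag.mean(), mag.std()]))
        prev_gray = gray
    cap.release()
    return color_feats, flow_feats     # morphology cues vs. motion cues
```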
In one embodiment, the video clip is further used to instruct the corresponding subtitle processing device to fuse the first image feature and the second image feature corresponding to the same video frame in the video clip to obtain a fusion feature, and to determine a comprehensive object motion feature from the fusion feature; the object morphological feature, the object motion feature and the comprehensive object motion feature are each input into a corresponding caption generation network model, which outputs a corresponding score vector; the score vectors corresponding to the same video frame in the video clip are summed, weighted by their respective voting weights, and the word corresponding to the highest score in the summed result is output as the word for that video frame; the words output for the individual video frames are combined to obtain the first caption content corresponding to the video clip.
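This fusion-and-voting step can be sketched with NumPy as follows; the three caption generation network models are replaced by random stand-ins, and the vocabulary and voting weights are invented for illustration:

```python
import numpy as np

VOCAB = ["person", "runs", "jumps", "ball", "field"]
rng = np.random.default_rng(0)

def score_vector(feature):
    # Stand-in for one caption generation network model: returns a score
    # vector over the vocabulary for the given feature.
    return rng.random(len(VOCAB)) * np.linalg.norm(feature)

def word_for_frame(first_feat, second_feat, vote_weights=(0.4, 0.3, 0.3)):
    fused = np.concatenate([first_feat, second_feat])   # fusion feature
    scores = [score_vector(first_feat),    # morphology stream
              score_vector(second_feat),   # motion stream
              score_vector(fused)]         # comprehensive motion stream
    # Weighted sum of the score vectors for this frame; keep the top word.
    summed = sum(w * s for w, s in zip(vote_weights, scores))
    return VOCAB[int(np.argmax(summed))]

words = [word_for_frame(rng.random(4), rng.random(2)) for _ in range(5)]
print(" ".join(words))   # per-frame words combined into a first caption
```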
As shown in Fig. 8, in one embodiment, the de-redundancy module 606 includes:
a subtitle splicing module 606a, configured to obtain the first caption contents fed back by the subtitle processing devices and splice the first caption contents in the order of the positions of their corresponding video clips in the original video;
a de-redundancy encoding module 606b, configured to convert each word in the spliced caption content into a word vector to obtain a word vector sequence, and to perform de-redundancy encoding on the word vector sequence to obtain a semantic vector; and
a decoding module 606c, configured to decode the semantic vector to generate the second caption content corresponding to the original video; a combined sketch of these three modules follows.
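The following is a hedged sketch of modules 606a to 606c as a tiny, untrained encoder-decoder in PyTorch; the GRU architecture is one plausible realisation chosen for illustration, since the patent specifies only word-vector encoding into a semantic vector followed by decoding:

```python
import torch
import torch.nn as nn

class DeRedundancy(nn.Module):
    def __init__(self, vocab_size, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)      # word -> word vector
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, word_ids, max_len=20):
        vecs = self.embed(word_ids)                     # word vector sequence
        _, semantic = self.encoder(vecs)                # de-redundancy encoding
        token = word_ids[:, :1]                         # seed with first word
        hidden, outputs = semantic, []
        for _ in range(max_len):                        # greedy decoding
            step, hidden = self.decoder(self.embed(token), hidden)
            token = self.out(step).argmax(-1)
            outputs.append(token)
        return torch.cat(outputs, dim=1)                # second caption ids

model = DeRedundancy(vocab_size=1000)
spliced = torch.randint(0, 1000, (1, 15))               # spliced caption ids
print(model(spliced).shape)                             # torch.Size([1, 20])
```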
Fig. 9 is a schematic diagram of the internal structure of a computer device in one embodiment. Referring to Fig. 9, the computer device may be the slice server 120 shown in Fig. 1; it may also be a terminal. The computer device includes a processor, a memory and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device can store an operating system and a computer program which, when executed, may cause the processor to perform a caption content generation method. The processor of the computer device provides computing and control capability and supports the operation of the entire computer device. A computer program may also be stored in the internal memory; when executed by the processor, it may cause the processor to perform a caption content generation method. The network interface of the computer device is used for network communication.
Those skilled in the art will understand that the structure shown in Fig. 9 is merely a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the caption content generating apparatus provided by the present application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in Fig. 9. The non-volatile storage medium of the computer device may store the program modules constituting the caption content generating apparatus, for example the segmentation module 602, the pushing module 604 and the de-redundancy module 606 shown in Fig. 6. The computer program composed of these program modules causes the computer device to perform the steps of the caption content generation method of the embodiments of the present application described in this specification. For example, the computer device may obtain an original video for which caption content is to be generated through the segmentation module 602 of the caption content generating apparatus 600 shown in Fig. 6, divide the original video into video clips, and concurrently push each video clip, through the pushing module 604, to each subtitle processing device in the idle state, the video clip instructing the corresponding subtitle processing device to generate first caption content corresponding to the video clip. The computer device may, through the de-redundancy module 606, remove redundant caption content from the first caption contents fed back by the subtitle processing devices and combine the caption content remaining after removal to generate second caption content corresponding to the original video.
It will be appreciated that the caption content generation method provided in the embodiments of the present application is not limited to servers; it may also be applied to a terminal device capable of video segmentation and slicing, task distribution and subtitle de-redundancy processing. Fig. 10 is a schematic diagram of the internal structure of a computer device in another embodiment. Referring to Fig. 10, the computer device is such a terminal device. The computer device includes a processor, a memory, a network interface, a display screen and an input unit connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device can store an operating system and a computer program which, when executed, may cause the processor to perform a caption content generation method. The processor of the computer device provides computing and control capability and supports the operation of the entire computer device. A computer program may be stored in the internal memory; when executed by the processor, it may cause the processor to perform a caption content generation method. The network interface of the computer device is used for network communication. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen. The input unit of the computer device may be a touch layer covering the display screen, a key, trackball or trackpad provided on the housing of the terminal, or an external keyboard, trackpad or mouse. The computer device may be a personal computer, a mobile terminal or an in-vehicle device, the mobile terminal including at least one of a mobile phone, a tablet computer, a personal digital assistant or a wearable device.
Those skilled in the art will understand that the structure shown in Fig. 10 is merely a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
Similarly, the caption content generating apparatus provided by the present application may be implemented in the form of a computer program, and the computer program may run on the computer device shown in Fig. 10.
A computer device includes a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps: obtaining an original video for which caption content is to be generated; dividing the original video into video clips; concurrently pushing each video clip to each subtitle processing device in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip; and removing redundant caption content from the first caption contents fed back by the subtitle processing devices and combining the caption content remaining after removal of the redundant caption content to generate second caption content corresponding to the original video.
In one embodiment, concurrently pushing each video clip to each subtitle processing device in the idle state includes: adding each video clip to a task queue, the video clips corresponding one-to-one to the video tasks in the task queue; and successively selecting video tasks from the task queue and concurrently distributing the selected video tasks to the subtitle processing devices in the idle state, each video task being used to instruct a subtitle processing device to generate the first caption content corresponding to the video clip associated with that video task.
In one embodiment, the subtitle processing device includes a subtitle processing server; when the computer program is executed by the processor, the processor further performs the following steps: when no subtitle processing server is in the idle state, concurrently distributing the video tasks in the task queue to user terminals in the idle state, each video task being used to instruct a user terminal to generate the first caption content corresponding to the video clip associated with that video task.
In one embodiment, when the computer program is executed by the processor, the processor further performs the following steps: adding a timestamp to each video task, the timestamp being used to record the addition time of the corresponding video task; and when the gap between the addition time recorded by the timestamp and the current time exceeds a preset threshold, determining that no subtitle processing server is in the idle state.
In one embodiment, the video clip is further used to instruct the corresponding subtitle processing device to convert the video clip into a color image video frame sequence and an optical flow image video frame sequence, extract first image features from each color image video frame of the color image video frame sequence and second image features from each optical flow image video frame of the optical flow image video frame sequence, determine object morphological features from the first image features and object motion features from the second image features, and analyse the object morphological features and the object motion features to obtain the first caption content corresponding to the video clip.
In one embodiment, the video clip is further used to instruct the corresponding subtitle processing device to fuse the first image feature and the second image feature corresponding to the same video frame in the video clip to obtain a fusion feature, and to determine a comprehensive object motion feature from the fusion feature; the object morphological feature, the object motion feature and the comprehensive object motion feature are each input into a corresponding caption generation network model, which outputs a corresponding score vector; the score vectors corresponding to the same video frame in the video clip are summed, weighted by their respective voting weights, and the word corresponding to the highest score in the summed result is output as the word for that video frame; the words output for the individual video frames are combined to obtain the first caption content corresponding to the video clip.
In one embodiment, removing the redundant caption content from the first caption contents fed back by the subtitle processing devices and combining the caption content remaining after removal of the redundant caption content to generate the second caption content corresponding to the original video includes: obtaining the first caption contents fed back by the subtitle processing devices; splicing the first caption contents in the order of the positions of their corresponding video clips in the original video; converting each word in the spliced caption content into a word vector to obtain a word vector sequence; performing de-redundancy encoding on the word vector sequence to obtain a semantic vector; and decoding the semantic vector to generate the second caption content corresponding to the original video.
A storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following steps: obtaining an original video for which caption content is to be generated; dividing the original video into video clips; concurrently pushing each video clip to each subtitle processing device in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip; and removing redundant caption content from the first caption contents fed back by the subtitle processing devices and combining the caption content remaining after removal of the redundant caption content to generate second caption content corresponding to the original video.
In one embodiment, concurrently pushing each video clip to each subtitle processing device in the idle state includes: adding each video clip to a task queue, the video clips corresponding one-to-one to the video tasks in the task queue; and successively selecting video tasks from the task queue and concurrently distributing the selected video tasks to the subtitle processing devices in the idle state, each video task being used to instruct a subtitle processing device to generate the first caption content corresponding to the video clip associated with that video task.
In one embodiment, the subtitle processing device includes a subtitle processing server; when the computer program is executed by the processor, the processor further performs the following steps: when no subtitle processing server is in the idle state, concurrently distributing the video tasks in the task queue to user terminals in the idle state, each video task being used to instruct a user terminal to generate the first caption content corresponding to the video clip associated with that video task.
In one embodiment, when the computer program is executed by the processor, the processor further performs the following steps: adding a timestamp to each video task, the timestamp being used to record the addition time of the corresponding video task; and when the gap between the addition time recorded by the timestamp and the current time exceeds a preset threshold, determining that no subtitle processing server is in the idle state.
In one embodiment, the video clip is further used to instruct the corresponding subtitle processing device to convert the video clip into a color image video frame sequence and an optical flow image video frame sequence, extract first image features from each color image video frame of the color image video frame sequence and second image features from each optical flow image video frame of the optical flow image video frame sequence, determine object morphological features from the first image features and object motion features from the second image features, and analyse the object morphological features and the object motion features to obtain the first caption content corresponding to the video clip.
In one embodiment, the video clip is further used to instruct the corresponding subtitle processing device to fuse the first image feature and the second image feature corresponding to the same video frame in the video clip to obtain a fusion feature, and to determine a comprehensive object motion feature from the fusion feature; the object morphological feature, the object motion feature and the comprehensive object motion feature are each input into a corresponding caption generation network model, which outputs a corresponding score vector; the score vectors corresponding to the same video frame in the video clip are summed, weighted by their respective voting weights, and the word corresponding to the highest score in the summed result is output as the word for that video frame; the words output for the individual video frames are combined to obtain the first caption content corresponding to the video clip.
In one embodiment, removing the redundant caption content from the first caption contents fed back by the subtitle processing devices and combining the caption content remaining after removal of the redundant caption content to generate the second caption content corresponding to the original video includes: obtaining the first caption contents fed back by the subtitle processing devices; splicing the first caption contents in the order of the positions of their corresponding video clips in the original video; converting each word in the spliced caption content into a word vector to obtain a word vector sequence; performing de-redundancy encoding on the word vector sequence to obtain a semantic vector; and decoding the semantic vector to generate the second caption content corresponding to the original video.
It should be understood that the steps in the embodiments of the present application are not necessarily performed sequentially in the order indicated by the step numbers. Unless explicitly stated otherwise herein, there is no strict order restriction on the execution of these steps, and the steps may be performed in other orders. Moreover, at least some of the steps in the embodiments may include multiple sub-steps or multiple stages, which are not necessarily completed at the same moment but may be performed at different times; their execution order is likewise not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their descriptions are specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (13)

1. A caption content generation method, the method comprising:
obtaining an original video for which caption content is to be generated;
dividing the original video into video clips;
concurrently pushing each video clip to each subtitle processing device in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip, and being further used to instruct the corresponding subtitle processing device to convert the video clip into a color image video frame sequence and an optical flow image video frame sequence, extract first image features from each color image video frame of the color image video frame sequence and second image features from each optical flow image video frame of the optical flow image video frame sequence, determine object morphological features from the first image features and object motion features from the second image features, and analyse the object morphological features and the object motion features to obtain the first caption content corresponding to the video clip; and
removing redundant caption content from the first caption contents fed back by the subtitle processing devices, and combining the caption content remaining after removal of the redundant caption content to generate second caption content corresponding to the original video.
2. The method according to claim 1, wherein concurrently pushing each video clip to each subtitle processing device in the idle state comprises:
adding each video clip to a task queue, the video clips corresponding one-to-one to the video tasks in the task queue; and
successively selecting video tasks from the task queue and concurrently distributing the selected video tasks to the subtitle processing devices in the idle state, each video task being used to instruct a subtitle processing device to generate the first caption content corresponding to the video clip associated with that video task.
3. The method according to claim 2, wherein the subtitle processing device comprises a subtitle processing server, and the method further comprises:
when no subtitle processing server is in the idle state, concurrently distributing the video tasks in the task queue to user terminals in the idle state, each video task being used to instruct a user terminal to generate the first caption content corresponding to the video clip associated with that video task.
4. The method according to claim 3, further comprising:
adding a timestamp to each video task, the timestamp being used to record the addition time of the corresponding video task; and
when the gap between the addition time recorded by the timestamp and the current time exceeds a preset threshold, determining that no subtitle processing server is in the idle state.
5. The method according to claim 1, wherein the video clip is further used to instruct the corresponding subtitle processing device to: fuse the first image feature and the second image feature corresponding to the same video frame in the video clip to obtain a fusion feature, and determine a comprehensive object motion feature from the fusion feature; input the object morphological feature, the object motion feature and the comprehensive object motion feature into corresponding caption generation network models, which output corresponding score vectors; sum the score vectors corresponding to the same video frame in the video clip, weighted by their respective voting weights, and output the word corresponding to the highest score in the summed result as the word for that video frame; and combine the words output for the individual video frames to obtain the first caption content corresponding to the video clip.
6. The method according to any one of claims 1 to 5, wherein removing the redundant caption content from the first caption contents fed back by the subtitle processing devices and combining the caption content remaining after removal of the redundant caption content to generate the second caption content corresponding to the original video comprises:
obtaining the first caption contents fed back by the subtitle processing devices;
splicing the first caption contents in the order of the positions of their corresponding video clips in the original video;
converting each word in the spliced caption content into a word vector to obtain a word vector sequence;
performing de-redundancy encoding on the word vector sequence to obtain a semantic vector; and
decoding the semantic vector to generate the second caption content corresponding to the original video.
7. A caption content generating apparatus, the apparatus comprising:
a segmentation module, configured to obtain an original video for which caption content is to be generated, and to divide the original video into video clips;
a pushing module, configured to concurrently push each video clip to each subtitle processing device in an idle state, the video clip being used to instruct the corresponding subtitle processing device to generate first caption content corresponding to the video clip, and being further used to instruct the corresponding subtitle processing device to convert the video clip into a color image video frame sequence and an optical flow image video frame sequence, extract first image features from each color image video frame of the color image video frame sequence and second image features from each optical flow image video frame of the optical flow image video frame sequence, determine object morphological features from the first image features and object motion features from the second image features, and analyse the object morphological features and the object motion features to obtain the first caption content corresponding to the video clip; and
a de-redundancy module, configured to remove redundant caption content from the first caption contents fed back by the subtitle processing devices, and to combine the caption content remaining after removal of the redundant caption content to generate second caption content corresponding to the original video.
8. The apparatus according to claim 7, wherein the pushing module comprises:
a task adding module, configured to add each video clip to a task queue, the video clips corresponding one-to-one to the video tasks in the task queue; and
a task allocating module, configured to successively select video tasks from the task queue and concurrently distribute the selected video tasks to the subtitle processing devices in the idle state, each video task being used to instruct a subtitle processing device to generate the first caption content corresponding to the video clip associated with that video task.
9. The apparatus according to claim 8, wherein the subtitle processing device comprises a subtitle processing server; and
the task allocating module is further configured to, when no subtitle processing server is in the idle state, concurrently distribute the video tasks in the task queue to user terminals in the idle state, each video task being used to instruct a user terminal to generate the first caption content corresponding to the video clip associated with that video task.
10. The apparatus according to claim 9, wherein the task allocating module is further configured to add a timestamp to each video task, the timestamp being used to record the addition time of the corresponding video task, and to determine, when the gap between the addition time recorded by the timestamp and the current time exceeds a preset threshold, that no subtitle processing server is in the idle state.
11. The apparatus according to any one of claims 7 to 10, wherein the de-redundancy module comprises:
a subtitle splicing module, configured to obtain the first caption contents fed back by the subtitle processing devices, and to splice the first caption contents in the order of the positions of their corresponding video clips in the original video;
a de-redundancy encoding module, configured to convert each word in the spliced caption content into a word vector to obtain a word vector sequence, and to perform de-redundancy encoding on the word vector sequence to obtain a semantic vector; and
a decoding module, configured to decode the semantic vector to generate the second caption content corresponding to the original video.
12. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 6.
13. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 6.
CN201810777015.7A 2018-07-16 2018-07-16 Caption content generation method, device, computer equipment and storage medium Active CN109040779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810777015.7A CN109040779B (en) 2018-07-16 2018-07-16 Caption content generation method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN109040779A CN109040779A (en) 2018-12-18
CN109040779B (en) 2019-11-26

Family

ID=64642596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810777015.7A Active CN109040779B (en) 2018-07-16 2018-07-16 Caption content generation method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109040779B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110246160B (en) * 2019-06-20 2022-12-06 腾讯科技(深圳)有限公司 Video target detection method, device, equipment and medium
CN110611840B (en) * 2019-09-03 2021-11-09 北京奇艺世纪科技有限公司 Video generation method and device, electronic equipment and storage medium
CN110602566B (en) 2019-09-06 2021-10-01 Oppo广东移动通信有限公司 Matching method, terminal and readable storage medium
CN111221657B (en) * 2020-01-14 2023-04-07 新华智云科技有限公司 Efficient video distributed scheduling synthesis method
CN112261314B (en) * 2020-09-24 2023-09-15 北京美摄网络科技有限公司 Video description data generation system, method, storage medium and equipment
CN112995532B (en) * 2021-02-03 2023-06-13 上海哔哩哔哩科技有限公司 Video processing method and device
CN113315931B (en) * 2021-07-06 2022-03-11 伟乐视讯科技股份有限公司 HLS stream-based data processing method and electronic equipment
CN113891113B (en) * 2021-09-29 2024-03-12 阿里巴巴(中国)有限公司 Video clip synthesis method and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103561217A (en) * 2013-10-14 2014-02-05 深圳创维数字技术股份有限公司 Method and terminal for generating captions
CN103761261A (en) * 2013-12-31 2014-04-30 北京紫冬锐意语音科技有限公司 Voice recognition based media search method and device
CN106162323A (en) * 2015-03-26 2016-11-23 无锡天脉聚源传媒科技有限公司 A kind of video data handling procedure and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5116664B2 (en) * 2005-04-26 2013-01-09 トムソン ライセンシング Synchronized stream packing


Also Published As

Publication number Publication date
CN109040779A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
CN109040779B (en) Caption content generation method, device, computer equipment and storage medium
CN109803180B (en) Video preview generation method and device, computer equipment and storage medium
WO2022184117A1 (en) Deep learning-based video clipping method, related device, and storage medium
CN112740709A (en) Gated model for video analysis
Mitra et al. A machine learning based approach for deepfake detection in social media through key video frame extraction
CN104735468B (en) A kind of method and system that image is synthesized to new video based on semantic analysis
JP2022537170A (en) Cognitive video and voice search aggregation
CN111708941A (en) Content recommendation method and device, computer equipment and storage medium
CN110619284B (en) Video scene division method, device, equipment and medium
CN111368141B (en) Video tag expansion method, device, computer equipment and storage medium
US11037604B2 (en) Method for video investigation
US20180143741A1 (en) Intelligent graphical feature generation for user content
KR20090093904A (en) Apparatus and method for scene variation robust multimedia image analysis, and system for multimedia editing based on objects
CN113515997B (en) Video data processing method and device and readable storage medium
CN114339362B (en) Video bullet screen matching method, device, computer equipment and storage medium
JP2020528680A (en) Modify digital video content
KR20230021144A (en) Machine learning-based image compression settings reflecting user preferences
Guerrini et al. Interactive film recombination
CN113992973A (en) Video abstract generation method and device, electronic equipment and storage medium
CN107369450A (en) Recording method and collection device
Saravanan Segment based indexing technique for video data file
Wu et al. Cold start problem for automated live video comments
CN114422745A (en) Method and device for rapidly arranging conference summary of audio and video conference and computer equipment
Wu et al. Knowing where and what to write in automated live video comments: A unified multi-task approach
CN113762056A (en) Singing video recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant