WO2018177139A1 - Method and apparatus for generating video abstract, server and storage medium

Publication number
WO2018177139A1
WO2018177139A1 PCT/CN2018/079246 CN2018079246W WO2018177139A1 WO 2018177139 A1 WO2018177139 A1 WO 2018177139A1 CN 2018079246 W CN2018079246 W CN 2018079246W WO 2018177139 A1 WO2018177139 A1 WO 2018177139A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
target
sub
frames
frame
Application number
PCT/CN2018/079246
Inventor
曾佩玲
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/488 Data services, e.g. news ticker
    • H04N21/4884 Data services, e.g. news ticker for displaying subtitles

Abstract

Disclosed are a method and apparatus for generating a video abstract, a server and a storage medium, which are used for automatically generating different video abstracts for different users, increasing the number of views of a video, providing effective information for more users, and improving the efficiency of video abstract generation. The method comprises: segmenting a target video into several video frames; determining, according to a user characteristic, N corresponding target frames from the several video frames, N being an integer greater than 1; extracting subtitles from the N target frames; and generating a target video abstract according to the subtitles.

Description

Video summary generation method, apparatus, server and storage medium
This application claims priority to Chinese Patent Application No. 201710192629.4, entitled "A Video Summary Generation Method and Apparatus", filed with the State Intellectual Property Office of China on March 28, 2017, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present invention relate to the field of computer applications, and in particular to a video summary generation method, apparatus, server, and storage medium.
Background
When a user follows a URL to a video website or opens the website's application (APP), the site displays a text introduction to each video. Its main purpose is to describe the key content of the video so as to attract the user to watch it; this kind of text introduction is called a video summary. Because the wording of a video summary strongly affects how many views the video receives, producing more effective video summaries is a concern for video websites and video producers.
At present, video summaries are produced manually: staff write a description of the video, and the finished description is published on the corresponding website as the video summary for users to browse.
Because a manually produced video summary is written for the video itself, every user sees the same summary. Different users, however, have different preferences, and for the same video they want different information. Manually produced summaries are therefore poorly targeted and cannot give each user the video-related information that is useful to that user. In addition, a serialized TV drama may release new episodes every day, and updating the video summary of each episode as the plot develops requires a great deal of manual labor.
Summary of the invention
Embodiments of the present invention provide a video summary generation method, apparatus, server, and storage medium that automatically generate different video summaries for different users, increase the number of views of a video, provide useful information to more users, and improve the efficiency of video summary generation.
In view of this, one aspect of the embodiments of the present invention provides a video summary generation method for use in a server, the method comprising:
segmenting a target video into a plurality of video frames;
determining, according to user characteristics, N target frames corresponding to a user from the plurality of video frames, N being an integer greater than 1;
extracting subtitles from the N target frames; and
generating a target video summary according to the subtitles.
Another aspect of the embodiments of the present invention provides a video summary generation apparatus, the apparatus comprising:
a segmentation module, configured to segment a target video into a plurality of video frames;
a first determining module, configured to determine, according to user characteristics, N target frames corresponding to a user from the plurality of video frames, N being an integer greater than 1;
an extraction module, configured to extract subtitles from the N target frames; and
a generation module, configured to generate a target video summary according to the subtitles.
Another aspect of the embodiments of the present invention provides a server, the server comprising:
one or more processors; and
a memory;
the memory storing one or more programs configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the following operations:
segmenting a target video into a plurality of video frames;
determining, according to user characteristics, N target frames corresponding to a user from the plurality of video frames, N being an integer greater than 1;
extracting subtitles from the N target frames; and
generating a target video summary according to the subtitles.
Another aspect of the embodiments of the present invention provides a computer-readable storage medium storing at least one instruction, at least one program, a code set, or an instruction set, where the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by a processor to implement the video summary generation method described above.
As can be seen from the above technical solutions, the embodiments of the present invention have the following advantages:
The embodiments of the present invention segment a target video into a plurality of video frames, determine N target frames corresponding to a user according to user characteristics, extract the subtitles in the N target frames, and generate the target video summary for that user from the extracted subtitles. The solution thus generates video summaries automatically and presents different video summaries to different users according to their characteristics. The summaries are better targeted, which can increase the number of views of the video, provide useful information to more users, and improve the efficiency of video summary generation.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention.
FIG. 1 is a schematic diagram of an embodiment of a video summary generation system according to an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of a video summary generation method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another embodiment of a video summary generation method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an embodiment of a video summary generation apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another embodiment of a video summary generation apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another embodiment of a video summary generation apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of another embodiment of a video summary generation apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of another embodiment of a video summary generation apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of another embodiment of a video summary generation apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of another embodiment of a video summary generation apparatus according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the present invention.
The terms "first", "second", "third", "fourth", and the like (if any) in the specification, the claims, and the above drawings of the embodiments of the present invention are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiments of the present invention provide a video summary generation method, apparatus, server, and storage medium that automatically generate a different video summary for each user, increase the number of views of a video, provide useful information to more users, and improve the efficiency of video summary generation.
To facilitate understanding of the embodiments of the present invention, the scenario to which they apply is briefly introduced below. FIG. 1 is a schematic diagram of the composition of a system to which the video summary generation method, apparatus, server, and storage medium provided by the embodiments are applicable.
As shown in FIG. 1, the system may include a service system composed of at least one server 101, and a plurality of terminals 102. The server 101 in the service system may store data used for generating video summaries and transmit the generated video summaries to the terminals 102. A terminal 102 may be used to upload to the server 101 the target video data for which a video summary is to be generated, and to display the video summary returned by the server 101. It should be understood that the terminal 102 is not limited to the personal computer (PC) shown in FIG. 1 and may also be a mobile phone, a tablet computer, or any other device capable of obtaining and displaying a video summary.
For example, a user may upload a target video to the server 101 through a terminal 102. Using the video summary generation method of the embodiments of the present invention, the server 101 generates, for each user, the video summary corresponding to that user and returns to each terminal 102 the video summary matching the user of that terminal; the terminal 102 then presents the returned video summary to the user.
It should be understood that, in addition to the above scenario, the video summary generation method, apparatus, server, and storage medium in the embodiments of the present invention may also be applied to other scenarios, which is not limited here. To facilitate understanding of the embodiments, some terms used in them are introduced below:
Video frame: the smallest unit of a moving image, a single picture. One frame is one still picture, and consecutive frames form a moving image, as in television. Each frame of a moving image is a still image; displaying the frames in rapid succession creates the illusion of motion.
Key frame: to represent any motion or change in a moving image, at least two different key states, one before and one after, must be given; the changes and transitions of the intermediate states between these two key states can be completed automatically by the computer. In Flash, a frame that represents a key state is called a key frame.
Shot data: a segment of video data captured by a camera in one continuous take. It is the basic physical unit of video structuring.
K-means clustering: a typical distance-based clustering algorithm that uses distance as the measure of similarity, that is, the closer two objects are, the more similar they are considered to be. The algorithm regards a cluster as a set of objects that lie close together, so its final goal is to obtain compact and well-separated clusters. Given the number of clusters k and a database containing n data objects, the algorithm outputs k clusters that satisfy the minimum-variance criterion. The k clusters have the following properties: each cluster is as compact as possible, and the clusters are as far apart from one another as possible. The procedure is as follows. First, k objects are selected arbitrarily from the n data objects as the initial cluster centers. Each remaining object is then assigned to the cluster whose center it is most similar to (closest to), which yields a new set of clusters. The center of each new cluster (the mean of all objects in that cluster) is then recomputed. This process is repeated until the standard measure function converges; the mean squared error is generally used as the standard measure function.
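The K-means procedure described above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the embodiments. The function names, the `seed` and `max_iter` parameters, and the use of squared Euclidean distance are assumptions made for the example; iteration stops once the cluster assignment no longer changes, which is the point at which the mean squared error stops decreasing.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def k_means(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of tuples) into k clusters.

    Returns (centers, assignment), where assignment[i] is the index of the
    cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Arbitrarily pick k of the n objects as the initial cluster centers.
    centers = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assign every object to the cluster whose center is closest.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the error has converged
        assignment = new_assignment
        # Recompute each center as the mean of the objects assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = mean(members)
    return centers, assignment
```

For example, calling `k_means` with k = 2 on six two-dimensional points that form two tight groups returns an assignment in which each group shares one cluster index.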
It should be understood that, in addition to the production of video summaries mentioned above, the video summary generation method, apparatus, server, and storage medium in the embodiments of the present invention may also be applied to other video-related text introductions, such as the text portion of a movie poster, which is not limited here.
Based on the above background, the video summary generation method in the embodiments of the present invention is described first. Referring to FIG. 2, one embodiment of the video summary generation method includes:
201. Segment the target video into a plurality of video frames;
When a user needs a video summary of a target video, the target video is first input to the video summary generation apparatus, which obtains the target video and segments it into a plurality of video frames. The video summary generation apparatus may be located in the server 101 shown in FIG. 1. The target video may be one or more video sequences, such as a movie, several episodes of a TV drama, or other video, which is not limited here.
202. Determine, according to user characteristics, N target frames corresponding to the user from the plurality of video frames;
After segmenting the target video into a plurality of video frames, the video summary generation apparatus determines the N target frames corresponding to the user according to the user characteristics. The target frames are selected from the video frames of the target video; that is, the apparatus selects the N target frames corresponding to the user from the plurality of video frames according to the user characteristics. The number of target frames N is an integer greater than 1, and its value may be set by the user or by the system, which is not limited here.
203. Extract the subtitles in the N target frames;
After determining the N target frames corresponding to the user, the video summary generation apparatus extracts the subtitles in those N target frames. It should be understood that subtitles are the textual representation of non-image content, such as dialogue and action, in film and television works such as TV dramas and movies, and the term also refers generally to text added to such works in post-production. Besides text, subtitles may include symbols, emoticons, and the like, which is not limited here.
204. Generate a target video summary according to the extracted subtitles.
After extracting the subtitles in the N target frames, the video summary generation apparatus generates the target video summary from them. It should be understood that the target video summary is the video summary of the target video and is used to describe the content of the target video to the user. It should also be understood that a target video summary generated from subtitles should meet the requirements of natural language and consist of one or more complete sentences.
It should be noted that this embodiment is described using the generation of a video summary for one user as an example. When video summaries need to be generated for multiple users, steps 201 to 204 may be performed for each user.
Regardless of which user the video summary is generated for, the first step is to segment the target video into a plurality of video frames, and the frames obtained are the same for every user, so the segmented video frames can be reused. That is, when generating a video summary for the first user, steps 201 to 204 are performed; when generating a video summary for a subsequent user, the already segmented video frames can be read and only steps 202 to 204 performed.
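The reuse of segmented frames across users can be sketched as follows. This is a minimal illustration, assuming hypothetical names throughout: `SummaryPipeline`, its four callables, and the toy frame dictionaries are not taken from the application. The four callables stand in for steps 201 to 204, and the cache ensures that step 201 runs only once per target video.

```python
class SummaryPipeline:
    """Runs steps 201-204 and reuses the segmentation result across users."""

    def __init__(self, split, select, extract, compose):
        self._split = split      # step 201: video id -> list of frames
        self._select = select    # step 202: (frames, user features) -> N target frames
        self._extract = extract  # step 203: target frames -> list of subtitles
        self._compose = compose  # step 204: subtitles -> summary text
        self._frame_cache = {}   # video id -> frames, shared by all users

    def summarize(self, video_id, user_features):
        # Step 201 is executed only for the first user of a given video.
        if video_id not in self._frame_cache:
            self._frame_cache[video_id] = self._split(video_id)
        frames = self._frame_cache[video_id]
        targets = self._select(frames, user_features)
        if len(targets) < 2:
            raise ValueError("N must be an integer greater than 1")
        return self._compose(self._extract(targets))


# Toy stand-ins for the four steps, for illustration only.
split_calls = []


def toy_split(video_id):
    split_calls.append(video_id)
    rows = [("A", "Hello."), ("B", "Run!"), ("A", "Goodbye."), ("B", "Stop!")]
    return [{"actor": actor, "subtitle": text} for actor, text in rows]


def toy_select(frames, user_features):
    return [f for f in frames if f["actor"] in user_features["actors"]]


def toy_extract(frames):
    return [f["subtitle"] for f in frames]


def toy_compose(subtitles):
    return " ".join(subtitles)
```

With these stand-ins, two users with different preferred actors receive different summaries of the same video, while `toy_split` is called only once.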
The embodiments of the present invention segment a target video into a plurality of video frames, determine N target frames corresponding to each user according to user characteristics, extract the subtitles in the N target frames, and generate the target video summary for that user from the extracted subtitles. The solution thus generates video summaries automatically and presents different video summaries to different users according to their characteristics. The summaries are better targeted, which can increase the number of views of the video, provide useful information to more users, and improve the efficiency of video summary generation.
Based on the embodiment corresponding to FIG. 2, the target video can be segmented into video frames in multiple ways, and the way the target frames are determined differs with the segmentation method. The video summary generation method is described in detail below using one such way as an example. Referring to FIG. 3, another embodiment of the video summary generation method includes:
301. Segment the target video into a plurality of pieces of shot data;
When a user needs a video summary of a target video, the target video is first input to the video summary generation apparatus, which obtains the target video and segments it into a plurality of pieces of shot data. The segmentation may be based, for example, on distance in color space or on other parameters, which is not limited here. The video summary generation apparatus may be located in the server 101 shown in FIG. 1. The target video may be one or more video sequences, such as a movie, several episodes of a TV drama, or other video, which is not limited here.
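One way to segment by distance in color space can be sketched as follows. The text does not specify the distance measure or the decision rule, so this example makes several assumptions: each frame is represented by a normalized color histogram, the distance is the sum of absolute differences between consecutive histograms, and a new shot starts when that distance exceeds a hypothetical `threshold`.

```python
def histogram_distance(h1, h2):
    """Sum of absolute bin differences between two color histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def split_into_shots(histograms, threshold=0.5):
    """Group consecutive frame indices into shots.

    `histograms[i]` is the normalized color histogram of frame i. A shot
    boundary is placed between frames i-1 and i when their histogram
    distance exceeds `threshold` (an assumed, tunable value).
    """
    if not histograms:
        return []
    shots = [[0]]
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            shots.append([i])    # large color change: start a new shot
        else:
            shots[-1].append(i)  # small change: same shot continues
    return shots
```

For four frames whose histograms change sharply between the second and third frame, the function returns two shots of two frames each.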
302. Segment each piece of shot data into a plurality of pieces of sub-shot data;
After the target video has been segmented into a plurality of pieces of shot data, each piece of shot data is further segmented into sub-shot data. The segmentation may be based, for example, on the camera motion direction or on other parameters, which is not limited here.
303. Segment each piece of sub-shot data into a plurality of video frames;
After segmenting each piece of shot data into a plurality of pieces of sub-shot data, the video summary generation apparatus further segments each piece of sub-shot data into a plurality of video frames.
304. Determine, according to user characteristics, L pieces of sub-shot data corresponding to the user from the plurality of video frames;
After segmenting each piece of shot data into a plurality of pieces of sub-shot data, the video summary generation apparatus determines the L pieces of sub-shot data corresponding to the user according to the user characteristics; that is, the apparatus selects the L pieces of sub-shot data corresponding to the user from the plurality of video frames according to the user characteristics, L being an integer equal to or greater than 1.
In this embodiment, the video summary generation apparatus may determine, among the sub-shot data of the target video, the target sub-shot data that contain the tag information corresponding to the user, and then determine, among these target sub-shot data, the L pieces of sub-shot data with the highest preset sub-shot weights.
It should be noted that the sub-shot weight in this embodiment may be determined as follows: after segmenting each piece of sub-shot data into a plurality of video frames, the video summary generation apparatus determines the weight according to the duration of the sub-shot, that is, it uses the number of video frames contained in the sub-shot as the value of the sub-shot weight. Besides the number of video frames, the sub-shot weight may also be determined from the weights of the video frames contained in the sub-shot, or from other parameters, which is not limited here.
It should also be noted that the tag information corresponding to the user in this embodiment may be an actor's name in the user's tags, a director's name in the user's tags, a movie genre in the user's tags, or other information in the user's tags, which is not limited here.
It should be understood that if the user has no corresponding tag information, the video summary generation apparatus may directly take the L pieces of sub-shot data with the highest sub-shot weights as the L pieces of sub-shot data corresponding to the user. These may be determined as follows: the apparatus sorts all the sub-shot data in descending order of sub-shot weight, selects the top L pieces from the sorted sub-shot data, and takes the selected L pieces as the L pieces of sub-shot data corresponding to the user.
If the number M of pieces of target sub-shot data is less than L, the video summary generation apparatus selects all the target sub-shot data and then chooses the remaining L-M pieces from the sub-shot data of the target video according to sub-shot weight. That is, if the number M of pieces of target sub-shot data containing the tag information corresponding to the user is less than L, the apparatus selects all the target sub-shot data, sorts the sub-shot data of the target video that have not yet been selected in descending order of sub-shot weight, and takes the remaining L-M pieces from the sorted sub-shot data.
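The selection rule of step 304 can be sketched as follows. This is a minimal illustration under stated assumptions: each sub-shot is a dictionary with hypothetical keys `id`, `tags` (a set), and `frames` (a list), and the sub-shot weight is taken to be the number of frames, which is one of the options the text allows.

```python
def sub_shot_weight(sub_shot):
    """Weight of a sub-shot, assumed here to be its number of video frames."""
    return len(sub_shot["frames"])


def select_sub_shots(sub_shots, user_tags, count):
    """Pick `count` (L) sub-shots for a user.

    Sub-shots whose tags overlap `user_tags` are preferred, highest weight
    first. If fewer than `count` match (M < L), the remaining L-M are taken
    from the unselected sub-shots in descending order of weight. With empty
    `user_tags` this reduces to the top-L sub-shots by weight.
    """
    matching = [s for s in sub_shots if user_tags & s["tags"]]
    matching.sort(key=sub_shot_weight, reverse=True)
    chosen = matching[:count]
    if len(chosen) < count:
        chosen_ids = {s["id"] for s in chosen}
        rest = [s for s in sub_shots if s["id"] not in chosen_ids]
        rest.sort(key=sub_shot_weight, reverse=True)
        chosen += rest[: count - len(chosen)]
    return chosen
```

The function returns fewer than `count` sub-shots only when the video itself contains fewer than `count` sub-shots.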
It should be understood that, in addition to the tag information corresponding to the user, the video summary generation apparatus may also determine the target sub-shot data according to other user characteristics, such as information about videos the user has watched, information about videos the user has added to favorites, and keywords the user has searched for, which is not limited here.
305. Determine, according to preset frame weights, X target frames in each of the L pieces of sub-shot data;
After determining the L pieces of sub-shot data corresponding to the user, the video summary generation apparatus determines, according to the preset frame weights, X target frames in each of the L pieces of sub-shot data. X is an integer equal to or greater than 1, and X multiplied by L equals N.
It should be understood that the frame weights are determined after the video summary generation apparatus has segmented the sub-shot data into video frames, and may be determined as follows: for each piece of sub-shot data, the video frames in it are divided into K classes by K-means clustering; in each class, the video frame closest to the cluster center is taken as the key frame of that class; and the frame weight of each key frame is determined according to frame parameters. The frame parameters include the proportion of the frame occupied by faces, the camera motion direction, the camera focal length, whether the camera is shaking, or other parameters.
Each piece of sub-shot data referred to here may be any of the sub-shot data in the target video, or any of the L pieces of sub-shot data determined for the user, which is not limited here.
Accordingly, after the frame weights have been determined in the above manner, the video summary generation apparatus may determine the key frames contained in each of the L pieces of sub-shot data and then, for each of these pieces, determine the X video frames with the largest frame weights among its key frames; these X video frames are the X target frames of that piece of sub-shot data.
除了上述方式,视频摘要生成装置还可以通过其他方式确定帧权重以及X个目标帧,此处不作限定。In addition to the above manner, the video summary generating apparatus may determine the frame weight and the X target frames by other means, which are not limited herein.
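The key-frame procedure of step 305 can be sketched as follows. This is only an illustration under stated assumptions: each frame is represented by a numeric feature vector (the application does not say which features are clustered), the face proportion serves as the frame weight, the first K frames seed the cluster centers so that the result is deterministic, and all function names are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; the first k points seed the centers (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def key_frames(features, k):
    """Index of the frame closest to each cluster center (one key frame per class)."""
    centers, labels = kmeans(features, k)
    keys = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            keys.append(min(members, key=lambda i: math.dist(features[i], centers[c])))
    return keys


def target_frames(features, face_ratio, k, x):
    """Among the key frames, keep the x frames with the largest frame weight."""
    keys = key_frames(features, k)
    return sorted(keys, key=lambda i: face_ratio[i], reverse=True)[:x]
```

Calling `target_frames` once per selected sub-shot with the same X yields X times L target frames in total, which matches the relation X multiplied by L equals N.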
306. Extract the subtitles in the N target frames;
After determining the target frames corresponding to the user, the video summary generating apparatus extracts the subtitles in the N target frames corresponding to the user. It should be understood that subtitles are the text that presents non-visual content, such as dialogue and actions, in television dramas, films, and other works, and the term also covers text added in post-production. In addition to text, subtitles may include symbols, emoticons, and the like, which is not limited herein.
The video summary generating apparatus may extract the subtitles in the following manners:
(1) For each of the N target frames, extract all subtitles in the target frame, that is, extract all subtitles in the N target frames.
(2) For each of the N target frames, extract subtitles of a preset length from the target frame. It should be understood that the preset length is set by the user or by the video summary generating apparatus, and may limit the number of characters, the number of sentences, or the number of paragraphs. For example, the preset length may be 30 characters, 3 sentences, 1 paragraph, or another length limit, which is not limited herein.
(3) For each of the N target frames, extract subtitles of a certain length from the beginning and from the end of the subtitles of the target frame. It should be understood that beginning and end refer to the order in which the subtitles appear in the target frame, and that the certain length is a length set in advance, similar to the preset length above, which is not described again here. For ease of understanding, an example is given: for each target frame, the first three sentences and the last three sentences of the subtitles of the target frame are extracted. It should be understood that the above is only an example and does not limit the embodiments of the present invention.
It should also be understood that, in addition to the above manners, the subtitles in the target frames may be extracted in other manners, which is not limited herein.
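The three extraction manners listed above can be written as a short sketch. It assumes that the subtitles of a target frame have already been recognized and split into a list of sentences (how that is done is outside this sketch), and it expresses the preset length as a number of sentences; the function names are hypothetical.

```python
def extract_all(sentences):
    """Manner (1): every subtitle sentence of the target frame."""
    return list(sentences)


def extract_preset(sentences, max_sentences=3):
    """Manner (2): a preset length, expressed here as a number of sentences."""
    return list(sentences[:max_sentences])


def extract_head_tail(sentences, n=3):
    """Manner (3): the first n and the last n sentences, without duplicates."""
    if len(sentences) <= 2 * n:
        # Head and tail would overlap, so return everything once.
        return list(sentences)
    return list(sentences[:n]) + list(sentences[-n:])
```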
307. Generate a target video summary according to the subtitles.
After extracting the subtitles in the N target frames, the video summary generating apparatus generates the target video summary according to the extracted subtitles. It should be understood that the target video summary is the video summary of the target video and is used to describe the content of the target video to the user. It should also be understood that a target video summary generated from subtitles should conform to natural language and consist of one or more complete sentences.
In this embodiment, the video summary generating apparatus may generate the target video summary as follows:
Extract a plurality of keywords from the subtitles and combine the extracted keywords to generate at least one sentence; the one or more generated sentences constitute the target video summary corresponding to the user. It should be understood that a keyword may be a word whose frequency of occurrence in the subtitles is greater than a preset value, a word in the subtitles whose part of speech is of a preset type, a word in the subtitles that matches a preset word, or a word determined in another manner, which is not limited herein. It should be understood that each sentence generated by the combination should satisfy the requirements of natural language and be a complete sentence.
The video summary generating apparatus may also generate the target video summary corresponding to the user in other manners, which is not limited herein.
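Two of the keyword criteria named in step 307, frequency above a preset value and matching a preset word, can be sketched as follows. The sketch assumes the subtitles have already been split into word tokens, and the names `extract_keywords`, `threshold`, and `preset_words` are hypothetical. Combining the keywords into grammatical sentences is not shown, since the application leaves that method open.

```python
from collections import Counter


def extract_keywords(tokens, threshold=1, preset_words=()):
    """Keep words that occur more than `threshold` times or match a preset word."""
    counts = Counter(tokens)
    preset = set(preset_words)
    seen, keywords = set(), []
    for tok in tokens:                      # preserves order of first appearance
        if tok in seen:
            continue
        if counts[tok] > threshold or tok in preset:
            keywords.append(tok)
            seen.add(tok)
    return keywords
```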
It should be noted that the embodiment of the present invention is described by taking the generation of a video summary for one user as an example. When video summaries need to be generated for multiple users, steps 301-307 may be performed for each user.
Regardless of the user for whom the video summary is generated, the first three steps divide the target video into a plurality of video frames, and the resulting video frames are the same for all users, so they can be reused. That is, when a video summary is generated for the first user, steps 301-307 need to be performed; when a video summary is generated for a subsequent user, the video frames that have already been obtained may be read, and then steps 304-307 are performed.
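The reuse described above amounts to caching the segmentation result per video. A minimal sketch, assuming segmentation and per-user summarization are supplied as callables; the class and parameter names are hypothetical:

```python
class SummaryService:
    """Runs segmentation once per video and reuses the frames for every user."""

    def __init__(self, segment, summarize):
        self._segment = segment        # steps 301-303: video id -> video frames
        self._summarize = summarize    # steps 304-307: frames, user -> summary
        self._frames = {}              # video id -> cached segmentation result

    def summary_for(self, video_id, user):
        if video_id not in self._frames:
            # First request for this video: perform the segmentation steps.
            self._frames[video_id] = self._segment(video_id)
        # Later requests read the stored frames and only run the per-user steps.
        return self._summarize(self._frames[video_id], user)
```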
It should also be understood that, in the embodiment of the present invention, after generating a video summary for each user, the video summary generating apparatus may update the video summary according to a preset rule. The preset rule is an update rule set in advance. It may be a time period, that is, the video summary is updated periodically, for example once a week or once a month. It may be a trigger condition, for example the video summary is updated each time a new episode of a television series is released. It may also be another rule, which is not limited herein.
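The two kinds of preset rule mentioned above, a time period and an episode-count trigger, can be checked with a small helper. This is only a sketch; the function name and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def needs_update(last_update, now, period=None, new_episodes=0, episodes_per_update=None):
    """True when the time period has elapsed or enough new episodes have arrived."""
    if period is not None and now - last_update >= period:
        return True
    if episodes_per_update is not None and new_episodes >= episodes_per_update:
        return True
    return False


# Example: a weekly rule, checked exactly one week after the last update.
start = datetime(2018, 1, 1)
due = needs_update(start, start + timedelta(weeks=1), period=timedelta(weeks=1))
```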
The embodiment of the present invention may divide the target video into a plurality of video frames, determine the N target frames corresponding to the user according to the user features, extract the subtitles in the N target frames, and generate the target video summary of the user according to the extracted subtitles. It can be seen that this solution generates video summaries automatically and can present different video summaries to different users according to their user features. The summaries are therefore more targeted, which can increase video views, provide useful information to more users, and improve the efficiency of video summary generation.
Secondly, the embodiment of the present invention provides a manner of dividing the target video into a plurality of video frames, which improves the implementability of the solution.
Thirdly, the embodiment of the present invention provides multiple manners of determining target frames, as well as multiple manners of extracting subtitles and generating summaries, which improves the flexibility of the solution.
Further, the embodiment of the present invention may update the video summary, which improves the timeliness of the video summary.
For ease of understanding, the video summary generation method in the embodiment of the present invention is described in detail below using an application scenario:
The system receives as input two videos (the target video), namely episode 1 and episode 2 of the television series 《小别离》 (Xiao Bie Li). The video summary generating apparatus divides the two videos into 6 shot data according to color-space distance, divides the 6 shot data into 24 sub-shot data according to the motion direction of the camera, and then divides the 24 sub-shot data into 100 video frames.
After the target video is divided into 100 video frames, the video summary generating apparatus uses the number of video frames contained in each sub-shot data as the weight of that sub-shot data. Meanwhile, for each sub-shot data, the apparatus divides the video frames in the sub-shot data into 3 classes by K-means clustering and determines the video frame closest to the cluster center in each class as the key frame of that class, so that each sub-shot data corresponds to 3 key frames. The apparatus then determines the frame weight of each key frame according to the face proportion in the image corresponding to the key frame.
There are two users, A and B. The tag information corresponding to user A is Hai Qing (海清), and user B has not set any tag information. The video summary generating apparatus determines, among the 24 sub-shot data of the target video, the 3 (L=3) sub-shot data containing the most video frames, that is, the 3 sub-shot data with the highest sub-shot weights, as the sub-shot data of user B, denoted a, b, and c. Meanwhile, the apparatus determines the sub-shot data in the target video that contain Hai Qing. The result shows that 15 sub-shot data contain Hai Qing (the target sub-shot data). The apparatus then determines, among these 15 sub-shot data, the 3 (L=3) sub-shot data containing the most video frames, that is, it selects from the 15 sub-shot data the 3 with the highest sub-shot weights. These 3 sub-shot data are b, c, and d.
After determining the 3 sub-shot data (a, b, c) corresponding to A, the video summary generating apparatus determines the key frames in a, b, and c. Then, according to the frame weights of the key frames determined above, it selects the key frame a1 with the largest frame weight from the 3 key frames contained in a (X=1), the key frame b1 with the largest frame weight from the 3 key frames contained in b, and the key frame c1 with the largest frame weight from the 3 key frames contained in c, and takes a1, b1, and c1 as the target frames corresponding to A.
After determining the 3 target frames corresponding to A, the video summary generating apparatus extracts all subtitles in these 3 target frames. The subtitles corresponding to a1 are "Dad, I failed my English exam", "Duoduo, how could you fail? Haven't your English grades always been good?", "Mom will definitely scold me if she finds out. Can you go to the parent-teacher meeting on Sunday?", and "All right, I'll go to the parent-teacher meeting on Sunday". The subtitle corresponding to b1 is "You failed English and hid it from me. Do you have any respect for your mother?". The subtitles corresponding to c1 are "Duoduo, how could you bring a dog home without my consent? We can't keep a dog at home" and "I've always wanted a dog. Please say yes".
According to the subtitles corresponding to a1, b1, and c1, the video summary generating apparatus extracts the keywords "Duoduo", "English grade", "fail", "Dad", "go to the parent-teacher meeting", "want a dog", "without consent", and "hide from Mom", and then combines them to generate the sentences "Duoduo fails English, and Dad goes to the parent-teacher meeting without telling Mom. Duoduo wants a dog". These sentences are the video summary corresponding to A.
After determining the 3 sub-shot data (b, c, d) corresponding to B, the video summary generating apparatus determines the key frames in b, c, and d. Then, according to the frame weights of the key frames determined above, it selects the key frame b1 with the largest frame weight from the 3 key frames contained in b, the key frame c1 with the largest frame weight from the 3 key frames contained in c, and the key frame d1 with the largest frame weight from the 3 key frames contained in d, and takes b1, c1, and d1 as the target frames corresponding to B. After determining the 3 target frames corresponding to B, the apparatus extracts all subtitles in these 3 target frames. The subtitles corresponding to b1 and c1 are as described above, and the subtitle corresponding to d1 is "Duoduo, Mom has hired an English tutor for you. You need to cooperate with the teacher to improve your English grade." The above sentence is the video summary of the target video corresponding to B.
According to the subtitles corresponding to b1, c1, and d1, the video summary generating apparatus extracts the keywords "Duoduo", "English grade", "fail", "Dad", "go to the parent-teacher meeting", "hide", "Mom", "hired an English tutor", and "improve", and then combines them to generate the sentences "Duoduo fails English, and Dad goes to the parent-teacher meeting without telling Mom. Mom hires an English tutor to improve Duoduo's English grade". These sentences are the video summary corresponding to B.
In addition, the video summary generating apparatus has an update rule set in advance: the video summary is updated each time two new episodes of the television series are released. One week later, two more episodes of 《小别离》 are released, and the system receives episode 3 and episode 4 as input. The apparatus updates the video summary corresponding to each user according to the newly input episode 3 and episode 4 videos.
The video summary generation method in the embodiment of the present invention is described above. The video summary generating apparatus in the embodiment of the present invention is described below. Referring to FIG. 4, one embodiment of the video summary generating apparatus includes:
a segmentation module 401, configured to divide the target video into a plurality of video frames;
a first determining module 402, configured to determine, according to user features, N target frames corresponding to the user from the plurality of video frames, where N is an integer greater than 1;
an extraction module 403, configured to extract the subtitles in the N target frames;
a generation module 404, configured to generate the target video summary according to the subtitles extracted by the extraction module 403.
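The cooperation of the four modules listed above can be pictured as a simple pipeline. The sketch below is illustrative only: the class name and the four callables are hypothetical stand-ins for modules 401 to 404, whose internals are described elsewhere in the application.

```python
class VideoSummaryGenerator:
    """Wires four pluggable stages together; each stage is any callable."""

    def __init__(self, split, choose, extract, compose):
        self.split = split        # segmentation module 401: video -> frames
        self.choose = choose      # first determining module 402: frames, user features -> target frames
        self.extract = extract    # extraction module 403: target frame -> subtitles
        self.compose = compose    # generation module 404: subtitles -> summary

    def generate(self, video, user_features):
        frames = self.split(video)
        targets = self.choose(frames, user_features)
        subtitles = [self.extract(frame) for frame in targets]
        return self.compose(subtitles)
```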
The embodiment of the present invention may divide the target video into a plurality of video frames, determine the N target frames corresponding to the user according to the user features, extract the subtitles in the N target frames, and generate the target video summary of the user according to the extracted subtitles. It can be seen that this solution generates video summaries automatically and can present different video summaries to different users according to their user features. The summaries are therefore more targeted, which can increase video views, provide useful information to more users, and improve the efficiency of video summary generation.
Based on the embodiment corresponding to FIG. 4, referring to FIG. 5, in another embodiment of the video summary generating apparatus provided by the embodiment of the present invention, the generation module 404 includes:
a first extraction unit 4041, configured to extract a plurality of keywords from the subtitles;
a generation unit 4042, configured to combine the plurality of keywords to generate at least one sentence, and to use the at least one sentence as the target video summary.
Optionally, in the embodiment of the present invention, the extraction module 403 may include:
a second extraction unit 4031, configured to extract, for each of the N target frames, all subtitles in the target frame;
or,
a third extraction unit 4032, configured to extract, for each of the N target frames, subtitles of a preset length from the target frame.
The embodiment of the present invention provides an implementation for generating a video summary, which improves the implementability of the solution.
Secondly, the embodiment of the present invention provides multiple manners of extracting the subtitles in a target frame, which improves the flexibility of the solution.
Based on the embodiment corresponding to FIG. 4 or FIG. 5, referring to FIG. 6, in another embodiment of the video summary generating apparatus provided by the embodiment of the present invention, the segmentation module 401 includes:
a first segmentation unit 4011, configured to divide the target video into a plurality of shot data;
a second segmentation unit 4012, configured to divide each shot data into a plurality of sub-shot data;
a third segmentation unit 4013, configured to divide each sub-shot data into a plurality of video frames.
The embodiment of the present invention provides an implementation for segmenting the target video, which improves the implementability of the solution.
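The first level of this segmentation, which the application scenario performs according to color-space distance, might look like the sketch below. It rests on assumptions not stated in the application: every frame is summarized by a color histogram, the distance is the L1 distance between consecutive histograms, and a new shot starts when that distance exceeds a threshold. The function name `split_shots` is hypothetical.

```python
def split_shots(histograms, threshold):
    """Group frame indices into shots; cut where consecutive frames differ too much."""
    if not histograms:
        return []
    shots, current = [], [0]
    for i in range(1, len(histograms)):
        # L1 distance between consecutive color histograms (an assumed metric).
        d = sum(abs(a - b) for a, b in zip(histograms[i - 1], histograms[i]))
        if d > threshold:
            shots.append(current)   # close the current shot at the cut
            current = []
        current.append(i)
    shots.append(current)
    return shots
```

The same pattern could be applied a second time inside each shot with a camera-motion measure in place of the histogram distance to obtain sub-shots.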
Based on the embodiment corresponding to FIG. 6, referring to FIG. 7, in another embodiment of the video summary generating apparatus provided by the embodiment of the present invention, the first determining module 402 includes:
a first determining unit 4021, configured to determine, according to the user features, L sub-shot data corresponding to the user from the plurality of video frames, where L is an integer equal to or greater than 1;
a second determining unit 4022, configured to determine, according to the preset frame weight, X target frames in each of the L sub-shot data, where X is an integer equal to or greater than 1, and X multiplied by L equals N.
The embodiment of the present invention provides an implementation for determining target frames, which improves the implementability of the solution.
Based on the embodiment corresponding to FIG. 7, referring to FIG. 8, in another embodiment of the video summary generating apparatus provided by the embodiment of the present invention, the first determining unit 4021 includes:
a first determining subunit 40211, configured to determine, among the plurality of sub-shot data corresponding to the target video, the target sub-shot data containing the tag information corresponding to the user;
a second determining subunit 40212, configured to determine, among the target sub-shot data, the sub-shot data ranked in the top L by preset sub-shot weight.
In the embodiment of the present invention, the video summary generating apparatus provides a manner of determining the L sub-shot data corresponding to each user, which improves the implementability of the solution.
Based on the embodiment corresponding to FIG. 7 or FIG. 8, referring to FIG. 9, in another embodiment of the video summary generating apparatus provided by the embodiment of the present invention, the video summary generating apparatus further includes:
a classification module 405, configured to divide, for each sub-shot data, the video frames in the sub-shot data into K classes by K-means clustering;
a second determining module 406, configured to determine the video frame closest to the cluster center in each class of video frames as the key frame of that class;
a third determining module 407, configured to determine the frame weight of each key frame according to a frame parameter;
the second determining unit 4022 includes:
a third determining subunit 40221, configured to determine, for each of the L sub-shot data, the X target frames with the largest frame weight among the key frames contained in the sub-shot data.
The embodiment of the present invention provides a manner of determining the target frames in the L sub-shot data, which improves the implementability of the solution.
Based on any one of the embodiments corresponding to FIG. 4 to FIG. 9, in other embodiments of the video summary generating apparatus provided by the embodiment of the present invention, the video summary generating apparatus may further include:
an update module, configured to update the video summary according to a preset rule.
In the embodiment of the present invention, the video summary generating apparatus may also update the video summary according to a preset rule, which improves the flexibility of the solution.
The video summary generating apparatus in the embodiment of the present invention is described above from the perspective of functional modules, and is described below from the perspective of hardware entities. Referring to FIG. 10, FIG. 10 is a schematic structural diagram of the video summary generating apparatus 50 in the embodiment of the present invention. The video summary generating apparatus 50 may include an input device 510, an output device 520, a processor 530, and a memory 540. The output device in the embodiment of the present invention may be a display device.
The memory 540 may include a read-only memory and a random access memory, and provides instructions and data to the processor 530. A part of the memory 540 may also include a non-volatile random access memory (NVRAM).
The memory 540 stores the following elements, executable modules or data structures, or a subset thereof, or an extended set thereof:
Operation instructions: including various operation instructions, used to implement various operations.
Operating system: including various system programs, used to implement various basic services and to process hardware-based tasks.
In the embodiment of the present invention, the processor 530 is configured to:
divide the target video into a plurality of video frames;
determine, according to user features, N target frames corresponding to the user from the plurality of video frames, where N is an integer greater than 1;
extract the subtitles in the N target frames;
generate the target video summary according to the subtitles.
In the embodiment of the present invention, the processor 530 is configured to:
extract a plurality of keywords from the subtitles;
combine the plurality of keywords to generate at least one sentence, and use the at least one sentence as the target video summary.
In the embodiment of the present invention, the processor 530 is configured to:
extract, for each of the N target frames, all subtitles in the target frame;
or,
extract, for each of the N target frames, subtitles of a preset length from the target frame.
In the embodiment of the present invention, the processor 530 is configured to:
divide the target video into a plurality of shot data;
divide each shot data into a plurality of sub-shot data;
divide each sub-shot data into a plurality of video frames.
In the embodiment of the present invention, the processor 530 is configured to:
determine, according to the user features, L sub-shot data corresponding to the user from the plurality of video frames, where L is an integer equal to or greater than 1;
determine, according to the preset frame weight, X target frames in each of the L sub-shot data, where X is an integer equal to or greater than 1, and X multiplied by L equals N.
In the embodiment of the present invention, the processor 530 is configured to:
determine, among the plurality of sub-shot data corresponding to the target video, the target sub-shot data containing the tag information corresponding to the user;
determine, among the target sub-shot data, the sub-shot data ranked in the top L by preset sub-shot weight.
In the embodiment of the present invention, the processor 530 is configured to:
divide, for each sub-shot data, the video frames in the sub-shot data into K classes by K-means clustering;
determine the video frame closest to the cluster center in each class of video frames as the key frame of that class;
determine the frame weight of each key frame according to a frame parameter;
In the embodiment of the present invention, the processor 530 is configured to:
determine, for each of the L sub-shot data, the X target frames with the largest frame weight among the key frames contained in the sub-shot data.
The processor 530 controls the operation of the video summary generating apparatus 50, and may also be referred to as a central processing unit (CPU). The memory 540 may include a read-only memory and a random access memory, and provides instructions and data to the processor 530. A part of the memory 540 may also include an NVRAM. In a practical application, the components of the video summary generating apparatus 50 are coupled together through a bus system 550, which may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of description, however, the various buses are all labeled as the bus system 550 in the figure.
The methods disclosed in the foregoing embodiments of the present invention may be applied to the processor 530 or implemented by the processor 530. The processor 530 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above methods may be completed by an integrated logic circuit of hardware in the processor 530 or by instructions in the form of software. The processor 530 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 540; the processor 530 reads the information in the memory 540 and completes the steps of the above methods in combination with its hardware.
An embodiment of the present invention provides a computer-readable storage medium. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the video summary generation method described above.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the working processes of the systems, apparatuses, and units described above may be understood by reference to the corresponding processes in the foregoing method embodiments, and are not described again here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, apparatus, or unit, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (22)

  1. A video summary generation method, applied to a server, the method comprising:
    segmenting a target video into a plurality of video frames;
    determining, according to a user feature, N target frames corresponding to a user from the plurality of video frames, N being an integer greater than 1;
    extracting subtitles from the N target frames; and
    generating a target video summary according to the subtitles.
  2. The method according to claim 1, wherein generating the target video summary according to the subtitles comprises:
    extracting a plurality of keywords from the subtitles; and
    combining the plurality of keywords to generate at least one sentence, and using the at least one sentence as the target video summary.
  3. The method according to claim 1, wherein extracting the subtitles from the N target frames comprises:
    for each of the N target frames, extracting all subtitles in the target frame;
    or,
    for each of the N target frames, extracting subtitles of a preset length in the target frame.
  4. The method according to any one of claims 1 to 3, wherein segmenting the target video into a plurality of video frames comprises:
    segmenting the target video into a plurality of pieces of shot data;
    segmenting each piece of shot data into a plurality of pieces of sub-shot data; and
    segmenting each piece of sub-shot data into a plurality of video frames.
  5. The method according to claim 4, wherein determining, according to the user feature, the N target frames corresponding to the user from the plurality of video frames comprises:
    determining, according to the user feature, L pieces of sub-shot data corresponding to the user from the plurality of video frames, L being an integer greater than or equal to 1; and
    determining, according to preset frame weights, X target frames in each of the L pieces of sub-shot data, X being an integer greater than or equal to 1, and X multiplied by L being equal to N.
  6. The method according to claim 5, wherein determining, according to the user feature, the L pieces of sub-shot data corresponding to the user from the plurality of video frames comprises:
    determining, among the pieces of sub-shot data corresponding to the target video, target sub-shot data that contains tag information corresponding to the user; and
    determining, among the target sub-shot data, the L pieces of sub-shot data ranked highest by preset sub-shot weight.
  7. The method according to claim 5, wherein the method further comprises:
    for each piece of sub-shot data, dividing the video frames in the sub-shot data into K classes by K-means clustering;
    determining, in each class of video frames, the video frame closest to the cluster center as the key frame of that class of video frames; and
    determining the frame weight of each key frame according to frame parameters;
    wherein determining, according to the preset frame weights, the X target frames in each of the L pieces of sub-shot data comprises:
    for each of the L pieces of sub-shot data, determining, among the key frames contained in the sub-shot data, the X target frames with the largest frame weights.
  8. A video summary generating apparatus, the apparatus comprising:
    a segmentation module, configured to segment a target video into a plurality of video frames;
    a first determining module, configured to determine, according to a user feature, N target frames corresponding to a user from the plurality of video frames, N being an integer greater than 1;
    an extraction module, configured to extract subtitles from the N target frames; and
    a generating module, configured to generate a target video summary according to the subtitles.
  9. The apparatus according to claim 8, wherein the generating module comprises:
    a first extraction unit, configured to extract a plurality of keywords from the subtitles; and
    a generating unit, configured to combine the plurality of keywords to generate at least one sentence, and to use the at least one sentence as the target video summary.
  10. The apparatus according to claim 8, wherein the extraction module comprises:
    a second extraction unit, configured to extract, for each of the N target frames, all subtitles in the target frame;
    or,
    a third extraction unit, configured to extract, for each of the N target frames, subtitles of a preset length in the target frame.
  11. The apparatus according to any one of claims 8 to 10, wherein the segmentation module comprises:
    a first segmentation unit, configured to segment the target video into a plurality of pieces of shot data;
    a second segmentation unit, configured to segment each piece of shot data into a plurality of pieces of sub-shot data; and
    a third segmentation unit, configured to segment each piece of sub-shot data into a plurality of video frames.
  12. The apparatus according to claim 11, wherein the first determining module comprises:
    a first determining unit, configured to determine, according to the user feature, L pieces of sub-shot data corresponding to the user from the plurality of video frames, L being an integer greater than or equal to 1; and
    a second determining unit, configured to determine, according to preset frame weights, X target frames in each of the L pieces of sub-shot data, X being an integer greater than or equal to 1, and X multiplied by L being equal to N.
  13. The apparatus according to claim 12, wherein the first determining unit comprises:
    a first determining subunit, configured to determine, among the pieces of sub-shot data corresponding to the target video, target sub-shot data that contains tag information corresponding to the user; and
    a second determining subunit, configured to determine, among the target sub-shot data, the L pieces of sub-shot data ranked highest by preset sub-shot weight.
  14. The apparatus according to claim 12, wherein the apparatus further comprises:
    a classification module, configured to divide, for each piece of sub-shot data, the video frames in the sub-shot data into K classes by K-means clustering;
    a second determining module, configured to determine, in each class of video frames, the video frame closest to the cluster center as the key frame of that class of video frames; and
    a third determining module, configured to determine the frame weight of each key frame according to frame parameters;
    wherein the second determining unit comprises:
    a third determining subunit, configured to determine, for each of the L pieces of sub-shot data, the X target frames with the largest frame weights among the key frames contained in the sub-shot data.
  15. A server, the server comprising:
    one or more processors; and
    a memory;
    wherein the memory stores one or more programs, the one or more programs being configured to be executed by the one or more processors and comprising instructions for performing the following operations:
    segmenting a target video into a plurality of video frames;
    determining, according to a user feature, N target frames corresponding to a user from the plurality of video frames, N being an integer greater than 1;
    extracting subtitles from the N target frames; and
    generating a target video summary according to the subtitles.
  16. The server according to claim 15, wherein the one or more programs further comprise instructions for performing the following operations:
    extracting a plurality of keywords from the subtitles; and
    combining the plurality of keywords to generate at least one sentence, and using the at least one sentence as the target video summary.
  17. The server according to claim 15, wherein the one or more programs further comprise instructions for performing the following operations:
    for each of the N target frames, extracting all subtitles in the target frame;
    or,
    for each of the N target frames, extracting subtitles of a preset length in the target frame.
  18. The server according to any one of claims 15 to 17, wherein the one or more programs further comprise instructions for performing the following operations:
    segmenting the target video into a plurality of pieces of shot data;
    segmenting each piece of shot data into a plurality of pieces of sub-shot data; and
    segmenting each piece of sub-shot data into a plurality of video frames.
  19. The server according to claim 18, wherein the one or more programs further comprise instructions for performing the following operations:
    determining, according to the user feature, L pieces of sub-shot data corresponding to the user from the plurality of video frames, L being an integer greater than or equal to 1; and
    determining, according to preset frame weights, X target frames in each of the L pieces of sub-shot data, X being an integer greater than or equal to 1, and X multiplied by L being equal to N.
  20. The server according to claim 19, wherein the one or more programs further comprise instructions for performing the following operations:
    determining, among the pieces of sub-shot data corresponding to the target video, target sub-shot data that contains tag information corresponding to the user; and
    determining, among the target sub-shot data, the L pieces of sub-shot data ranked highest by preset sub-shot weight.
  21. The server according to claim 19, wherein the one or more programs further comprise instructions for performing the following operations:
    for each piece of sub-shot data, dividing the video frames in the sub-shot data into K classes by K-means clustering;
    determining, in each class of video frames, the video frame closest to the cluster center as the key frame of that class of video frames;
    determining the frame weight of each key frame according to frame parameters; and
    for each of the L pieces of sub-shot data, determining, among the key frames contained in the sub-shot data, the X target frames with the largest frame weights.
  22. A computer-readable storage medium, wherein the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the video summary generation method according to any one of claims 1 to 7.
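
The target-frame selection recited in claims 5 to 7 (filtering sub-shots by user tag, ranking them by preset sub-shot weight, extracting key frames by K-means clustering, and keeping the X key frames with the largest weights) can be sketched roughly as follows. This is only an illustrative sketch, not the applicant's implementation: the `Frame` and `SubShot` types, the one-dimensional feature vectors, the numeric weights, and the choice to seed the centroids with the first K frames are all assumptions made for the example, since the claims do not specify them.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Frame:
    index: int      # position of the frame in the video (hypothetical field)
    feature: tuple  # hypothetical feature vector, e.g. a colour histogram
    weight: float   # hypothetical frame weight derived from frame parameters


@dataclass
class SubShot:
    weight: float   # preset sub-shot weight
    tags: set       # tag information attached to the sub-shot
    frames: list = field(default_factory=list)


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; seeding with the first k points is an assumption."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centroids


def key_frames(sub_shot, k):
    """Cluster the frames into k classes; keep the frame nearest each centroid."""
    k = min(k, len(sub_shot.frames))
    labels, centroids = kmeans([f.feature for f in sub_shot.frames], k)
    keys = []
    for c in range(k):
        members = [f for f, lab in zip(sub_shot.frames, labels) if lab == c]
        if members:
            keys.append(min(members,
                            key=lambda f: math.dist(f.feature, centroids[c])))
    return keys


def select_target_frames(sub_shots, user_tags, L, X, k):
    """Return up to L * X target frames for a user with the given tags."""
    matching = [s for s in sub_shots if s.tags & user_tags]           # tag filter
    top = sorted(matching, key=lambda s: s.weight, reverse=True)[:L]  # top-L sub-shots
    targets = []
    for s in top:
        keys = key_frames(s, k)                                       # K-means key frames
        targets += sorted(keys, key=lambda f: f.weight, reverse=True)[:X]
    return targets


# Made-up data: frame 2 has the largest weight in the first sub-shot,
# but it is not a key frame, so it is never selected.
a = SubShot(0.9, {"sports"}, [Frame(i, (x,), w) for i, (x, w) in enumerate(
    [(0.0, 0.1), (1.0, 0.4), (3.0, 0.9), (10.0, 0.2), (11.0, 0.8), (13.0, 0.3)])])
b = SubShot(0.5, {"news"}, [Frame(10, (0.0,), 0.9)])
c = SubShot(0.7, {"sports", "music"}, [Frame(20, (0.0,), 0.6), Frame(21, (5.0,), 0.3)])
d = SubShot(0.3, {"sports"}, [Frame(30, (0.0,), 0.9)])

targets = select_target_frames([a, b, c, d], {"sports"}, L=2, X=1, k=2)
print([f.index for f in targets])  # prints [4, 20]
```

With L = 2 and X = 1 the sketch returns N = X × L = 2 target frames, matching the relationship stated in claim 5. Restricting the ranking to key frames is what keeps near-duplicate frames from the same cluster out of the result.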
PCT/CN2018/079246 2017-03-28 2018-03-16 Method and apparatus for generating video abstract, server and storage medium WO2018177139A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710192629.4A CN106888407B (en) 2017-03-28 2017-03-28 A kind of video abstraction generating method and device
CN201710192629.4 2017-03-28

Publications (1)

Publication Number Publication Date
WO2018177139A1 true WO2018177139A1 (en) 2018-10-04

Family

ID=59181973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/079246 WO2018177139A1 (en) 2017-03-28 2018-03-16 Method and apparatus for generating video abstract, server and storage medium

Country Status (2)

Country Link
CN (1) CN106888407B (en)
WO (1) WO2018177139A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106888407B (en) * 2017-03-28 2019-04-02 腾讯科技(深圳)有限公司 A kind of video abstraction generating method and device
CN109729425B (en) * 2017-10-27 2021-05-18 优酷网络技术(北京)有限公司 Method and system for predicting key segments
CN109756767B (en) * 2017-11-06 2021-12-14 腾讯科技(深圳)有限公司 Preview data playing method, device and storage medium
CN108683924B (en) * 2018-05-30 2021-12-28 北京奇艺世纪科技有限公司 Video processing method and device
CN109151576A (en) * 2018-06-20 2019-01-04 新华网股份有限公司 Multimedia messages clipping method and system
CN110753269B (en) * 2018-07-24 2022-05-03 Tcl科技集团股份有限公司 Video abstract generation method, intelligent terminal and storage medium
CN110769279B (en) 2018-07-27 2023-04-07 北京京东尚科信息技术有限公司 Video processing method and device
CN110933488A (en) * 2018-09-19 2020-03-27 传线网络科技(上海)有限公司 Video editing method and device
CN109413510B (en) * 2018-10-19 2021-05-18 深圳市商汤科技有限公司 Video abstract generation method and device, electronic equipment and computer storage medium
CN109348287B (en) * 2018-10-22 2022-01-28 深圳市商汤科技有限公司 Video abstract generation method and device, storage medium and electronic equipment
CN111050191B (en) * 2019-12-30 2021-02-02 腾讯科技(深圳)有限公司 Video generation method and device, computer equipment and storage medium
CN115190357A (en) * 2022-07-05 2022-10-14 三星电子(中国)研发中心 Video abstract generation method and device
CN115334367B (en) * 2022-07-11 2023-10-17 北京达佳互联信息技术有限公司 Method, device, server and storage medium for generating abstract information of video

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6751776B1 (en) * 1999-08-06 2004-06-15 Nec Corporation Method and apparatus for personalized multimedia summarization based upon user specified theme
CN101131850A (en) * 2006-08-21 2008-02-27 索尼株式会社 Program providing method and program providing apparatus
CN103646094A (en) * 2013-12-18 2014-03-19 上海紫竹数字创意港有限公司 System and method for automatic extraction and generation of audiovisual product content abstract
CN106528884A (en) * 2016-12-15 2017-03-22 腾讯科技(深圳)有限公司 Information presentation picture generation method and device
CN106888407A (en) * 2017-03-28 2017-06-23 腾讯科技(深圳)有限公司 A kind of video abstraction generating method and device
CN106921891A (en) * 2015-12-24 2017-07-04 北京奇虎科技有限公司 The methods of exhibiting and device of a kind of video feature information

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101109023B1 (en) * 2003-04-14 2012-01-31 코닌클리케 필립스 일렉트로닉스 엔.브이. Method and apparatus for summarizing a music video using content analysis
US8036263B2 (en) * 2005-12-23 2011-10-11 Qualcomm Incorporated Selecting key frames from video frames
CN101464893B (en) * 2008-12-31 2010-09-08 清华大学 Method and device for extracting video abstract
CN102184221B (en) * 2011-05-06 2012-12-19 北京航空航天大学 Real-time video abstract generation method based on user preferences
CN104185089B (en) * 2013-05-23 2018-02-16 三星电子(中国)研发中心 Video summary generation method and server, client
EP2960812A1 (en) * 2014-06-27 2015-12-30 Thomson Licensing Method and apparatus for creating a summary video

Also Published As

Publication number Publication date
CN106888407B (en) 2019-04-02
CN106888407A (en) 2017-06-23

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18776483; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 18776483; Country of ref document: EP; Kind code of ref document: A1)