CN104144181B - A network video terminal aggregation method and system - Google Patents

A network video terminal aggregation method and system

Info

Publication number
CN104144181B
Authority
CN
Grant status
Grant
Patent type
Prior art keywords
video
interest
network
dimensional
module
Prior art date
Application number
CN 201310166163
Other languages
Chinese (zh)
Other versions
CN104144181A (en)
Inventor
张辉
李长路
孙鹏
潘梁
Original Assignee
中国科学院声学研究所
北京海力汇通数字系统技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Grant date

Abstract

The invention discloses a network video terminal aggregation method and system, particularly suited to smart TV terminals. The system comprises: a subscription module for specifying the sources of network video aggregation; a crawler module for extracting network video metadata from the subscribed websites; a local database module for storing local playback records and local video information; a preprocessing module for preprocessing the local database data to suit interest mining; an interest mining module for mining the user's multidimensional interest topics from the local database; a match-filter module for filtering and ranking network videos by how well they match the user's interests; and a UI display module for displaying the filtered and ranked list of network videos. The method uses the rich playback records held on the terminal to mine the user's multidimensional interest topics, and uses the subscription and interest constraints to aggregate onto the terminal, from the massive pool of network video resources, the network videos that match the user's interests.

Description

A network video terminal aggregation method and system

Technical Field

[0001] The present invention relates to the fields of data mining and information aggregation. It mines user interest from the user information held on the terminal, extracts multidimensional interest topics, and on that basis aggregates to the user terminal the videos of interest to the user from the video websites the user subscribes to.

Background

[0002] Traditional terminal media information management is limited to managing and updating a local media information database, which the user can browse and which gives the interactive system the information it needs during user operations. Against the background of triple-play network convergence, the smart terminal operating system of a television acting as a network terminal is no longer satisfied with browsing locally stored information; it needs to provide richer network video information according to the user's needs. Presenting the videos of interest to the user from Internet video websites for click-to-play, just like local videos, has become a trend.

[0003] At present, users obtain network video by browsing video websites, searching, recommendation, and a small number of client/server (C/S) aggregation systems. Web browsing and search are widely used on PC terminals, but on smart terminals such as televisions and mobile phones, which are not well suited to keyboard-and-mouse operation, they clearly increase the burden on the user and degrade the user experience. Existing aggregation systems all adopt a server/terminal model, which leaves users having to register and give feedback, and being forced to accept redundant information such as server-side advertisements.

[0004] On the other hand, the massive growth of network information resources and the continuing expansion of user-contributed content bring users diverse and autonomous choice of resources, but also the problem of getting lost among the choices. As the amount of information on the Internet keeps growing, search engines, which present information based on the content itself, cannot avoid redundant information however much they are improved. Large amounts of redundant information also become a burden on the user and the terminal.

Summary of the Invention

[0005] The object of the present invention is to provide a method by which a terminal actively aggregates network video. The method effectively broadens the sources of video, lets the user fully enjoy the diverse and autonomous choice that the massively growing pool of network video sources brings, and avoids the accompanying problem of getting lost among the choices. It also avoids the things that current mainstream schemes require and that smart terminal users, television users in particular, are usually unwilling to do: registering, submitting user information, and stating interests explicitly.

[0006] To achieve the above object, the present invention provides a network video terminal aggregation method, the method comprising:

[0007] Step 101) specifying the sources of network video aggregation by subscription;

[0008] Step 102) using a crawler to extract network video metadata from the subscribed websites;

[0009] Step 103) storing local playback records and local video information in a local database;

[0010] Step 104) preprocessing the local database data to suit interest mining, wherein the preprocessing filters the video information stored in the database record by record, discards invalid records, and selects the qualifying data for interest mining;

[0011] Step 105) mining the user's multidimensional interest topics from the local database, wherein the local database stores a number of video metadata descriptions in a defined data structure, and the video objects described include locally stored video files and the videos in the user's playback records;

[0012] Step 106) filtering and ranking the network videos by how well they match the user's interests, wherein the match filtering matches each network video description against the interest topics in turn, retains the results whose degree of match exceeds a threshold, and ranks them;

[0013] Step 107) displaying the filtered and ranked list of network videos.

[0014] The network video metadata includes: video name, video source address, year, director, actors, or genre. All of the metadata together forms the multidimensional description of a network video.
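As an illustration of what one such multidimensional description might look like in code, the sketch below models it as a small Python record. The patent lists the fields but does not define a schema, so the class name `VideoDescription`, the field names, and the `dimensions()` helper are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class VideoDescription:
    """One multidimensional description of a network video.

    Field names are illustrative; the patent lists the fields but no schema.
    """
    name: str
    source_url: str
    year: Optional[int] = None
    director: Optional[str] = None
    actors: Tuple[str, ...] = ()
    genre: Optional[str] = None

    def dimensions(self) -> Dict[str, FrozenSet[str]]:
        """Return the values present in each dimension that could be mined."""
        return {
            "year": frozenset() if self.year is None else frozenset({str(self.year)}),
            "director": frozenset() if self.director is None else frozenset({self.director}),
            "actors": frozenset(self.actors),
            "genre": frozenset() if self.genre is None else frozenset({self.genre}),
        }
```

Representing every dimension as a set of values lets single-valued fields (director) and multi-valued fields (actors) be handled uniformly by later mining and matching steps.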

[0015] The website specified as a video aggregation source is the home page URL of one or more video websites.

[0016] Starting from the one or more web pages specified by the subscription module, the crawler module extracts video metadata, generates one metadata description for each video, and recursively traverses the second-level pages those pages contain, one by one, to obtain the qualifying video metadata. Optionally, metadata can also be obtained by directly harvesting video information that a website publishes according to a defined specification.

[0017] A multidimensional interest topic is an interest topic described in two or more dimensions. It rests on the fact that every piece of video information consists of descriptive information in multiple dimensions.

[0018] Extraction of multidimensional interest topics is divided into the following steps:

[0019] a. One-dimensional interest extraction: for each dimension to be mined, apply an independent interest mining strategy and criterion to obtain a number of interest topics in that dimension, which form a set;

[0020] b. Two-dimensional interest extraction: across different dimensions, if two interest topics appear together in one multidimensional record, the two topics are associated; the more often they appear together, the stronger the association. Pairs whose degree of association exceeds the threshold are combined into a two-dimensional interest topic, and all two-dimensional interest topics are found in the same way;

[0021] c. Multidimensional interest extraction: if a topic in some dimension appears in two multidimensional topics, check whether every pair of one-dimensional topics across the two multidimensional topics has a degree of association above the threshold; if so, merge the two multidimensional topics into an interest topic of higher dimension;

[0022] d. Record all multidimensional interest topics that cannot be merged further.
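Steps b and c can be stated compactly in symbols. The notation below is introduced here for illustration only and does not appear in the patent: $R$ is the set of preprocessed records, each record $r$ is treated as the set of one-dimensional topics it contains, $\dim(s)$ is the dimension a topic $s$ belongs to, and $\theta$ is the association threshold. The count-based association is used; the patent also allows a relative proportion without fixing its definition.

```latex
% Association between two one-dimensional topics from different dimensions
n(s, t) = \bigl|\{\, r \in R : s \in r \ \text{and}\ t \in r \,\}\bigr|,
\qquad \dim(s) \neq \dim(t)

% Step b: (s, t) is a two-dimensional interest topic when
n(s, t) > \theta

% Step c: two multidimensional topics T_1, T_2 with T_1 \cap T_2 \neq \emptyset
% are merged into T_1 \cup T_2 when
n(s, t) > \theta \quad \text{for all distinct } s, t \in T_1 \cup T_2
```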

[0023] To achieve the above object, the present invention further provides a network video terminal aggregation system, the system comprising:

[0024] a subscription module for specifying the sources of network video aggregation;

[0025] a crawler module for extracting network video metadata from the websites of the network video aggregation sources obtained from the subscription module;

[0026] a local database module for storing local playback records and local video information;

[0027] a preprocessing module for preprocessing the local database data to suit interest mining;

[0028] an interest mining module for performing one-dimensional through multidimensional interest extraction on the local database according to the following principles:

[0029] One-dimensional interest extraction: for each dimension to be mined, apply an independent interest mining strategy and criterion to obtain a number of interest topics in that dimension, which form a set;

[0030] Two-dimensional interest extraction: across different dimensions, if two interest topics appear together in one multidimensional record, the two topics are associated; the more often they appear together, the stronger the association. Pairs whose degree of association exceeds the threshold are linked together into a two-dimensional interest topic, and all two-dimensional interest topics are found by this strategy;

[0031] Multidimensional interest extraction: if a topic in some dimension appears in two multidimensional topics, check whether every pair of one-dimensional topics across the two multidimensional topics has a degree of association above the threshold; if so, merge the two multidimensional topics into an interest topic of higher dimension. Record all multidimensional interest topics that cannot be merged further, which completes the interest mining;

[0032] a match-filter module for filtering and ranking network videos by how well they match the user's interests;

[0033] a display module for displaying the filtered and ranked list of network videos.

[0034] The subscription module allows the user to specify one or more video website URLs as sources of video information aggregation. The videos contained in the page at a specified URL and in the second-level pages it references all fall within the scope of subsequent aggregation, and the home page of a video website may be specified.

[0035] The crawler module extracts video metadata from the pages within the subscription scope, or directly harvests video information that a website publishes according to a defined specification, and organizes the metadata belonging to the same video, according to the data structure, into one record describing a network video. The module also recursively crawls metadata from the second-level pages of each page.

[0036] Compared with the prior art, the technical advantages of the present invention are as follows:

[0037] The present invention provides a method by which a terminal actively aggregates network video. It effectively broadens the sources of network video and balances diverse, autonomous user choice against effective removal of redundant information, so that the user does not get lost among the choices. The method is based on the terminal actively pulling video metadata. While giving the user a convenient and efficient experience, it makes full use of the rich user information on the terminal to obtain interest topics implicitly, avoiding steps such as registration and rating that terminal users are usually unwilling, and find it inconvenient, to take part in. In short, the present invention removes the limitation that existing aggregation is based on the C/S model, in which the server completes the aggregation and pushes the result to the terminal. In the technical solution of the present invention the terminal does the crawling itself, so the user need not submit personal information or accept advertisements and similar content forced on them by the server. In addition, the present invention introduces interest mining into the filtering stage of information aggregation and combines it with customizable aggregation sources, which effectively improves the accuracy of the aggregation results and reduces redundant information.

Brief Description of the Drawings

[0038] FIG. 1 is a diagram of the main functional components of the present invention;

[0039] FIG. 2 is a schematic flowchart of the network video terminal aggregation method provided by the present invention.

Detailed Description

[0040] The present invention is further described below with reference to the drawings and specific embodiments.

[0041] As shown in FIG. 1, the method described herein comprises three main functional parts: an interest mining part, an information aggregation part, and a filter-and-display part. The interest mining part mines the user information and outputs multidimensional interest topics. The information aggregation part obtains video metadata from the subscribed URLs and organizes it into multidimensional descriptions of network videos, which it outputs.

[0042] As shown in FIG. 2, the information aggregation part mainly comprises the subscription module and the crawler module, together with the Internet that supplies the network video. Through the subscription module, the user specifies suitable URLs anywhere on the Internet. The crawler module recursively traverses each such page and the second-level pages it references and extracts the useful information, namely the video metadata. The crawler module's means of obtaining metadata also include fetching, directly from a specified address, network video metadata published according to a defined specification. After obtaining the metadata, the crawler module reorganizes it in the format prescribed by the system into normalized multidimensional metadata descriptions of network videos, each description representing one network video.
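The traversal described in this paragraph can be sketched as a depth-limited crawl. The sketch below is a simplified illustration, not the patent's implementation: the page model (a dict with `"videos"` and `"links"` keys), the `fetch` callable, and the `max_depth` parameter are assumptions, and real HTML fetching and parsing are deliberately left out.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical page model: {"videos": [metadata dicts], "links": [urls]}.
Page = Dict[str, list]


def crawl(start_urls: Iterable[str],
          fetch: Callable[[str], Optional[Page]],
          max_depth: int = 1) -> List[dict]:
    """Collect video metadata from subscribed pages and their second-level pages.

    Depth 0 is a subscribed page; depth 1 is a page it links to.
    """
    seen: Set[str] = set()
    records: List[dict] = []
    frontier = deque((url, 0) for url in start_urls)
    while frontier:
        url, depth = frontier.popleft()
        if url in seen:
            continue  # never fetch the same page twice
        seen.add(url)
        page = fetch(url)
        if page is None:
            continue  # unreachable or unparsable page
        records.extend(page.get("videos", []))
        if depth < max_depth:
            frontier.extend((link, depth + 1) for link in page.get("links", []))
    return records
```

Passing `fetch` in as a parameter keeps the traversal testable against an in-memory dictionary of pages; a real terminal would supply a function that downloads and parses each page.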

[0043] As shown in FIG. 2, the interest mining part mainly comprises the local database module, the data preprocessing module, and the interest mining module. The local database module stores a number of multidimensional video metadata descriptions. The videos described include locally stored video objects and video objects that reflect the user's interests, such as those in the playback records. The user is not necessarily interested in every video in the records. For example, a video that was watched only very briefly is regarded as a record that does not reflect the user's interest. The data preprocessing module removes from the database the data that does not reflect the user's interests, keeps the data that does, and supplies it to the interest mining module for extracting multidimensional interest topics.
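A minimal sketch of such a preprocessing filter follows. The patent gives "watched only very briefly" as its example but no concrete rule, so the record fields `duration_s` and `watched_s`, the 20% default ratio, and the decision to keep locally stored files that have no playback record are all assumptions.

```python
from typing import Iterable, List


def select_interest_records(records: Iterable[dict],
                            min_watch_ratio: float = 0.2) -> List[dict]:
    """Keep only records that plausibly reflect the user's interest."""
    kept = []
    for rec in records:
        duration = rec.get("duration_s") or 0
        if duration <= 0:
            continue  # invalid record: no usable duration
        watched = rec.get("watched_s")
        # A record without "watched_s" is treated as a locally stored file
        # and kept; a played video must have been watched long enough.
        if watched is not None and watched / duration < min_watch_ratio:
            continue
        kept.append(rec)
    return kept
```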

[0044] The interest mining module extracts two-dimensional interests and progressively builds interest topics of higher dimension by the following steps:

[0045] a. One-dimensional interest extraction: for each dimension to be mined, apply an independent interest mining strategy and criterion to obtain a number of interest topics in that dimension, which form a set. For example, for the director dimension it suffices to count how often each name appears; a name whose frequency exceeds the threshold becomes an interest topic. The choice of threshold is critical: if it is too low, the extracted results do not represent the user's interests; if it is too high, some interests may be missed. For the metadata field holding the actual address of the video, a more complex interest-path mining method is needed, and the test of whether a metadata value matches an interest topic changes accordingly to whether that value falls under the topic's interest path. Each dimension therefore needs both its own interest mining strategy and its own criterion for measuring interest. Moreover, for reasons of cost-effectiveness, not every dimension needs to be included in interest mining.

[0046] b. Two-dimensional interest extraction: across different dimensions, if two interest topics appear together in one multidimensional record, the two topics are regarded as associated; the more often they appear together, the stronger the association. Pairs whose degree of association exceeds the threshold are linked together into a two-dimensional interest topic, and all two-dimensional interest topics can be found in the same way. The degree of association can be measured either by the relative proportion of co-occurrences or by the absolute number of co-occurrences. As shown in Table 1, the horizontal axis represents dimension A and the vertical axis dimension B; each dimension has a number of interest topics, forming the sets (a1, a2, a3, a4, a5, a6, a7) and (b1, b2, b3, b4, b5). Suppose this example uses the absolute number of co-occurrences in the same video description as the degree of association; the degrees of association between the above interest topics are then the values shown in the matrix. Suppose the threshold for deciding that interest topics in two different dimensions are associated is 10; then the two-dimensional interest topics (a1, b1), (a2, b2), (a3, b3), (a3, b4), (a4, b2), (a5, b5), (a7, b1) can be extracted from the matrix.

[0047] Table 1. Association-degree matrix of interest topics in different dimensions

[0048]

Figure CN104144181BD00071

[0049] c. Multidimensional interest extraction: if a topic in some dimension appears in two multidimensional (including two-dimensional) topics, check whether every pair of one-dimensional topics across the two multidimensional topics has a degree of association above the threshold; if so, merge the two multidimensional topics into an interest topic of higher dimension. For example, suppose dimension C has the interest topic set (c1, c2, c3, c4) and there exist the two-dimensional interest topics (a1, c2), (a1, c3), (b1, c2). Then the three-dimensional interest topic (a1, b1, c2) can be derived. Because the pairwise degrees of association among a1, b1 and c2 all exceed the threshold, they can be taken to be three dimensions of one and the same interest of the user.

[0050] d. Record all multidimensional (including two-dimensional) interest topics that cannot be merged further. In this process, the multidimensional topics that cannot be merged further are recorded, and multidimensional topics that have already been merged are no longer recorded. In the three-dimensional A, B, C example above, once the three-dimensional topic (a1, b1, c2) has been formed by merging, (a1, b1), (a1, c2) and (b1, c2) need no longer be recorded.
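Steps a to d above can be sketched end to end in a few functions. This is an illustrative reading of the text, not the patented implementation: records are assumed to be dicts mapping a dimension name to a set of values, a one-dimensional topic is a `(dimension, value)` tuple, the simple frequency count is used for every dimension, and association is the absolute co-occurrence count. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_dim_topics(records, min_count):
    """Step a: values whose frequency exceeds the threshold become topics."""
    counts = Counter((dim, value)
                     for rec in records
                     for dim, values in rec.items()
                     for value in values)
    return {topic for topic, n in counts.items() if n > min_count}


def association_counts(records, topics):
    """Step b: count co-occurrences of topics from different dimensions."""
    assoc = Counter()
    for rec in records:
        present = [t for t in topics if t[1] in rec.get(t[0], ())]
        for s, t in combinations(present, 2):
            if s[0] != t[0]:
                assoc[frozenset((s, t))] += 1
    return assoc


def merge_topics(strong_pairs):
    """Steps c and d: merge topics sharing a member, keep the unmergeable ones.

    strong_pairs is the set of two-dimensional topics (frozensets of two
    one-dimensional topics whose association exceeds the threshold).
    """
    def fully_linked(group):
        return all(frozenset(pair) in strong_pairs
                   for pair in combinations(group, 2))

    topics = set(strong_pairs)
    grew = True
    while grew:
        grew = False
        for s, t in combinations(list(topics), 2):
            union = s | t
            if s & t and union not in topics and fully_linked(union):
                topics.add(union)
                grew = True
    # Step d: drop every topic that was absorbed into a larger one.
    return {t for t in topics if not any(t < other for other in topics)}
```

Applied to the example in the text, the pairs (a1, b1), (a1, c2), (a1, c3), (b1, c2) yield the three-dimensional topic (a1, b1, c2) plus the unmerged pair (a1, c3), since c2 and c3 lie in the same dimension and have no association between them.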

[0051] As shown in FIG. 2, the filter-and-display part mainly comprises the match-filter module and the UI display module. The match-filter module first takes all the output of the crawler module, namely the multidimensional descriptions of network videos extracted from the network, and then, from how well each description matches the multidimensional interests, determines whether and to what extent it matches the user's interests. The filter module selects the video information that matches the user's interests, ranks it, and passes it to the UI display module. The UI display module presents the final network video information to the user and provides the interface needed to respond to the user's click operations.
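The patent does not define how the degree of match is computed, so the sketch below picks one simple possibility purely for illustration: the match between a video and an interest topic is the fraction of the topic's one-dimensional components found in the video's description, and a video's score is its best match over all topics. The names `match_degree` and `filter_and_rank` and the 0.5 default threshold are assumptions.

```python
def match_degree(video, topic):
    """Fraction of the topic's (dimension, value) components present in the video."""
    hits = sum(1 for dim, value in topic if value in video.get(dim, ()))
    return hits / len(topic)


def filter_and_rank(videos, topics, threshold=0.5):
    """Keep videos whose best match exceeds the threshold, best match first."""
    scored = []
    for video in videos:
        score = max((match_degree(video, t) for t in topics), default=0.0)
        if score > threshold:
            scored.append((score, video))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [video for _, video in scored]
```

The returned list is what a UI display module would render; ties keep their crawl order because Python's sort is stable.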

[0052] In short, the method provided by the present invention makes full use of the advantages of the terminal and provides an information aggregation service suited to the set-top-box terminal scenario. On top of narrowing the aggregation scope by specifying aggregation sources, it uses the local playback records and media database to mine the user's interests so as to narrow the aggregation scope and data volume further, improving aggregation precision while effectively reducing the resources consumed by aggregation on the terminal. That is, the present invention provides a terminal-based aggregation solution: by making full use of the rich user information on the terminal, it obtains the user's interests implicitly and, according to those interests, aggregates from the video websites freely chosen by the user a list of videos matching those interests, and displays it to the user just like locally stored video. The method lets the user choose the video sources freely, so the entire network can serve as a potential video source, which effectively broadens the sources of information. At the same time, specifying aggregation sources and filtering by interest effectively reduces redundant information, so that the videos the user is interested in are finally presented in the terminal interface. The present invention requires no user registration or explicit statement of interests, supports multiple interest topics, and effectively blocks the redundant information pushed by the server in the C/S model.

[0053] Any modifications or variations made on the basis of the solution of the present invention still fall within the scope of protection of the present invention.

Claims (9)

  1. 1. 一种网络视频终端聚合方法,所述方法包含: 步骤101)通过订阅指定网络视频聚合的源; 步骤102)利用爬虫从订阅网站提取网络视频元数据; 步骤103)将本地播放记录和本地视频信息存储在本地数据库; 步骤104)对本地数据库数据进行预处理,以适应兴趣挖掘需要,其中,所述预处理是对数据库中存储的视频信息逐条过滤,剔除无效信息记录,选取符合条件的数据用于兴趣挖掘; 步骤105)根据本地数据库,挖掘用户多维兴趣主题,所述本地数据库以一定的数据结构存储若干条视频元数据描述,这些视频对象包括本地存储的视频文件,以及用户播放记录中的视频; 步骤106)根据网络视频与用户兴趣的匹配程度对网络视频进行过滤排序,所述匹配过滤依次将每一条网络视频描述信息与兴趣主题匹配,过滤并保留匹配程度高于阈值的结果,并排序; 步骤107 A polymerization process a video terminal, said method comprising: step 101) via the network subscription specified source video polymerization; step 102) using a network crawler video metadata extraction from the subscriber sites; step 103) and records the local playback local video information is stored in a local database; step 104) the local database data preprocessed to accommodate needs interest mining, wherein the pretreatment is filtered by one of the video information stored in the database, excluding invalid recording information, select qualifying for data mining interest; step 105) according to the local database, a multidimensional mining user interest topic, said local database to several pieces of a certain video metadata describes data structure stored, the video object comprises a locally stored video files, playing records and user video; step 106) according to the degree of matching network users interested in video and video filters sorting network, each of said matched filter sequentially a video network description information matches the topic of interest, filter and retain the result of the matching degree higher than a threshold value and sorting; step 107 )显示经过滤、排序而得的网络视频列表。 ) Filtered display, network video obtained by sorting the list.
  2. 2. 根据权利要求1所述的网络视频终端聚合方法,其特征在于,所述网络视频元数据包括:视频名、视频源地址、年份、导演、演员或类型,将所有元数据形成网络视频的多维描述信息。 The network video terminal of the polymerization process of claim 1, wherein said video meta data network comprising: a video name, video source address, year, director, actors, or type, forming a network of all video metadata multi-dimensional description.
  3. 3. 根据权利要求1所述的网络视频终端聚合方法,其特征在于,所述指定视频聚合源的网站是一个或多个视频网站的首页网址。 The network video terminal of the polymerization process of claim 1, wherein the polymerization in the specified video source to one or more video site is the site home page URL.
  4. 4. 根据权利要求1所述的网络视频终端聚合方法,其特征在于,所述爬虫模块以订阅模块指定的一个或多个网页为初始页面,提取视频元数据,为每一个视频生成一条元数据描述,并嵌套地对其包含的二级页面逐一遍历,以获取符合条件的视频元数;同时,获取元数据的方式还可选地包括直接收割网站按一定规范发布的视频信息。 4. The network video terminal of the polymerization process of claim 1, wherein said one or more web crawler module subscription module specified for initial page, extracts video metadata, generate a metadata for each video description, and stepping through its nested page contains two to get the number of eligible video metadata; while acquiring the metadata further embodiment optionally includes video information directly harvest site publishes certain specifications.
  5. 5. 根据权利要求1所述的网络视频终端聚合方法,其特征在于,所述多维兴趣主题即为在两个或两个以上维度进行描述的兴趣主题,其基础为,每一条视频信息都由多个维度的描述信息组成。 The video network terminal according to claim 1 polymerization process, wherein the multi-dimensional topic of interest is the subject of interest will be described in two or more dimensions, based on, for each video information by composed of a plurality of dimensions description.
  6. 6. 根据权利要求1所述的网络视频终端聚合方法,其特征在于,所述多维兴趣主题的提取分为以下步骤: a、 一维兴趣提取:对每一个拟挖掘的维度采取独立的兴趣挖掘策略和标准,得到该维度上的若干兴趣主题,成为一个集合; b、 二维兴趣提取:在不同维度间,若两个兴趣主题同时出现在一条多维信息中,则这两个兴趣主题有关联;同时出现越多,关联越大;把关联度大过阈值的组合在一起,成为一个二维兴趣主题,采用同样的方法找出所有的二维兴趣主题; c、 多维兴趣提取:若某维度上的主题出现在两个多维主题中,检查是否这两个多维主题中每个一维主题间都存在超过阈值的关联程度,若是,则合并这两个多维主题,成为更高维度的兴趣主题; d、 记录所有不能进一步合并的多维兴趣主题。 The network video terminal of the polymerization process of claim 1, wherein said extraction multidimensional topics of interest into the following steps: a, the one-dimensional interest extractor: take independent mining interest of each dimension by excavation policies and standards to obtain a number of topics of interest on that dimension, as a collection; b, the two-dimensional interest extract: between different dimensions, if the two topics of interest occur simultaneously in a multi-dimensional information, the two topics of interest associated ; simultaneous more, the greater the relevance; the relevance greater than the threshold are combined into a two-dimensional topic of interest, using the same method to find all of the two-dimensional topics of interest; c, multi-dimensional interests extract: If a dimension on the topic appeared in two multi-dimensional topic, check the degree of correlation exists between exceeding the threshold of whether these two themes in each one-dimensional multi-dimensional theme, if so, to merge the two multi-dimensional theme, became interested in the topic of higher dimensions ; all can not be further consolidation of d, recording a multi-dimensional topic of interest.
  7. A network video terminal aggregation system, characterized in that the system comprises: a subscription module for specifying the sources of network video aggregation; a crawler module for extracting network video metadata from the websites of the aggregation sources obtained from the subscription module; a local database module for storing local playback records and local video information; a preprocessing module for preprocessing the local database data to suit the needs of interest mining; an interest mining module for performing one-dimensional to multidimensional interest extraction on the local database according to the following principles: one-dimensional interest extraction: an independent interest-mining strategy and criterion is applied to each dimension to be mined, yielding a number of interest topics on that dimension, which form a set; two-dimensional interest extraction: across different dimensions, if two interest topics appear together in one piece of multidimensional information, the two topics are associated; the more often they co-occur, the stronger the association; topics whose degree of association exceeds a threshold are linked together into a two-dimensional interest topic, and all two-dimensional interest topics are found by this strategy; multidimensional interest extraction: if a topic on some dimension appears in two multidimensional topics, it is checked whether every pair of one-dimensional topics in the two multidimensional topics has a degree of association exceeding the threshold; if so, the two multidimensional topics are merged into an interest topic of higher dimension; all multidimensional interest topics that cannot be merged further are recorded, which completes the interest mining; a matching and filtering module for filtering and sorting network videos according to how well they match the user's interests; and a display module for displaying the resulting filtered and sorted list of network videos.
  8. The network video terminal aggregation system according to claim 7, characterized in that the subscription module allows the user to specify one or more video website URLs as sources for video information aggregation; the videos contained in the page at each specified URL and in the second-level pages it references are all included in the scope of subsequent aggregation; and the home page of a video website can be specified.
  9. The network video terminal aggregation system according to claim 7, characterized in that the crawler module extracts video metadata from the pages within the subscription scope, or directly harvests video information that a website publishes according to a given specification, and organizes the metadata belonging to the same video, according to a data structure, into one piece of information describing the network video; the module also crawls metadata from the second-level pages of each page in a nested manner.
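
The interest-extraction steps recited in claims 6 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation. The claims leave the per-dimension mining strategy and the association measure open, so the sketch assumes a plain frequency threshold for the one-dimensional step and a raw co-occurrence count as the degree of association. The function names, the parameters `min_count` and `min_assoc`, and the record layout (one dict per playback record, mapping a dimension name to a single value) are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_dim_topics(records, min_count):
    """Step a (assumed strategy): keep (dimension, value) pairs seen often enough."""
    counts = Counter(item for rec in records for item in rec.items())
    return {topic for topic, n in counts.items() if n >= min_count}


def co_occurrence(records, topics):
    """Count how often two retained topics appear in the same record."""
    co = Counter()
    for rec in records:
        present = [item for item in rec.items() if item in topics]
        # Keys within one record are distinct, so each pair spans two dimensions.
        for a, b in combinations(present, 2):
            co[frozenset((a, b))] += 1
    return co


def mine_topics(records, min_count=2, min_assoc=2):
    """Steps a to d: return the interest topics that cannot be merged further."""
    singles = one_dim_topics(records, min_count)
    co = co_occurrence(records, singles)

    def linked(a, b):
        return co[frozenset((a, b))] >= min_assoc

    # Step b: every pair whose co-occurrence count reaches the threshold.
    current = {pair for pair, n in co.items() if n >= min_assoc}
    final = set()
    while current:
        merged, nxt = set(), set()
        for x, y in combinations(current, 2):
            # Step c: the topics share a one-dimensional topic and every
            # cross pair of one-dimensional topics is associated.
            if x & y and all(linked(a, b) for a in x for b in y if a != b):
                nxt.add(x | y)
                merged.update((x, y))
        # Step d: topics that merged with nothing in this round are final.
        final |= current - merged
        current = nxt
    return final


if __name__ == "__main__":
    history = [
        {"genre": "sci-fi", "region": "US", "actor": "A"},
        {"genre": "sci-fi", "region": "US", "actor": "A"},
        {"genre": "comedy", "region": "CN", "actor": "C"},
        {"genre": "comedy", "region": "CN", "actor": "D"},
    ]
    for topic in mine_topics(history):
        print(sorted(topic))
```

Because each record holds one value per dimension, two different values of the same dimension never co-occur, so a merged topic never contains conflicting values on one dimension. A normalized association measure could replace the raw count without changing the merge loop.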
CN 201310166163 2013-05-08 2013-05-08 A network video terminal aggregation method and system CN104144181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201310166163 CN104144181B (en) 2013-05-08 2013-05-08 A network video terminal aggregation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201310166163 CN104144181B (en) 2013-05-08 2013-05-08 A network video terminal aggregation method and system

Publications (2)

Publication Number Publication Date
CN104144181A (en) 2014-11-12
CN104144181B (en) 2017-12-29

Family

ID=51853249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201310166163 CN104144181B (en) 2013-05-08 2013-05-08 A network video terminal aggregation method and system

Country Status (1)

Country Link
CN (1) CN104144181B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834728B (en) * 2015-05-14 2018-03-09 无锡天脉聚源传媒科技有限公司 Method and device for pushing video subscriptions

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101174273B (en) * 2007-12-04 2010-06-23 清华大学 News event detecting method based on metadata analysis
US8145679B1 (en) * 2007-11-01 2012-03-27 Google Inc. Video-related recommendations using link structure
CN102800006A (en) * 2012-07-23 2012-11-28 姚明东 Real-time goods recommendation method based on customer shopping intention exploration
CN103051930A (en) * 2012-12-21 2013-04-17 福建邮科通信技术有限公司 Method and system for recommending mobile video based on flow analysis and user behavior analysis

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8145679B1 (en) * 2007-11-01 2012-03-27 Google Inc. Video-related recommendations using link structure
CN101174273B (en) * 2007-12-04 2010-06-23 清华大学 News event detecting method based on metadata analysis
CN102800006A (en) * 2012-07-23 2012-11-28 姚明东 Real-time goods recommendation method based on customer shopping intention exploration
CN103051930A (en) * 2012-12-21 2013-04-17 福建邮科通信技术有限公司 Method and system for recommending mobile video based on flow analysis and user behavior analysis

Also Published As

Publication number Publication date Type
CN104144181A (en) 2014-11-12 application

Similar Documents

Publication Publication Date Title
US20060265409A1 (en) Acquisition, management and synchronization of podcasts
US20080104521A1 (en) Methods and systems for providing a customizable guide for navigating a corpus of content
US8296797B2 (en) Intelligent video summaries in information access
US20090158146A1 (en) Resizing tag representations or tag group representations to control relative importance
US20120284290A1 (en) System and Method for Syndicating Dynamic Content for Online Publication
US20090089312A1 (en) System and method for inclusion of interactive elements on a search results page
Liu et al. Effective browsing of web image search results
US20100082653A1 (en) Event media search
US20080021710A1 (en) Method and apparatus for providing search capability and targeted advertising for audio, image, and video content over the internet
US20090132520A1 (en) Combination of collaborative filtering and cliprank for personalized media content recommendation
US20100057694A1 (en) Semantic metadata creation for videos
US20120054275A1 (en) Method of recommending content via social signals
US20080028294A1 (en) Method and system for managing and maintaining multimedia content
US8335763B2 (en) Concurrently presented data subfeeds
US20080060013A1 (en) Video channel creation systems and methods
US20110161174A1 (en) Method and apparatus for managing multimedia files
US8959037B2 (en) Signature based system and methods for generation of personalized multimedia channels
US20090043814A1 (en) Systems and methods for comments aggregation and carryover in word pages
US20130041893A1 (en) System for creating and method for providing a news feed website and application
CN101446959A (en) Internet-based news recommendation method and system thereof
Ramakrishnan et al. Toward a peopleweb
CN104008184A (en) Method and device for pushing information
US20160140146A1 (en) Systems and Methods of Building and Using an Image Catalog
CN102769781A (en) Method and device for recommending television program
CN102523511A (en) Network program aggregation and recommendation system and network program aggregation and recommendation method

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
GR01