WO2019127655A1 - Method and system for identifying harmful videos based on user ID and video copy - Google Patents

Method and system for identifying harmful videos based on user ID and video copy Download PDF

Info

Publication number
WO2019127655A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
weighting factor
url
database
harmful
Prior art date
Application number
PCT/CN2018/072239
Other languages
English (en)
French (fr)
Inventor
蔡昭权
胡松
胡辉
蔡映雪
陈伽
黄翰
梁椅辉
罗伟
黄思博
Original Assignee
惠州学院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 惠州学院 (Huizhou University)
Publication of WO2019127655A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70: Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/955: Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Definitions

  • The present disclosure pertains to the field of information security and relates, for example, to a method and system for identifying harmful videos.
  • Current techniques fall into two major categories. The first is the traditional approach, which itself includes two types: (1) recognition based on single-modal features, which mainly extracts the visual features of a video and builds a classifier on them; in violent-video recognition, for example, common features are motion vectors, color, texture, and shape; (2) recognition based on multi-modal feature fusion, which extracts features from several modalities of the video and fuses them to build a classifier; in violent-video recognition, for example, many methods also extract audio features such as short-time energy and sudden sounds, and some methods additionally consider the text surrounding an online video and extract further features from it for fused recognition. The second category is deep learning.
  • CNN: a convolutional neural network is used to recognize sensitive and harmful images in a database, learn the internal features of harmful videos, and judge whether newly obtained video frames contain harmful information.
  • RNN: video sequences from the database are fed directly into a recurrent neural network to recognize harmful video information and learn a model of harmful videos, and the learned model is used to judge whether a new video is harmful.
  • CNN+RNN: a CNN learns the spatial-domain information of the image frames in the video, an RNN captures the temporal-domain information of the video sequence, and the two are combined for recognition; the learned framework is then used to identify videos.
  • the existing image processing methods mainly include the following two methods: the traditional method and the deep learning method.
  • The classic traditional method is the bag-of-words model, which consists of four parts: (1) low-level feature extraction; (2) feature encoding; (3) feature pooling; (4) classification with a suitable classifier.
  • Deep learning models form another family of image processing models, mainly including autoencoders, restricted Boltzmann machines, deep belief networks, convolutional neural networks, and recurrent neural networks. With continuing advances in computer hardware and the growth of databases, the traditional methods are computationally simpler than deep learning, while deep learning methods can learn more meaningful representations and continuously adjust their parameters to the task; for image processing, therefore, deep learning models have stronger feature-expression capability.
  • the present disclosure provides a method of identifying harmful videos, including:
  • Step a) when it is determined that a page element of a webpage includes the URL path of a video, identifying the user ID recorded in the page content of the webpage, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;
  • Step b) obtaining, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to, performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address the URL points to, for the presence of that IP address or an IP address on the same network segment, and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
  • Step c) obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online play settings, performing content-based video copy detection on the lowest-picture-quality video file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
  • Step d) combining the first, second, and third weighting factors to identify whether the video is a harmful video.
  • the present disclosure also discloses a system for identifying harmful videos, including:
  • a first weighting factor generating module configured to: when it is determined that a page element of a webpage includes the URL path of a video, identify the user ID recorded in the page content of the webpage, query a first database for the presence of the ID, and output a first weighting factor according to the result of the ID query;
  • a second weighting factor generating module configured to: obtain, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to; perform a whois query in the second database based on the domain name contained in the URL, and/or query the second database, based on the IP address the URL points to, for the presence of that IP address or an IP address on the same network segment; and output, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
  • a third weighting factor generating module configured to: obtain the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online play settings, perform content-based video copy detection on the lowest-picture-quality video file against a preset harmful-video database, and output a third weighting factor according to the detection result;
  • an identification module configured to integrate the first weighting factor and the second weighting factor and the third weighting factor to identify whether the video belongs to a harmful video.
  • the present disclosure can provide a more efficient scheme for identifying harmful videos by combining the database created by big data with as few image processing methods as possible.
  • FIG. 1 is a schematic diagram of a method according to an embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram of a system in accordance with an embodiment of the present disclosure.
  • References to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure.
  • The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will appreciate that the embodiments described herein can be combined with other embodiments.
  • FIG. 1 is a schematic flowchart of a method for identifying a harmful video according to an embodiment of the present disclosure. As shown, the method includes:
  • Step S100: when it is determined that a page element of a webpage includes the URL path of a video, identify the user ID recorded in the page content of the webpage, query a first database for the presence of the ID, and output a first weighting factor according to the result of the ID query;
  • the first database maintains a list of known user IDs that have posted harmful videos.
  • For example, when the identified user ID is "tudou": if the first database records a user ID named "tudou", the first weighting factor may be, for example, 1.0; if the database records IDs such as "tudou1", "tudou2", "tudou*", or similar IDs, then "tudou" is mildly suspected to be a backup ID of the same user, and the first weighting factor may be, for example, 0.3; if the database records no identical or similar ID, the first weighting factor may be, for example, 0.
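A minimal sketch of this ID-query step. The example values 1.0 / 0.3 / 0 come from the disclosure; the database contents and the trailing-suffix heuristic for "similar IDs" are assumptions for illustration only.

```python
import re

# Hypothetical first database: user IDs known to have posted harmful videos.
KNOWN_HARMFUL_IDS = {"tudou1", "tudou2", "spam_user"}

def first_weighting_factor(user_id: str) -> float:
    """Return the first weighting factor x in [0, 1] from the ID query:
    1.0 for an exact match, 0.3 for a similar ID, 0 otherwise."""
    if user_id in KNOWN_HARMFUL_IDS:
        return 1.0
    # Mild suspicion: a known ID differs only by a trailing digit or "*",
    # e.g. "tudou1" vs. "tudou" (treated as a possible backup ID).
    base = re.sub(r"[\d*]+$", "", user_id)
    for known in KNOWN_HARMFUL_IDS:
        if base and re.sub(r"[\d*]+$", "", known) == base:
            return 0.3
    return 0.0
```

In a real deployment the lookup would run against a big-data-backed user-ID database rather than an in-memory set.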
  • Step S200: obtain, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to; perform a whois query in a second database based on the domain name contained in the URL, and/or query the second database, based on the IP address the URL points to, for the presence of that IP address or an IP address on the same network segment; and output, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
  • the second database maintains a list of known domain names that have posted harmful videos, and/or a list of known IP addresses and IP address segments of websites that have posted harmful videos.
  • The whois query serves to examine the association between domain-name registrants and harmful videos. The second database may maintain the following information: domain names, information on registrants who have published large numbers of pornographic, violent, reactionary, or cult videos on the Internet, and identifiers of the corresponding harmful videos.
  • For example, when the domain name is www.a.com: if the second database records this domain name, identifiers of the corresponding harmful videos, and its whois information, the second weighting factor may be, for example, 1.0;
  • if the second database records no harmful-video identifier for www.a.com, but the registrant of the domain name can be queried, together with the domain names of other websites registered by that registrant, and the second database includes identifiers showing that those other websites have published large numbers of harmful videos on the Internet, then even though the second database records no harmful-video identifier for www.a.com itself, the website at www.a.com is still highly suspected to be a source of harmful videos, and the second weighting factor may be, for example, 0.9;
  • if the second database records no harmful-video identifier for www.a.com, the registrant of the domain name and the registrant's other domains can be queried, but the second database includes no identifier of those other websites publishing harmful videos, the second weighting factor may be, for example, 0;
  • it is easy to understand that if the second database records no harmful-video identifier for www.a.com and no other domains registered by its registrant can be found either, the second weighting factor may likewise be, for example, 0.
  • Exemplarily, the IP address the URL points to may also be obtained from the URL path of the video, and an IP address / IP address segment query performed to output the second weighting factor.
  • For example, when the IP address is 192.168.10.3: if the second database records this IP address, the second weighting factor may be, for example, 1.0;
  • if the only IP address recorded in the second database is 192.168.10.4, then 192.168.10.3 is moderately suspected to be a backup or recently changed address of the website the video belongs to, and the second weighting factor may be, for example, 0.6;
  • if the second database records 192.168.10.4 and 192.168.10.5, or even all IP addresses of the 192.168.10.X segment, then 192.168.10.3 is highly suspected to be a backup or recently changed address of that website, and the second weighting factor may be, for example, 0.9;
  • if the IP addresses recorded in the database cover several 192.168.X.X segments but not the 192.168.10.X segment, then 192.168.10.3 is cautiously suspected to be the address of a harmful-video website, and the second weighting factor may be, for example, 0.4.
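The IP-side query above can be sketched as follows. The factor values 1.0 / 0.9 / 0.6 / 0.4 / 0 come from the disclosure's examples; treating "two or more recorded neighbours, or a whole segment" as the 0.9 case, and reading the segments as /24 and /16 networks, are assumptions for this sketch.

```python
import ipaddress

def second_factor_from_ip(ip: str, recorded: set) -> float:
    """Return the IP-query contribution to the second weighting factor,
    given the set of IP addresses recorded in the second database."""
    if ip in recorded:
        return 1.0  # exact hit in the database
    addr = ipaddress.ip_address(ip)
    # Count recorded addresses on the same /24 ("192.168.10.X") segment.
    same_24 = sum(addr in ipaddress.ip_network(f"{r}/24", strict=False)
                  for r in recorded)
    if same_24 >= 2:
        return 0.9  # several neighbours recorded: highly suspect
    if same_24 == 1:
        return 0.6  # one neighbour recorded: moderately suspect
    # Same /16 ("192.168.X.X") but a different /24 segment.
    if any(addr in ipaddress.ip_network(f"{r}/16", strict=False)
           for r in recorded):
        return 0.4  # cautiously suspect
    return 0.0
```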
  • In particular, the above steps may also weigh the IP list and the domain-name list together; that is, the second weighting factor is jointly determined by the IP query of the video URL and the domain-name whois query.
  • Suppose the IP query factor of the video URL is i, the whois query factor is j, and the second weighting factor is y, where 0≤i≤1, 0≤j≤1, 0≤y≤1; the second weighting factor can be determined according to the following formula: y = m×i + n×j, where m + n = 1, and m and n are the weights of the IP query factor and the whois query factor, respectively. For example, m = n = 1/2; as a further example, m and n are unequal and may be adjusted according to the weight of each query factor and the practical circumstances of determining the second weighting factor. It can be understood that the closer y is to 1, the heavier the second weighting factor and the greater the probability that the video is harmful.
  • The above formula for y is linear; in practical applications, a nonlinear formula may also be adopted. Further, whether the formula is linear or nonlinear, the formula and its parameters may be determined by training or fitting.
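The linear combination above, y = m×i + n×j with m + n = 1, can be written directly; the default m = n = 1/2 is the example split given in the disclosure.

```python
def second_weighting_factor(i: float, j: float,
                            m: float = 0.5, n: float = 0.5) -> float:
    """Combine the IP-query factor i and the whois-query factor j into the
    second weighting factor y = m*i + n*j, with the constraint m + n = 1."""
    assert 0.0 <= i <= 1.0 and 0.0 <= j <= 1.0
    assert abs((m + n) - 1.0) < 1e-9  # weights must sum to 1
    return m * i + n * j
```

A nonlinear variant (or trained weights, as the text suggests) could replace this function without changing its interface.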
  • Step S300: obtain the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online play settings, perform content-based video copy detection on the lowest-picture-quality video file against a preset harmful-video database, and output a third weighting factor according to the detection result;
  • This step S300 performs content-based video copy detection and outputs the third weighting factor according to the detection result.
  • The preset harmful-video database covers, for example, pornographic imagery, violent imagery, reactionary figures, cult symbols, and other unhealthy content; it can be built with big data technology and continuously updated. If the detection result judges the lowest-picture-quality video file to be a suspected copy of some video in the preset harmful-video database, this is reflected in the third weighting factor. It can be understood that, when the corresponding threshold condition is met, the third weighting factor may be 1.0, 0.8, or 0.4, depending on the specific threshold condition.
  • In addition, it should be emphasized that, to reduce the computing resources and time this embodiment requires, the video file at the lowest picture quality is obtained based on the URL path of the video and the lowest picture quality in the video's online play settings.
  • the inventors made full use of the video content corresponding to the lowest picture quality in today's video playback settings for efficient video copy detection.
  • This does not mean, however, that the lowest or a low picture quality must be obtained through the play settings, because the video content corresponding to low picture quality can also be obtained by various kinds of sampling, with video copy detection then carried out on it.
  • step S300 can perform video processing in combination with a traditional method, or can perform video processing in combination with a deep learning model to identify harmful videos.
  • Step S400: combine the first, second, and third weighting factors to identify whether the video is a harmful video.
  • Exemplarily, let the first weighting factor be x, the second weighting factor be y, and the third weighting factor be z, where 0≤x≤1, 0≤y≤1, 0≤z≤1; the above weighting factors can be combined to compute the harmful coefficient W of the video according to the following formula: W = a×x + b×y + c×z, where a + b + c = 1 and a, b, c are the weights of the respective factors.
  • As a further example, a, b, and c are unequal and may be adjusted according to the weight of each factor and the practical circumstances of identifying harmful content.
  • The above formula for W is linear, but in practice a nonlinear formula may also be used.
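The fusion step can be sketched as below. The linear form W = a×x + b×y + c×z mirrors the y-formula above; the equal split a = b = c = 1/3 and the 0.5 decision threshold are assumptions for illustration, since the disclosure leaves the weights and the threshold to be tuned.

```python
def harmful_coefficient(x: float, y: float, z: float,
                        a: float = 1/3, b: float = 1/3, c: float = 1/3) -> float:
    """Fuse the three weighting factors linearly: W = a*x + b*y + c*z."""
    assert abs((a + b + c) - 1.0) < 1e-9  # weights must sum to 1
    return a * x + b * y + c * z

def is_harmful(x: float, y: float, z: float, threshold: float = 0.5) -> bool:
    """Decide whether the video is flagged as harmful (threshold assumed)."""
    return harmful_coefficient(x, y, z) >= threshold
```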
  • Only step S300 performs image processing; the remaining steps instead rely on the related queries to obtain the corresponding weighting factors.
  • Step S400 combines (also referred to as fusion) multiple weighting factors to identify harmful videos.
  • processing and identifying each frame of video is very time consuming, while queries are relatively more time-saving.
  • the above embodiment proposes an efficient method of identifying harmful video.
  • the above-described embodiments are apparently capable of further integrating and updating the first database, the second database, and other databases in conjunction with big data and/or artificial intelligence.
  • the second database is a third party database.
  • In addition, the IP address information of harmful-video publishers recorded at the relevant web addresses is collected to update the first database. This is because harmful videos generally attract sticky users; some of these users take part in spreading harmful videos, and most of their IP addresses are relatively fixed. If the relevant URL itself records the IP address information of the publisher of a harmful video, the present disclosure updates the aforementioned first database by collecting that IP address information.
  • step S200 further includes:
  • the security of the domain name is queried in a third-party domain name security list to output a security factor, and the second weighting factor related to the domain name is corrected by the security factor.
  • For example, virustotal.com is a third-party domain-name security screening website. It can be understood that if third-party information indicates that the domain name carries a virus or a trojan, the second weighting factor should be raised, because the related website is then all the more unsafe.
  • The described embodiment focuses on correcting the second weighting factor from a network-security perspective, to protect users from additional losses. This is because network security concerns users' privacy and property rights: if a website related to harmful videos also carries network-security risks, it will bring privacy leakage or property damage to users on top of the harmful video itself.
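A possible form of this correction is sketched below. The disclosure only says the second weighting factor "should be raised" when a third-party check (a virustotal.com-style lookup) flags the domain; the additive blend and the weight w = 0.2 are assumptions for illustration.

```python
def corrected_second_factor(y: float, security_risk: float,
                            w: float = 0.2) -> float:
    """Raise the second weighting factor y by a security factor.
    security_risk in [0, 1] is the (hypothetical) third-party score
    indicating the domain carries viruses or trojans; the result is
    clamped so it stays a valid weighting factor in [0, 1]."""
    assert 0.0 <= y <= 1.0 and 0.0 <= security_risk <= 1.0
    return min(1.0, y + w * security_risk)
```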
  • the obtaining the video file at the lowest picture quality in step S300 comprises acquiring the video content at the end of the video file.
  • In this embodiment, this means that, when the video content is acquired, the content at the end of the video file is preferentially selected in order to minimize the size of the acquired content. This is because, for harmful videos, whether pornographic, violent, or reactionary, the ending is often the climax of the plot, and the spreaders of these harmful videos, whether for profit or out of political or cult motivation, will generally not delete the climax at the end of the film. For the present embodiment, this greatly reduces the workload of video copy detection. It should be added that this is a preferred embodiment; it does not mean that the video content cannot be taken from the first 1/3 of the playing time, or from the middle 1/3 of the playing time.
  • Exemplarily, the video content at the end of the video may be the content within the last 1/3 of the playback time. More preferably, it may be the content within the last few minutes of the video, for example 3, 5, or 10 minutes; whatever the number of minutes, if the last 1/3 of the playback time is shorter, the content within that last 1/3 is naturally preferred.
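The tail-selection rule above reduces to taking the smaller of the last third and a fixed cap. The 5-minute cap used here is one of the example values (3, 5, or 10 minutes) given in the disclosure.

```python
def tail_segment(duration_s: float, cap_s: float = 300.0):
    """Return the (start, end) interval, in seconds, of the video tail to
    sample for copy detection: the last 1/3 of the playback time, capped
    at cap_s seconds; whichever is shorter wins."""
    tail = min(duration_s / 3.0, cap_s)
    return (duration_s - tail, duration_s)
```

For a 10-minute video the last third (200 s) is below the cap, so the whole last third is used; for a 30-minute video only the final 5 minutes are sampled.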
  • In an embodiment of step S300, obtaining the video file at the lowest picture quality further includes: Step c1) extracting the audio from the video; Step c2) identifying whether the audio contains harmful content and, if so, locating its start and end times and acquiring the video content within that interval. This makes it possible to find the relevant harmful frames in a more targeted way.
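Steps c1)/c2) can be sketched as a filter over timed audio segments. The segment representation and the `classify` callback (an audio classifier returning True for harmful content) are hypothetical placeholders; the disclosure does not fix a concrete audio model.

```python
def audio_guided_clip(segments, classify):
    """Given (start, end, audio) segments extracted from the video,
    keep only the time intervals whose audio is classified as harmful,
    so video copy detection runs on those intervals alone."""
    return [(start, end) for start, end, audio in segments if classify(audio)]
```

Restricting copy detection to the returned intervals is what lets this embodiment locate harmful frames with less work.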
  • The present disclosure can thus effectively combine multiple dimensions and multiple modes, drawing on IP information, domain-name information, video information, and audio information to identify harmful videos quickly.
  • the above embodiment may be implemented on the router side or the network provider side to filter related videos in advance.
  • Referring to FIG. 2, an embodiment of the present disclosure further provides a system for identifying harmful videos, including:
  • a first weighting factor generating module configured to: when it is determined that a page element of a webpage includes the URL path of a video, identify the user ID recorded in the page content of the webpage, query a first database for the presence of the ID, and output a first weighting factor according to the result of the ID query;
  • a second weighting factor generating module configured to: obtain, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to; perform a whois query in the second database based on the domain name contained in the URL, and/or query the second database, based on the IP address the URL points to, for the presence of that IP address or an IP address on the same network segment; and output, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
  • a third weighting factor generating module configured to: obtain the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online play settings, perform content-based video copy detection on the lowest-picture-quality video file against a preset harmful-video database, and output a third weighting factor according to the detection result;
  • an identifying module configured to integrate the first weighting factor and the second weighting factor and the third weighting factor to identify whether the video belongs to a harmful video.
  • the second database is a third party database.
  • the second weighting factor generating module further includes:
  • a correction unit configured to: further query, in a third-party domain name security list, the security of the domain name to output a security factor, and modify the second weighting factor related to the domain name by using the security factor.
  • In an embodiment, acquiring the video file at the lowest picture quality in the third weighting factor generating module includes acquiring the video content at the end of the video file.
  • In an embodiment, the third weighting factor generating module further acquires the video file at the lowest picture quality by means of the following units:
  • an audio extraction unit configured to extract the audio from the video;
  • an audio recognition unit configured to identify whether the audio contains harmful content and, if so, acquire the video content within the start and end times of that audio.
  • In another embodiment, the present disclosure also discloses a system for identifying harmful videos, including:
  • a processor and a memory in which executable instructions are stored, the processor executing the instructions to perform the following operations:
  • Step a) when it is determined that a page element of a webpage includes the URL path of a video, identifying the user ID recorded in the page content of the webpage, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;
  • Step b) obtaining, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to, performing a whois query in a second database based on the domain name, and/or querying the second database for the presence of that IP address or an IP address on the same network segment, and outputting a second weighting factor related to the URL path of the video according to the query results;
  • Step c) obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online play settings, performing content-based video copy detection on that file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
  • Step d) combining the first, second, and third weighting factors to identify whether the video is a harmful video.
  • In another embodiment, the present disclosure also discloses a computer storage medium storing executable instructions for performing the following method of identifying harmful videos:
  • Step a) when it is determined that a page element of a webpage includes the URL path of a video, identifying the user ID recorded in the page content of the webpage, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;
  • Step b) obtaining, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to, performing a whois query in a second database based on the domain name, and/or querying the second database for the presence of that IP address or an IP address on the same network segment, and outputting a second weighting factor related to the URL path of the video according to the query results;
  • Step c) obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online play settings, performing content-based video copy detection on that file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
  • Step d) combining the first, second, and third weighting factors to identify whether the video is a harmful video.
  • The above system may include: at least one processor (e.g. a CPU), at least one sensor (e.g. an accelerometer, gyroscope, GPS module, or other positioning module), at least one memory, and at least one communication bus, where the communication bus implements connection and communication between these components.
  • The device may further include at least one receiver and at least one transmitter, where the receiver and transmitter may be wired transmission ports, or may be wireless devices (for example, including antenna apparatus) for transmitting signaling or data to other node devices.
  • the memory may be a high speed RAM memory or a non-volatile memory such as at least one disk memory.
  • the memory may optionally be at least one storage device located remotely from the aforementioned processor.
  • a set of program code is stored in the memory, and the processor can call the code stored in the memory to perform related functions via the communication bus.
  • Embodiments of the present disclosure also provide a computer storage medium, wherein the computer storage medium can store a program that, when executed, includes some or all of the steps of any one of the methods of identifying a harmful video described in the above method embodiments.
  • Modules and units in the systems of the embodiments of the present disclosure may be combined, divided, or deleted according to actual needs. It should be noted that, for brevity, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present disclosure is not limited by the described sequence of actions, because certain steps may be performed in other orders or concurrently. Further, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions, modules, and units involved are not necessarily required by the present disclosure.
  • the disclosed system can be implemented in other ways.
  • the embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • In practice there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection between the units or components may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
  • the units described as separate components may or may not be physically separate, may be located in one place, or may be distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • The technical solution of the present disclosure, in essence or in the part that contributes to the related art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium.
  • The software product includes a number of instructions causing a computer device (which may be a smartphone, a personal digital assistant, a wearable device, a laptop, or a tablet) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The foregoing storage medium includes media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Abstract

A method and system for identifying harmful videos. The method includes: when it is determined that a page element of a webpage includes the URL path of a video, identifying the user ID recorded in the page content of the webpage; obtaining, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to, and outputting a first weighting factor and a second weighting factor based on the related queries of the user ID, IP address, and domain name; obtaining the video file at the lowest picture quality, performing video copy detection on that lowest-quality video file against a preset harmful-video database, and outputting a third weighting factor according to the detection result; and combining the first, second, and third weighting factors to identify whether the video is a harmful video. The present disclosure can combine databases built with big data and use as few image processing operations as possible, employing multiple modes, to provide a scheme for identifying harmful videos.

Description

Method and system for identifying harmful videos based on user ID and video copy

TECHNICAL FIELD

The present disclosure belongs to the field of information security and relates, for example, to a method and system for identifying harmful videos.

BACKGROUND

The information society is awash with information flows, including but not limited to text, video, audio, and pictures. Among these, video files usually carry both auditory and visual information, giving them the most comprehensive expressive power. However, with the spread of the mobile Internet, the network is flooded with harmful video content, for example videos involving illegal content such as drugs, pornography, and violence, or harmful videos that lure viewers into cults, suicide groups, or criminal groups. Owing to their visual immediacy and impact, such videos are even more damaging than harmful text, pictures, or audio, so identifying them, and then filtering, deleting, and neutralizing them, is essential.

For identifying harmful videos online, current techniques fall into two major categories. The first is the traditional approach, which itself includes two types: (1) recognition based on single-modal features, which mainly extracts the visual features of a video and builds a classifier on them; in violent-video recognition, for example, common features are motion vectors, color, texture, and shape; (2) recognition based on multi-modal feature fusion, which extracts features from several modalities of the video and fuses them to build a classifier; in violent-video recognition, for example, many methods also extract audio features such as short-time energy and sudden sounds, and some methods additionally consider the text surrounding an online video and extract further features from it for fused recognition. The second category is deep learning: (1) CNN: a convolutional neural network recognizes sensitive and harmful images in a database, learns the internal features of harmful videos, and judges whether newly obtained video frames contain harmful information; (2) RNN: video sequences from the database are fed directly into a recurrent neural network to recognize harmful video information and learn a model of harmful videos, and the learned model is used to judge whether a new video is harmful; (3) CNN+RNN: a CNN learns the spatial-domain information of the image frames in the video, an RNN captures the temporal-domain information of the video sequence, and the two are combined for recognition, the learned framework then being used to identify videos.

Existing image processing approaches mainly comprise two methods: the traditional method and the deep learning method. The classic traditional method is the bag-of-words model, which consists of four parts: (1) low-level feature extraction, (2) feature encoding, (3) feature pooling, and (4) classification with a suitable classifier. Deep learning models form another family of image processing models, mainly including autoencoders, restricted Boltzmann machines, deep belief networks, convolutional neural networks, and recurrent neural networks. With continuing advances in computer hardware and the growth of databases, the traditional methods are computationally simpler than deep learning, while deep learning methods can learn more meaningful representations and continuously adjust their parameters to the task; for image processing, therefore, deep learning models have stronger feature-expression capability.

Existing recognition methods all fall short in recognition efficiency; as big data and artificial intelligence develop, how to identify harmful videos efficiently becomes a problem that needs to be considered.

SUMMARY
The present disclosure provides a method of identifying harmful videos, including:

Step a) when it is determined that a page element of a webpage includes the URL path of a video, identifying the user ID recorded in the page content of the webpage, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;

Step b) obtaining, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to; performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address the URL points to, for the presence of that IP address or an IP address on the same network segment; and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;

Step c) obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online play settings; performing, with content-based video copy detection technology, video copy detection on the lowest-picture-quality video file against a preset harmful-video database; and outputting a third weighting factor according to the detection result;

Step d) combining the first, second, and third weighting factors to identify whether the video is a harmful video.

The present disclosure also discloses a system for identifying harmful videos, including:

a first weighting factor generating module configured to: when it is determined that a page element of a webpage includes the URL path of a video, identify the user ID recorded in the page content of the webpage, query a first database for the presence of the ID, and output a first weighting factor according to the result of the ID query;

a second weighting factor generating module configured to: obtain, from the URL path of the video, the domain name contained in the URL or the IP address the URL points to; perform a whois query in a second database based on the domain name contained in the URL, and/or query the second database, based on the IP address the URL points to, for the presence of that IP address or an IP address on the same network segment; and output, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;

a third weighting factor generating module configured to: obtain the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online play settings; perform, with content-based video copy detection technology, video copy detection on the lowest-picture-quality video file against a preset harmful-video database; and output a third weighting factor according to the detection result;

an identification module configured to combine the first, second, and third weighting factors to identify whether the video is a harmful video.

Through the method and system, the present disclosure can combine databases built with big data and use as few image processing operations as possible to provide a relatively efficient scheme for identifying harmful videos.
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram of a method according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a system according to an embodiment of the present disclosure.

DETAILED DESCRIPTION

To enable those skilled in the art to understand the technical solutions disclosed herein, the technical solutions of the embodiments are described below with reference to the embodiments and the related drawings; the described embodiments are only some, not all, of the embodiments of the present disclosure. The terms "first", "second", and the like used in the present disclosure distinguish different objects rather than describe a particular order. Moreover, "include" and "have", and any variations thereof, are intended to cover non-exclusive inclusion: a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally includes other steps or units inherent to the process, method, system, product, or device.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will understand that the embodiments described herein can be combined with other embodiments.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a method for identifying harmful videos provided by an embodiment of the present disclosure. As shown, the method includes:
Step S100: when it is determined that a page element of a webpage includes the URL path of a video, identify the user ID recorded in the page content of the webpage, query a first database for the presence of the ID, and output a first weighting factor according to the result of the ID query.

It can be understood that the first database maintains a list of user IDs known to have posted harmful videos.

This is because harmful content generally attracts a set of sticky users; some of these users take part in spreading it, most of their IDs are relatively fixed, and a fair share of users even keep the same ID across different websites and forums.

For example, when the identified user ID is "tudou":

if the first database records a user ID named "tudou", the first weighting factor may be, for example, 1.0;

if the database records IDs such as "tudou1", "tudou2", "tudou*", or similar IDs, then "tudou" is mildly suspected to be a backup ID of the same user, and the first weighting factor may be, for example, 0.3;

if the database records no ID equal or similar to "tudou", the first weighting factor may be, for example, 0.
Step S200, acquiring, according to the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois lookup in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the presence of the IP address contained in the URL or of an IP address in the same network segment; and outputting a second weighting factor related to the URL path of the video according to the whois lookup result and/or the IP address query result;
It will be understood that the second database maintains a list of known domain names that have published harmful videos, and/or a list of IP addresses and IP address segments of websites known to have published harmful videos.
The whois lookup examines the association between the domain registrant and harmful videos. The second database may maintain information such as: domain names, information on registrants of domains that publish large amounts of pornographic, violent, subversive, or cult videos on the Internet, and the identifiers of the corresponding harmful videos.
For example, where the domain name is www.a.com:
if the second database records this domain, the identifiers of the corresponding harmful videos, and its whois information, the second weighting factor may illustratively be 1.0;
if the second database records no harmful-video identifier for www.a.com, but the registrant of the domain can be found, together with the domains of other websites registered by the same registrant, and the second database contains identifiers showing that those other websites publish large amounts of harmful videos on the Internet, then the website under www.a.com is still highly suspected of being a source of harmful videos even though no harmful-video identifier is recorded for it, and the second weighting factor may illustratively be 0.9;
if the second database records no harmful-video identifier for www.a.com, and although the registrant and the domains of other websites registered by that registrant can be found, the second database contains no identifier of those other websites publishing harmful videos, the second weighting factor may illustratively be 0;
it is readily understood that if the second database records no harmful-video identifier for www.a.com and no domains of other websites registered by the same registrant can be found, the second weighting factor may likewise illustratively be 0.
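The registrant-based reasoning above can be sketched as follows. The database layout (a mapping from domain to registrant and a harmful flag) and the function name are assumptions for illustration; the 1.0 / 0.9 / 0.0 values follow the examples in the text.

```python
def second_factor_from_whois(domain, db):
    """Map a whois-style lookup in the second database to a weighting factor.

    db: dict mapping domain -> {"registrant": str, "harmful": bool},
    an assumed layout standing in for the second database.
    """
    rec = db.get(domain)
    if rec and rec["harmful"]:
        return 1.0  # domain itself is recorded as publishing harmful videos
    if rec:
        registrant = rec["registrant"]
        # other domains registered by the same registrant
        siblings = [d for d, r in db.items()
                    if d != domain and r["registrant"] == registrant]
        if any(db[d]["harmful"] for d in siblings):
            return 0.9  # same registrant runs known harmful sites
    return 0.0  # no record, or no harmful sibling sites
```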
Illustratively, the IP address to which the URL points may also be obtained from the URL path of the video, and an IP address / IP segment query performed to output the second weighting factor.
For example, where the IP address is 192.168.10.3:
if the second database records this IP address, the second weighting factor may illustratively be 1.0;
if the only IP address recorded in the second database is 192.168.10.4, then 192.168.10.3 is moderately suspected of being a backup or recently changed address of the website hosting the video, and the second weighting factor may illustratively be 0.6;
if the second database records 192.168.10.4 and 192.168.10.5, or even all IP addresses of the 192.168.10.X segment, then 192.168.10.3 is highly suspected of being a backup or recently changed address of that website, and the second weighting factor may illustratively be 0.9;
if the IP addresses recorded in the database cover several 192.168.X.X segments but not the 192.168.10.X segment, then 192.168.10.3 is cautiously suspected of being the address of a website hosting harmful videos, and the second weighting factor may illustratively be 0.4.
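The subnet reasoning above can be sketched as follows. Treating "same network segment" as a shared /24 prefix (and the wider segment as a shared /16 prefix) is an assumption for illustration, as are the function names; the factor values follow the examples in the text.

```python
def second_factor_from_ip(ip, known_ips):
    """Map an IP lookup in the second database to a weighting factor.

    known_ips: set of IPv4 addresses of websites known to host harmful videos.
    """
    if ip in known_ips:
        return 1.0  # exact address recorded

    def prefix24(addr):  # "192.168.10.3" -> "192.168.10"
        return ".".join(addr.split(".")[:3])

    def prefix16(addr):  # "192.168.10.3" -> "192.168"
        return ".".join(addr.split(".")[:2])

    same_24 = [k for k in known_ips if prefix24(k) == prefix24(ip)]
    if len(same_24) >= 2:
        return 0.9  # several neighbours recorded: highly suspect
    if len(same_24) == 1:
        return 0.6  # one neighbour recorded: moderately suspect
    sibling_24s = {prefix24(k) for k in known_ips
                   if prefix16(k) == prefix16(ip)}
    if len(sibling_24s) >= 2:
        return 0.4  # several /24s of the same /16 recorded: cautiously suspect
    return 0.0
```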
In particular, the above step may also consider the IP list and the domain list jointly, i.e., determine the second weighting factor from both the IP query of the video URL and the whois lookup of the domain.
Let the IP query factor of the video URL be i, the domain whois query factor be j, and the second weighting factor be y, where 0 ≤ i ≤ 1, 0 ≤ j ≤ 1, and 0 ≤ y ≤ 1. The second weighting factor can then be determined by the formula:
y = m × i + n × j, where m + n = 1, and m and n denote the weights of the IP query factor and the domain whois query factor, respectively.
For example, m = n = 1/2;
alternatively, m and n may differ, and can be adjusted according to the weight of each query factor and the practical circumstances of determining the second weighting factor.
It will be understood that the closer y is to 1, the heavier the second weighting factor and the greater the probability that the associated video is harmful.
The above formula for y is linear; in practice, a nonlinear formula may also be used.
Further, whether the formula is linear or nonlinear, the formula and its parameters may be determined by training or fitting.
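The linear combination y = m × i + n × j can be written directly; the function name and the equal default weights are illustrative, matching the m = n = 1/2 example above.

```python
def combine_ip_and_whois(i, j, m=0.5, n=0.5):
    """Second weighting factor y = m*i + n*j with m + n = 1.

    i: IP query factor, j: domain whois query factor, both in [0, 1].
    """
    assert 0 <= i <= 1 and 0 <= j <= 1
    assert abs(m + n - 1) < 1e-9  # weights must sum to 1
    return m * i + n * j
```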
Step S300, acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality offered in the video's online playback settings, performing, with content-based video copy detection technology, copy detection on the lowest-picture-quality video file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
Step S300 performs content-based video copy detection and outputs the third weighting factor from the detection result. It will be understood that the preset harmful-video database covers content such as pornographic scenes, violent scenes, subversive figures, cult symbols, and other unhealthy content; it can be built with big data techniques and continuously updated. If the detection concludes that the lowest-picture-quality video file is a suspected copy of some video in the preset harmful-video database, this is reflected in the third weighting factor. Depending on the threshold conditions met, the third weighting factor may be, for example, 1.0, 0.8, or 0.4.
It should also be emphasized that, to reduce the computational resources and time this embodiment requires, the video file is acquired at the lowest picture quality, based on the URL path of the video and the lowest picture quality in the online playback settings. The inventors thus exploit the video content corresponding to the lowest picture quality offered by today's playback settings to perform copy detection efficiently. This does not mean, however, that the lowest-quality or low-quality picture must be obtained through the playback settings: low-quality video content can also be obtained by various forms of sampling and then subjected to copy detection.
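The disclosure does not fix a particular copy-detection algorithm. As a minimal sketch of the content-based idea, the following compares average-hash frame signatures, where a frame is a grid of grayscale values; the hash scheme, the names, and the distance threshold are all assumptions for illustration, not the patent's method.

```python
def frame_ahash(frame):
    """Average hash of a grayscale frame (list of rows of 0-255 ints):
    one bit per pixel, set when the pixel exceeds the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))

def is_suspected_copy(query_frames, reference_frames, max_dist=2):
    """Flag the query video as a suspected copy when every sampled query
    frame has some reference frame within max_dist hash bits."""
    ref_hashes = [frame_ahash(f) for f in reference_frames]
    return all(
        min(hamming(frame_ahash(q), r) for r in ref_hashes) <= max_dist
        for q in query_frames
    )
```

Hashing on the lowest-quality rendition works here because the average hash is tolerant of compression noise, which is consistent with the efficiency argument above.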
It will be understood that step S300 may perform the video processing with traditional methods or with deep learning models in order to identify harmful videos.
Step S400, combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
Illustratively, let the first weighting factor be x, the second be y, and the third be z, where 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 ≤ z ≤ 1. The harmfulness coefficient W of the video can be computed by combining the above weighting factors as follows:
W = a × x + b × y + c × z, where a + b + c = 1, and a, b, and c denote the weights of the respective weighting factors.
For example, a = b = c = 1/3;
alternatively, a, b, and c may differ, and can be adjusted according to the individual weighting factors and the practical circumstances of identifying harmful content.
It will be understood that the closer W is to 1, the greater the probability that the video is harmful.
The above formula for W is linear; in practice, a nonlinear formula may also be used.
Further, whether the formula is linear or nonlinear, the formula and its parameters may be determined by training or fitting.
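The fusion formula W = a × x + b × y + c × z can be sketched in the same way; the function name is an assumption, and the equal default weights match the a = b = c = 1/3 example above.

```python
def harmfulness_score(x, y, z, a=1/3, b=1/3, c=1/3):
    """Harmfulness coefficient W = a*x + b*y + c*z with a + b + c = 1.

    x, y, z: the first, second, and third weighting factors, each in [0, 1].
    """
    assert abs(a + b + c - 1) < 1e-9  # weights must sum to 1
    return a * x + b * y + c * z
```

A threshold on W (e.g., flag the video when W exceeds some tuned value) would complete the identification step; the text leaves that threshold, like the weights, to training or fitting.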
In summary, in the above embodiment only step S300 performs image processing; the remaining steps take a different route, using lookups to obtain the corresponding weighting factors. Step S400 then combines (or fuses) the several weighting factors to identify harmful videos. Those skilled in the art know that processing and recognizing every frame of a video is very costly in time, whereas lookups are comparatively cheap. The above embodiment therefore provides an efficient method for identifying harmful videos. Moreover, the above embodiment can clearly be further combined with big data and/or artificial intelligence to build and update the first database, the second database, and other databases.
In another embodiment, the second database is a third-party database.
Examples include the many websites offering whois lookups, third-party databases of lists of pornographic, violent, subversive, or cult websites, and databases recording the IP addresses and IP address segments of websites hosting harmful content.
In another embodiment, for a video identified as harmful, the IP address information of the publisher of the harmful video recorded at its source web address (for example a forum or web page) is collected and the first database is updated. This is because harmful videos tend to attract sticky users, some of whom take part in spreading harmful videos, and most of their IP addresses are relatively fixed; if the web address itself records the IP address information of the publisher of the harmful video, the present disclosure collects that IP address information to update the aforementioned first database.
In another embodiment, step S200 further includes:
querying the security of the domain in a third-party domain security list so as to output a security factor, and correcting the domain-related second weighting factor with the security factor.
An example is the third-party domain security screening website virustotal.com. It will be understood that if third-party information deems the domain to contain viruses or trojans, the second weighting factor should be raised, since the website is fundamentally less safe.
It will be understood that this embodiment focuses on correcting the second weighting factor from a network security perspective, protecting users from further losses. Network security concerns user privacy and property: if the website associated with a harmful video also poses security risks, it threatens users with privacy leaks or financial loss on top of the harm of the video itself.
In another embodiment, acquiring the video file at the lowest picture quality in step S300 includes acquiring the video content at the end of the video file.
For this embodiment, that means that when fetching video content, the content at the end of the file is preferred in order to minimize the amount of data fetched. This is because for harmful videos, whether pornographic, violent, or subversive, the ending is usually the climax of the plot, and the distributors of such videos, whether motivated by predilection, politics, or cult ideology, are generally unlikely to delete that climax. In other words, this embodiment greatly reduces the workload of video copy detection. It should be added that this is a preferred embodiment; it does not mean that video content cannot instead be taken from the first third of the playing time, or from the middle third.
Preferably, the end content may be the content within the final third of the playing time. More preferably, it may be the content within the last few minutes of the video, for example 3, 5, or 10 minutes; whatever the number of minutes, if the final third of the playing time is shorter, the content within that final third is naturally preferred.
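The segment-selection rule above (last few minutes, or the final third when that is shorter) can be sketched as follows; the function name and the 5-minute default are illustrative assumptions.

```python
def tail_segment(duration_s, preferred_tail_s=300):
    """Return (start, end) of the segment to fetch for copy detection:
    the final preferred_tail_s seconds (e.g. 5 minutes), or the final
    third of the playing time when that third is shorter."""
    last_third = duration_s / 3
    length = min(preferred_tail_s, last_third)
    return (duration_s - length, duration_s)
```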
In another embodiment, acquiring the video file at the lowest picture quality in step S300 further includes:
Step c1): extracting the audio from the video;
Step c2): recognizing whether the audio contains harmful content and, if so, acquiring the video content within the start and end times of that audio.
For this embodiment, if the audio is recognized as containing pornographic content, violent content, subversive political speech, cult incitement, or extremist speech of terror and hatred, its time range is located, and the video content within the start and end times of the audio is acquired. This finds the relevant harmful footage in a more targeted way.
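Given the time spans flagged by an audio classifier (the classifier itself is outside the scope of this sketch), the video segments to fetch can be derived as follows; the function name and the gap-merging heuristic are assumptions for illustration.

```python
def segments_to_fetch(harmful_audio_spans, merge_gap_s=2.0):
    """Merge nearby (start, end) spans flagged as harmful audio and
    return the video segments whose content should be fetched."""
    spans = sorted(harmful_audio_spans)
    merged = []
    for start, end in spans:
        if merged and start - merged[-1][1] <= merge_gap_s:
            # close to the previous span: extend it instead of adding a new one
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```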
As described above, combined with big data techniques, the present disclosure can fruitfully draw on multiple dimensions and modes, combining IP information, domain information, video information, and audio information to identify harmful videos quickly.
Further, the above embodiments can be implemented on the router side or on the network provider side, filtering the relevant videos in advance.
Corresponding to the method, and referring to Fig. 2, another embodiment of the present disclosure provides a system for identifying harmful videos, comprising:
a first weighting factor generation module, configured to: when it is determined that the page elements of a web page include a URL path of a video, identify a user ID recorded in the page content of the web page, query a first database for the presence of the ID, and output a first weighting factor according to the result of the ID query;
a second weighting factor generation module, configured to: acquire, according to the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; perform a whois lookup in a second database based on the domain name contained in the URL, and/or query the second database, based on the IP address to which the URL points, for the presence of the IP address contained in the URL or of an IP address in the same network segment; and output a second weighting factor related to the URL path of the video according to the whois lookup result and/or the IP address query result;
a third weighting factor generation module, configured to: acquire the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality offered in the video's online playback settings, perform, with content-based video copy detection technology, copy detection on the lowest-picture-quality video file against a preset harmful-video database, and output a third weighting factor according to the detection result;
an identification module, configured to combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
Similarly to the foregoing method embodiments:
preferably, the second database is a third-party database.
More preferably, the second weighting factor generation module further comprises:
a correction unit, configured to: query the security of the domain in a third-party domain security list so as to output a security factor, and correct the domain-related second weighting factor with the security factor.
More preferably, acquiring the video file at the lowest picture quality in the third weighting factor generation module comprises acquiring the video content at the end of the video file.
More preferably, the third weighting factor generation module further acquires the video file at the lowest picture quality through the following units:
an audio extraction unit, configured to extract the audio from the video;
an audio recognition unit, configured to recognize whether the audio contains harmful content and, if so, acquire the video content within the start and end times of that audio.
In another embodiment, the present disclosure provides a system for identifying harmful videos, comprising:
a processor and a memory, the memory storing executable instructions which the processor executes to perform the following operations:
Step a), when it is determined that the page elements of a web page include a URL path of a video, identifying a user ID recorded in the page content of the web page, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;
Step b), acquiring, according to the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois lookup in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the presence of the IP address contained in the URL or of an IP address in the same network segment; and outputting a second weighting factor related to the URL path of the video according to the whois lookup result and/or the IP address query result;
Step c), acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality offered in the video's online playback settings, performing, with content-based video copy detection technology, copy detection on the lowest-picture-quality video file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
Step d), combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
In another embodiment, the present disclosure further provides a computer storage medium storing executable instructions for performing the following method of identifying harmful videos:
Step a), when it is determined that the page elements of a web page include a URL path of a video, identifying a user ID recorded in the page content of the web page, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;
Step b), acquiring, according to the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois lookup in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the presence of the IP address contained in the URL or of an IP address in the same network segment; and outputting a second weighting factor related to the URL path of the video according to the whois lookup result and/or the IP address query result;
Step c), acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality offered in the video's online playback settings, performing, with content-based video copy detection technology, copy detection on the lowest-picture-quality video file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
Step d), combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
The above system may include: at least one processor (e.g., a CPU), at least one sensor (e.g., an accelerometer, gyroscope, GPS module, or other positioning module), at least one memory, and at least one communication bus, where the communication bus provides connection and communication among the components. The device may further include at least one receiver and at least one transmitter, where the receiver and transmitter may be wired transmission ports or wireless devices (e.g., including antenna apparatus) for exchanging signaling or data with other node devices. The memory may be high-speed RAM or non-volatile memory, such as at least one disk memory, and may optionally be at least one storage device located remotely from the aforementioned processor. The memory stores a set of program code, and the processor can invoke the code stored in the memory over the communication bus to perform the relevant functions.
An embodiment of the present disclosure further provides a computer storage medium which may store a program that, when executed, performs some or all of the steps of any method for identifying harmful videos described in the above method embodiments.
The steps in the methods of the embodiments of the present disclosure may be reordered, combined, or deleted according to actual needs.
The modules and units in the systems of the embodiments of the present disclosure may be combined, divided, or deleted according to actual needs. It should be noted that, for brevity of description, the foregoing method embodiments are all expressed as series of action combinations; those skilled in the art should appreciate, however, that the present invention is not limited by the described order of actions, as according to the present invention some steps may be performed in other orders or simultaneously. They should further appreciate that the embodiments described in the specification are preferred embodiments, and that the actions, modules, and units involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed system may be implemented in other ways. For example, the embodiments described above are merely illustrative; the division into units, for instance, is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the couplings, direct couplings, or communication connections between units or components may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, may exist separately, or two or more may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a smartphone, personal digital assistant, wearable device, laptop, or tablet) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
The above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent substitutions for some of the technical features therein, and that such modifications or substitutions do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present disclosure.

Claims (12)

  1. A method for identifying harmful videos, comprising:
    Step a), when it is determined that the page elements of a web page include a URL path of a video, identifying a user ID recorded in the page content of the web page, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;
    Step b), acquiring, according to the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois lookup in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the presence of the IP address contained in the URL or of an IP address in the same network segment; and outputting a second weighting factor related to the URL path of the video according to the whois lookup result and/or the IP address query result;
    Step c), acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality offered in the video's online playback settings, performing, with content-based video copy detection technology, copy detection on the lowest-picture-quality video file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
    Step d), combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  2. The method according to claim 1, wherein the second database is a third-party database.
  3. The method according to claim 1, wherein step b) further comprises:
    querying the security of the domain in a third-party domain security list so as to output a security factor, and correcting the second weighting factor with the security factor.
  4. The method according to claim 1, wherein acquiring the video file at the lowest picture quality in step c) comprises acquiring the video content at the end of the video file.
  5. The method according to claim 1, wherein acquiring the video file at the lowest picture quality in step c) further comprises:
    Step c1): extracting the audio from the video;
    Step c2): recognizing whether the audio contains harmful content and, if so, acquiring the video content within the start and end times of that audio.
  6. A system for identifying harmful videos, comprising:
    a first weighting factor generation module, configured to: when it is determined that the page elements of a web page include a URL path of a video, identify a user ID recorded in the page content of the web page, query a first database for the presence of the ID, and output a first weighting factor according to the result of the ID query;
    a second weighting factor generation module, configured to: acquire, according to the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; perform a whois lookup in a second database based on the domain name contained in the URL, and/or query the second database, based on the IP address to which the URL points, for the presence of the IP address contained in the URL or of an IP address in the same network segment; and output a second weighting factor related to the URL path of the video according to the whois lookup result and/or the IP address query result;
    a third weighting factor generation module, configured to: acquire the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality offered in the video's online playback settings, perform, with content-based video copy detection technology, copy detection on the lowest-picture-quality video file against a preset harmful-video database, and output a third weighting factor according to the detection result;
    an identification module, configured to combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  7. The system according to claim 6, wherein the second database is a third-party database.
  8. The system according to claim 6, wherein the second weighting factor generation module further comprises:
    a correction unit, configured to: query the security of the domain in a third-party domain security list so as to output a security factor, and correct the second weighting factor with the security factor.
  9. The system according to claim 6, wherein acquiring the video file at the lowest picture quality in the third weighting factor generation module comprises acquiring the video content at the end of the video file.
  10. The system according to claim 6, wherein the third weighting factor generation module further acquires the video file at the lowest picture quality through the following units:
    an audio extraction unit, configured to extract the audio from the video;
    an audio recognition unit, configured to recognize whether the audio contains harmful content and, if so, acquire the video content within the start and end times of that audio.
  11. A system for identifying harmful videos, comprising:
    a processor and a memory, the memory storing executable instructions which the processor executes to perform the following operations:
    Step a), when it is determined that the page elements of a web page include a URL path of a video, identifying a user ID recorded in the page content of the web page, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;
    Step b), acquiring, according to the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois lookup in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the presence of the IP address contained in the URL or of an IP address in the same network segment; and outputting a second weighting factor related to the URL path of the video according to the whois lookup result and/or the IP address query result;
    Step c), acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality offered in the video's online playback settings, performing, with content-based video copy detection technology, copy detection on the lowest-picture-quality video file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
    Step d), combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  12. A computer storage medium storing executable instructions for performing the following method of identifying harmful videos:
    Step a), when it is determined that the page elements of a web page include a URL path of a video, identifying a user ID recorded in the page content of the web page, querying a first database for the presence of the ID, and outputting a first weighting factor according to the result of the ID query;
    Step b), acquiring, according to the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois lookup in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the presence of the IP address contained in the URL or of an IP address in the same network segment; and outputting a second weighting factor related to the URL path of the video according to the whois lookup result and/or the IP address query result;
    Step c), acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality offered in the video's online playback settings, performing, with content-based video copy detection technology, copy detection on the lowest-picture-quality video file against a preset harmful-video database, and outputting a third weighting factor according to the detection result;
    Step d), combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
PCT/CN2018/072239 2017-12-30 2018-01-11 Method and system for identifying harmful videos based on user ID and video copy detection WO2019127655A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711500073.7A CN110020257A (zh) 2017-12-30 2017-12-30 Method and system for identifying harmful videos based on user ID and video copy detection
CN201711500073.7 2017-12-30

Publications (1)

Publication Number Publication Date
WO2019127655A1 true WO2019127655A1 (zh) 2019-07-04

Family

ID=67064473

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072239 WO2019127655A1 (zh) 2017-12-30 2018-01-11 基于用户id和视频拷贝的识别有害视频的方法及系统

Country Status (2)

Country Link
CN (1) CN110020257A (zh)
WO (1) WO2019127655A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1761204A (zh) * 2005-11-18 2006-04-19 郑州金惠计算机系统工程有限公司 在互联网上堵截色情图像与不良信息的系统
US8595163B1 (en) * 2004-07-23 2013-11-26 Ellis Robinson Ellis System for evaluating hyperdocuments using a trained artificial neural network
CN104574547A (zh) * 2015-01-28 2015-04-29 广东铂亚信息技术股份有限公司 一种基于人脸识别技术的高速公路防逃费方法
CN106101740A (zh) * 2016-07-13 2016-11-09 百度在线网络技术(北京)有限公司 一种视频内容识别方法和装置
CN106599937A (zh) * 2016-12-29 2017-04-26 池州职业技术学院 一种不良图片过滤装置
CN106973305A (zh) * 2017-03-20 2017-07-21 广东小天才科技有限公司 一种视频中不良内容的检测方法及装置
CN206657367U (zh) * 2016-12-29 2017-11-21 池州职业技术学院 一种不良图片过滤装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853377B (zh) * 2010-05-13 2012-10-17 复旦大学 一种对数字视频进行内容识别的方法
CN102208992B (zh) * 2010-06-13 2015-09-02 天津海量信息技术有限公司 面向互联网的不良信息过滤系统及其方法
CN103905372A (zh) * 2012-12-24 2014-07-02 珠海市君天电子科技有限公司 一种钓鱼网站去误报的方法和装置
CN105654051B (zh) * 2015-12-30 2019-02-22 北京奇艺世纪科技有限公司 一种视频检测方法及系统
CN106055574B (zh) * 2016-05-19 2019-12-24 微梦创科网络科技(中国)有限公司 一种识别非法统一资源标识符url的方法与装置


Also Published As

Publication number Publication date
CN110020257A (zh) 2019-07-16


Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18893607

Country of ref document: EP

Kind code of ref document: A1