WO2019127656A1 - User IP and video copy-based harmful video identification method and system

Publication number
WO2019127656A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
address
url
weighting factor
database
Prior art date
Application number
PCT/CN2018/072240
Other languages
French (fr)
Chinese (zh)
Inventor
蔡昭权
胡松
胡辉
蔡映雪
陈伽
黄翰
梁椅辉
罗伟
黄思博
Original Assignee
惠州学院
Application filed by 惠州学院

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7834Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using audio features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/955Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566URL specific, e.g. using aliases, detecting broken or misspelled links
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00Network arrangements, protocols or services for addressing or naming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00Network arrangements, protocols or services for addressing or naming
    • H04L61/45Network directories; Name-to-address mapping
    • H04L61/4505Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/02Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227Filtering policies
    • H04L63/0236Filtering by address, protocol, port number or service, e.g. IP-address or URL

Definitions

  • the present disclosure pertains to the field of information security, for example, to a method of identifying harmful video and a system therefor.
  • The current technology can be divided into two major categories. The first is the traditional method, which itself includes two types. (1) Identification based on single-modal features: these methods mainly extract the visual features of the video and construct a classifier from them. For example, in violent video recognition, common features are video motion vectors, colors, textures, and shapes. (2) Recognition based on multi-modal feature fusion: these methods extract features from several modalities of the video and fuse them to construct a classifier. For example, in violent video recognition, many methods extract audio features, including short-term energy, bursty sounds, and the like, in addition to video features.
  • CNN: a convolutional neural network is used to identify and process the sensitive and harmful images in the database and to learn the internal features of harmful videos; the learned framework is then used to judge whether an obtained video frame contains harmful information.
  • RNN: the video sequences in the database are fed directly into a recurrent neural network, which learns a framework for harmful video; the learned framework is then used to judge whether a new video is a harmful video.
  • CNN+RNN: a CNN learns the spatial-domain information in the image frames of the video, an RNN identifies the time-domain information in the video sequence, and the two are combined for identification and judgment; the learned framework is then used to identify the video.
  • Existing image processing methods are mainly of two kinds: the traditional method and the deep learning method.
  • The classic traditional method, the bag-of-words model, consists of four stages: (1) low-level feature extraction, (2) feature coding, (3) feature aggregation, and (4) classification using an appropriate classifier.
  • The deep learning model is another approach to image processing, mainly including the autoencoder, the restricted Boltzmann machine, the deep belief network, the convolutional neural network, and the recurrent neural network. The traditional method is simpler than deep learning, but with the continuous advancement of computer hardware and the improvement of databases, the deep learning method can learn more meaningful features from data and continuously adjust its parameters according to the task. In image processing, the deep learning model has more powerful feature expression capabilities.
  • the present disclosure provides a method of identifying harmful videos, including:
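  • Step a) when it is determined that a page element of a webpage includes a URL path of a video, identifying the IP address or IP address segment of the user recorded in the page content of the webpage, querying in a first database whether that IP address or an IP address of the same network segment exists, and outputting a first weighting factor related to the IP according to the query result for the user IP address;
  • Step b) obtaining, from the URL path of the video, the domain name included in the URL or the IP address pointed to by the URL; performing a whois query in a second database based on the domain name, and/or querying in the second database whether the IP address pointed to by the URL or an IP address of the same network segment exists; and outputting a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;
  • Step c) acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the online play setting of the video, performing content-based video copy detection on that file against a preset harmful video database, and outputting a third weighting factor according to the detection result;
  • Step d) combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.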
  • the present disclosure also discloses a system for identifying harmful videos, including:
  • a first weighting factor generating module configured to: when it is determined that a page element of a webpage includes a URL path of a video, identify the IP address or IP address segment of the user recorded in the page content of the webpage, query in a first database whether that IP address or an IP address of the same network segment exists, and output a first weighting factor related to the IP according to the query result for the user IP address;
  • a second weighting factor generating module configured to: obtain, from the URL path of the video, the domain name included in the URL or the IP address pointed to by the URL; perform a whois query in a second database based on the domain name, and/or query in the second database whether the IP address pointed to by the URL or an IP address of the same network segment exists; and output a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;
  • a third weighting factor generating module configured to: acquire the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the online play setting of the video, perform content-based video copy detection on that file against a preset harmful video database, and output a third weighting factor according to the detection result;
  • an identifying module configured to combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  • The present disclosure can provide a more efficient scheme for identifying harmful videos by combining databases built from big data with as little image processing as possible.
  • FIG. 1 is a schematic illustration of a method according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of a system according to an embodiment of the present disclosure.
  • References to "an embodiment" herein mean that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure.
  • Appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will appreciate that the embodiments described herein can be combined with other embodiments.
  • FIG. 1 is a schematic flowchart of a method for identifying a harmful video according to an embodiment of the present disclosure. As shown, the method includes:
  • Step S100: when it is determined that a page element of the webpage includes the URL path of a video, identify the IP address or IP address segment of the user recorded in the page content of the webpage, query in the first database whether that IP address or an IP address of the same network segment exists, and output a first weighting factor related to the IP according to the query result for the user IP address;
  • the first database maintains a list of known IP addresses or IP address segments of users who have posted harmful videos in the web page.
  • For example, suppose the IP address of the user recorded in the content of the web page is 192.168.10.3:
  • if the database records 192.168.10.3 itself, the first weighting factor may be, for example, 1.0;
  • if the only IP address recorded in the database is 192.168.10.4, then 192.168.10.3 is moderately suspected of being an alternate or newly replaced address of a user who has posted harmful video, and the first weighting factor may be, for example, 0.6;
  • if the database records both 192.168.10.4 and 192.168.10.5, or even every IP address of the 192.168.10.X network segment, then 192.168.10.3 is highly suspected of being an alternate or newly replaced address of a user who has posted harmful video, and the first weighting factor may be, for example, 0.9;
  • if the database records multiple 192.168.X.X network segments but no 192.168.10.X network segment, then 192.168.10.3 is cautiously suspected of being the address of a user who has posted harmful video, and the first weighting factor may be, for example, 0.4.
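The tiered lookup in this example can be sketched in Python with the standard `ipaddress` module. This is a minimal illustration, not the disclosed implementation: the function name `first_weighting_factor`, the reading of "same network segment" as a /24 prefix (192.168.10.X) and of the wider segment as a /16 prefix (192.168.X.X), and the default of 0.0 when nothing matches are all assumptions. For brevity the sketch takes single addresses only and does not handle stored address segments.

```python
import ipaddress

def first_weighting_factor(user_ip, known_ips):
    """Illustrative first weighting factor for user_ip given known bad IPs."""
    ip = ipaddress.ip_address(user_ip)
    known = [ipaddress.ip_address(k) for k in known_ips]
    if ip in known:
        return 1.0  # the exact address is recorded
    # Assumed prefixes: /24 for 192.168.10.X, /16 for 192.168.X.X
    seg24 = ipaddress.ip_network(f"{user_ip}/24", strict=False)
    seg16 = ipaddress.ip_network(f"{user_ip}/16", strict=False)
    same24 = [k for k in known if k in seg24]
    same16 = [k for k in known if k in seg16]
    if len(same24) >= 2:
        return 0.9  # several neighbours in the same segment: highly suspected
    if len(same24) == 1:
        return 0.6  # one neighbour in the same segment: moderately suspected
    if same16:
        return 0.4  # only the wider segment matches: cautiously suspected
    return 0.0      # assumed default when nothing matches
```

The numeric tiers mirror the exemplary values in the text; a deployment would presumably tune both the prefixes and the values.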
  • Step S200: obtain, from the URL path of the video, the domain name included in the URL or the IP address pointed to by the URL; perform a whois query in the second database based on the domain name, and/or query in the second database whether the IP address pointed to by the URL or an IP address of the same network segment exists; and output a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;
  • the second database maintains a list of known domain names that have posted harmful videos, and/or a list of known IP addresses and IP address segments of websites that have posted harmful videos.
  • the Whois query is to investigate the association of domain name registrants with harmful videos.
  • For example, the second database can maintain the following information: the domain name, an identifier of whether the domain has published a large number of pornographic, violent, reactionary, or cult videos on the Internet, and the corresponding harmful videos. Take the domain name www.a.com as an example:
  • if the second database records an identifier of harmful video for www.a.com, the second weighting factor may be, for example, 1.0;
  • if the second database records no identifier of harmful video for www.a.com, but the whois query reveals the registrant of the domain and the domain names of other websites registered by that registrant, and the second database includes an identifier showing that such another website publishes a large number of harmful videos on the Internet, then the website corresponding to www.a.com is still highly suspected of being a source of harmful video, and the second weighting factor may be, for example, 0.9;
  • if the second database records no identifier of harmful video for www.a.com, the whois query reveals the registrant of the domain and the domain names of other websites registered by that registrant, and the second database includes no identifier of harmful video for those other websites, the second weighting factor may be, for example, 0;
  • the second weighting factor may also be, for example, 0.
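The domain-name cases above can be sketched as a small lookup. This is a hedged illustration only: the whois query is replaced by two in-memory dictionaries, and the names `domain_factor`, `flagged_domains`, `registrant_of`, and `domains_by_registrant` are hypothetical. Returning 0.0 when the registrant is unknown is also an assumption.

```python
def domain_factor(domain, flagged_domains, registrant_of, domains_by_registrant):
    """Illustrative whois-based factor for a domain.

    flagged_domains: set of domains recorded as publishing harmful video.
    registrant_of: dict mapping domain -> registrant (stand-in for whois).
    domains_by_registrant: dict mapping registrant -> set of their domains.
    """
    if domain in flagged_domains:
        return 1.0  # the domain itself is recorded as a harmful source
    registrant = registrant_of.get(domain)
    if registrant is None:
        return 0.0  # assumption: no registrant information, no suspicion
    # Other websites registered by the same registrant
    siblings = domains_by_registrant.get(registrant, set()) - {domain}
    if siblings & flagged_domains:
        return 0.9  # a sibling site is flagged: highly suspected
    return 0.0      # registrant known, no sibling flagged
```

Keeping the whois data behind plain dictionaries makes the tiering logic easy to test separately from any network lookup.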
  • Alternatively, the IP address pointed to by the URL may be obtained from the URL path of the video, and an IP address or IP address segment query performed to output the second weighting factor.
  • For example, suppose the URL points to the IP address 192.168.20.3:
  • if the second database records 192.168.20.3 itself, the second weighting factor may be, for example, 1.0;
  • if the only IP address recorded in the second database is 192.168.20.4, then 192.168.20.3 is moderately suspected of being an alternate or newly replaced address of the website to which the video belongs, and the second weighting factor may be, for example, 0.6;
  • the second weighting factor may also be, for example, 0.9;
  • if the database records multiple 192.168.X.X network segments but no 192.168.20.X network segment, then 192.168.20.3 is cautiously suspected of being the address of a harmful video website, and the second weighting factor may be, for example, 0.4.
  • The above step also covers the situation in which the IP list and the domain name list are considered together, that is, the case where the second weighting factor is jointly determined by the IP query of the video URL and the whois query of the domain name.
  • Suppose the IP query factor of the video URL is i, the domain name whois query factor is j, and the second weighting factor is y, where 0 ≤ i ≤ 1, 0 ≤ j ≤ 1, and 0 ≤ y ≤ 1. The second weighting factor can then be determined according to the following formula:
  • y = m·i + n·j
  • where m and n are not equal, and may be adjusted according to the weight of each query factor and the actual situation of determining the second weighting factor.
  • The above formula for calculating y is linear, but in practical applications a nonlinear formula may also be used.
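As a worked illustration, the sketch below assumes the linear form y = m*i + n*j, which is consistent with the coefficients m and n and the factors i and j named in the text. The default values m = 0.6 and n = 0.4 are hypothetical, chosen to be unequal and to sum to 1 so that y stays within [0, 1]; the text itself gives no values.

```python
def second_weighting_factor(i, j, m=0.6, n=0.4):
    """Combine the IP query factor i and the whois query factor j linearly."""
    for name, value in (("i", i), ("j", j)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    y = m * i + n * j
    # Clamp in case the caller supplies coefficients whose sum exceeds 1
    return min(1.0, max(0.0, y))
```

With the assumed defaults, an exact IP match (i = 1.0) with no whois evidence (j = 0) gives y = 0.6, and i = 0.6 with j = 0.9 gives y = 0.72.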
  • Step S300: acquire the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the online play setting of the video, perform content-based video copy detection on that file against the preset harmful video database, and output a third weighting factor according to the detection result;
  • Step S300 relies on content-based video copy detection and outputs the third weighting factor from the detection result.
  • The preset harmful video database includes, for example, pornographic image information, violent image information, reactionary characters, cult identification, or other unhealthy content. It can be established in combination with big data technology and can be continuously updated. If the detection result determines that the video file at the lowest picture quality is a suspected copy of a video in the preset harmful video database, this is reflected in the third weighting factor. Depending on which threshold condition is met, the third weighting factor may be 1.0, 0.8, or 0.4.
  • In this embodiment, the video file at the lowest picture quality is obtained based on the URL path of the video and the lowest picture quality in the online play setting of the video.
  • The inventors thus make full use of the video content corresponding to the lowest picture quality in today's video playback settings to perform efficient video copy detection.
  • This does not mean that the lowest-quality or low-quality picture must be obtained through the play setting, because video content of low picture quality can also be obtained by various sampling methods before video copy detection is performed.
  • Step S300 can perform video processing in combination with a traditional method, or in combination with a deep learning model, to identify harmful videos.
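The text does not name a specific copy detection algorithm. As a simple stand-in, the sketch below fingerprints each frame with an average hash and compares fingerprints by Hamming distance. Everything here is an assumption for illustration: frames are small grayscale grids (lists of rows of integers) already downscaled to a common size, and the similarity thresholds 0.95, 0.85, and 0.70 are hypothetical values mapped to the exemplary factors 1.0, 0.8, and 0.4.

```python
def average_hash(frame):
    """Bit tuple: 1 where a pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)

def frame_similarity(f1, f2):
    """Fraction of matching hash bits between two equally sized frames."""
    h1, h2 = average_hash(f1), average_hash(f2)
    differing = sum(a != b for a, b in zip(h1, h2))
    return 1.0 - differing / len(h1)

def third_weighting_factor(query_frames, reference_videos):
    """Map the best copy score against any reference video to a factor."""
    best = 0.0
    for ref_frames in reference_videos:
        if not query_frames or not ref_frames:
            continue
        # Each query frame is scored by its closest reference frame
        score = sum(
            max(frame_similarity(q, r) for r in ref_frames)
            for q in query_frames
        ) / len(query_frames)
        best = max(best, score)
    if best >= 0.95:   # assumed threshold: near-certain copy
        return 1.0
    if best >= 0.85:   # assumed threshold: probable copy
        return 0.8
    if best >= 0.70:   # assumed threshold: possible copy
        return 0.4
    return 0.0
```

Because the hash depends only on which pixels are above the frame mean, a re-encoded or slightly brightened copy tends to keep the same fingerprint, which is why low-quality versions of a video can still be matched.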
  • Step S400: combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  • Suppose the first weighting factor is x, the second weighting factor is y, and the third weighting factor is z, where 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 ≤ z ≤ 1. The harmful coefficient W of the video can then be calculated from these weighting factors according to the following formula:
  • W = a·x + b·y + c·z
  • where a, b, and c are not equal, and may be adjusted according to each weighting factor and the actual situation of identifying harmful content.
  • The above formula for calculating W is linear, but in practice a nonlinear formula may also be used.
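A minimal sketch of this fusion step follows, assuming the linear form W = a*x + b*y + c*z. The coefficient values (0.2, 0.3, 0.5) and the decision threshold of 0.5 are hypothetical; the text only states that a, b, and c are unequal and adjustable, and it does not specify how W is turned into a yes/no decision.

```python
def harm_coefficient(x, y, z, a=0.2, b=0.3, c=0.5):
    """Linear fusion of the three weighting factors into W."""
    for name, value in (("x", x), ("y", y), ("z", z)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return a * x + b * y + c * z

def is_harmful(x, y, z, threshold=0.5):
    """Assumed decision rule: flag the video when W reaches the threshold."""
    return harm_coefficient(x, y, z) >= threshold
```

With these assumed values, x = 0.6 and y = 0.9 with no copy match (z = 0) give W = 0.39, below the threshold, while the same query results with z = 0.8 give W = 0.79.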
  • In the above embodiment, only step S300 performs image processing; the remaining steps take a different approach, using related queries to obtain the corresponding weighting factors.
  • Step S400 combines (also referred to as fuses) the multiple weighting factors to identify harmful videos.
  • Processing and identifying every frame of a video is very time consuming, whereas queries take comparatively little time.
  • The above embodiment thus proposes an efficient method of identifying harmful video.
  • The above-described embodiments can evidently further integrate and update the first database, the second database, and other databases in conjunction with big data and/or artificial intelligence.
  • The second database is a third-party database.
  • The IP address information of the publisher of the harmful video recorded at the web address is collected and used to update the first database. This is because harmful videos generally attract loyal users. Some of these users take part in spreading harmful videos, and most of their IP addresses are relatively fixed. If the relevant web address itself records the IP address information of the publisher of the harmful video, the present disclosure updates the aforementioned first database by collecting that IP address information.
  • step S200 further includes:
  • the security of the domain name is queried in a third-party domain name security list to output a security factor, and the second weighting factor related to the domain name is corrected by the security factor.
  • For example, virustotal.com is a third-party domain name security screening website. If the third-party information indicates that the relevant domain name carries a virus or a Trojan, the second weighting factor should be raised, because the related website is less secure.
  • This embodiment corrects the second weighting factor from a network security perspective to keep the user from suffering other losses. Network security bears on the privacy and property rights of users: if a website related to harmful videos also poses network security risks, it exposes users to privacy leakage or property damage in addition to the harmful video itself.
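One way to picture the correction is sketched below. The text says only that the second weighting factor should be raised when the domain is reported as insecure; the update rule used here, which moves y toward 1 in proportion to a risk score, is an assumption, as are the names `correct_second_factor`, `security_risk`, and `strength`. The risk score is taken as already obtained from a third-party list, so no real service API is called.

```python
def correct_second_factor(y, security_risk, strength=0.5):
    """Raise y toward 1.0 in proportion to an assumed risk score in [0, 1].

    security_risk = 0.0 means the third-party list reports nothing;
    security_risk = 1.0 means a confirmed virus or Trojan report.
    """
    if not (0.0 <= y <= 1.0 and 0.0 <= security_risk <= 1.0):
        raise ValueError("y and security_risk must lie in [0, 1]")
    corrected = y + strength * security_risk * (1.0 - y)
    return min(1.0, corrected)
```

This rule never lowers y and never pushes it above 1, which matches the stated direction of the correction without committing to particular magnitudes.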
  • Obtaining the video file at the lowest picture quality in step S300 comprises acquiring the video content at the end of the video file.
  • In this embodiment, to minimize the size of the acquired video content, the content at the end of the video file is preferentially selected. This is because, for harmful videos, whether pornographic, violent, or reactionary, the ending is often the climax of the plot, and those who spread such videos, whether for gain or out of political or cult motivation, are generally unlikely to cut the climax at the end. This embodiment therefore greatly reduces the workload of video copy detection. It should be added that this is a preferred embodiment; it does not mean that content cannot be selected from the first 1/3 or the middle 1/3 of the playing time of the video.
  • Preferably, the video content at the end of the video may be the content within the last 1/3 of the playback time. More preferably, it may be the content within the last few minutes of the video, for example 3, 5, or 10 minutes; whatever the number of minutes, if the last 1/3 of the playback time is shorter, the content within the last 1/3 of the playback time is naturally preferred.
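The window selection described here reduces to taking the shorter of "the last N minutes" and "the last 1/3 of the playing time". A small sketch, with the function name `tail_window` and the default of 5 minutes as illustrative choices:

```python
def tail_window(duration_s, minutes=5.0):
    """Return (start, end) in seconds of the end-of-video window to fetch.

    The window is the last `minutes` minutes, unless the last third of
    the video is shorter, in which case the last third is used.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    last_third = duration_s / 3.0
    length = min(last_third, minutes * 60.0)
    return (duration_s - length, duration_s)
```

For a one-hour video this selects the last 5 minutes (3300 s to 3600 s); for a 10-minute video the last third is shorter, so it selects 400 s to 600 s.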
  • Obtaining the video file at the lowest picture quality in step S300 also includes the following:
  • Step c1) extracting the audio in the video;
  • Step c2) identifying whether the audio includes harmful content and, if so, locating its start and end time and acquiring the video content within that start and end time.
  • This finds the relevant harmful images in a more targeted way.
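The time localization can be sketched as follows, assuming an upstream audio classifier has already labelled segments as harmful or not; that classifier is outside this sketch. The tuple layout `(start_s, end_s, is_harmful)` and the name `harmful_intervals` are hypothetical. Overlapping flagged segments are merged so that each stretch of video is fetched only once.

```python
def harmful_intervals(audio_segments):
    """Merge flagged audio segments into video intervals to inspect.

    audio_segments: iterable of (start_s, end_s, is_harmful) tuples.
    Returns a sorted list of non-overlapping (start_s, end_s) intervals.
    """
    spans = sorted((s, e) for s, e, bad in audio_segments if bad)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Only the video content inside the returned intervals then needs to go through copy detection.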
  • the present disclosure can effectively combine multiple dimensions and multiple modes, and combine IP information, domain name information, video information, and audio information to quickly identify harmful videos.
  • the above embodiment may be implemented on the router side or the network provider side to filter related videos in advance.
  • a system for identifying harmful videos including:
  • a first weighting factor generating module configured to: when it is determined that a page element of a webpage includes a URL path of a video, identify the IP address or IP address segment of the user recorded in the page content of the webpage, query in a first database whether that IP address or an IP address of the same network segment exists, and output a first weighting factor related to the IP according to the query result for the user IP address;
  • a second weighting factor generating module configured to: obtain, from the URL path of the video, the domain name included in the URL or the IP address pointed to by the URL; perform a whois query in a second database based on the domain name, and/or query in the second database whether the IP address pointed to by the URL or an IP address of the same network segment exists; and output a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;
  • a third weighting factor generating module configured to: acquire the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the online play setting of the video, perform content-based video copy detection on that file against a preset harmful video database, and output a third weighting factor according to the detection result;
  • an identifying module configured to combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  • the second database is a third party database.
  • the second weighting factor generating module further includes:
  • a correction unit configured to: further query, in a third-party domain name security list, the security of the domain name to output a security factor, and modify the second weighting factor related to the domain name by using the security factor.
  • Acquiring the video file at the lowest picture quality in the third weighting factor generating module includes acquiring the video content at the end of the video file.
  • The third weighting factor generating module further acquires the video file at the lowest picture quality by using the following units:
  • an audio extraction unit configured to extract the audio in the video;
  • an audio recognition unit configured to identify whether the audio includes harmful content and, if so, to acquire the video content within the start and end time of that audio.
  • the present disclosure in another embodiment, discloses a system for identifying harmful videos, including:
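  • Step a) when it is determined that a page element of a webpage includes a URL path of a video, identifying the IP address or IP address segment of the user recorded in the page content of the webpage, querying in a first database whether that IP address or an IP address of the same network segment exists, and outputting a first weighting factor related to the IP according to the query result for the user IP address;
  • Step b) obtaining, from the URL path of the video, the domain name included in the URL or the IP address pointed to by the URL; performing a whois query in a second database based on the domain name, and/or querying in the second database whether the IP address pointed to by the URL or an IP address of the same network segment exists; and outputting a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;
  • Step c) acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the online play setting of the video, performing content-based video copy detection on that file against a preset harmful video database, and outputting a third weighting factor according to the detection result;
  • Step d) combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.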
  • the present disclosure in another embodiment, also discloses a computer storage medium storing executable instructions for performing the following method of identifying harmful video:
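  • Step a) when it is determined that a page element of a webpage includes a URL path of a video, identifying the IP address or IP address segment of the user recorded in the page content of the webpage, querying in a first database whether that IP address or an IP address of the same network segment exists, and outputting a first weighting factor related to the IP according to the query result for the user IP address;
  • Step b) obtaining, from the URL path of the video, the domain name included in the URL or the IP address pointed to by the URL; performing a whois query in a second database based on the domain name, and/or querying in the second database whether the IP address pointed to by the URL or an IP address of the same network segment exists; and outputting a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;
  • Step c) acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the online play setting of the video, performing content-based video copy detection on that file against a preset harmful video database, and outputting a third weighting factor according to the detection result;
  • Step d) combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.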
  • The above system may comprise: at least one processor (e.g., a CPU), at least one sensor (e.g., an accelerometer, a gyroscope, a GPS module, or another positioning module), at least one memory, and at least one communication bus, wherein the communication bus is used to implement connection and communication between the components.
  • The device may further include at least one receiver and at least one transmitter, wherein the receiver and the transmitter may be wired transmission ports or wireless devices (including, for example, antenna devices) used to transmit signaling or data to and from other node devices.
  • the memory may be a high speed RAM memory or a non-volatile memory such as at least one disk memory.
  • the memory may optionally be at least one storage device located remotely from the aforementioned processor.
  • a set of program code is stored in the memory, and the processor can call the code stored in the memory to perform related functions via the communication bus.
  • Embodiments of the present disclosure also provide a computer storage medium, wherein the computer storage medium can store a program that, when executed, includes some or all of the steps of any of the methods of identifying a harmful video as recited in the above method embodiments.
  • Modules and units in the system of the embodiments of the present disclosure may be combined, divided, and deleted according to actual needs. It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because certain steps may be performed in other orders or concurrently in accordance with the present invention. In addition, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions, modules, and units involved are not necessarily required by the present invention.
  • The disclosed system can be implemented in other manners.
  • The embodiments described above are merely illustrative.
  • The division into units is only a division by logical function.
  • In actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection between the units or components may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or of another form.
  • The units described as separate components may or may not be physically separate; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • The functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist separately, or two or more units may be integrated into one unit.
  • The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium.
  • On this understanding, the technical solution of the present disclosure, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • The software product includes a number of instructions that cause a computer device (which may be a smartphone, a personal digital assistant, a wearable device, a laptop, or a tablet) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The foregoing storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and the like.

Abstract

A method and system for identifying harmful video, the method comprising: when it is determined that the page elements of a webpage include the URL path of a video, identifying a user's IP address or IP address segment recorded in the page content of the webpage; acquiring, according to the URL path of the video, a domain name included in the URL or an IP address to which the URL points; outputting a first weighting factor and a second weighting factor on the basis of queries on the IP address and the domain name; acquiring the video file having the lowest picture quality, performing video copy detection on that video file against a preset harmful video database, and outputting a third weighting factor according to the detection result; and combining the first, second, and third weighting factors to identify whether the video is a harmful video. In conjunction with databases built on big data, the present disclosure can provide, in multiple modes, a harmful video identification solution that uses as few image processing means as possible.

Description

Method and system for identifying harmful video based on user IP and video copy
Technical field
The present disclosure belongs to the field of information security and relates, for example, to a method and a system for identifying harmful video.
Background
In the information society, information flows are everywhere, including but not limited to text, video, audio, and pictures. Video files usually contain both auditory and visual information and are therefore more expressive. However, with the spread of the mobile Internet, the network is flooded with harmful video content, such as videos involving illegal content (drugs, pornography, violence) or videos that induce viewers to join cults, suicide groups, or criminal groups. Because video is visually direct and has a strong impact, such content is even more damaging than harmful text, pictures, or audio. It is therefore necessary to identify these harmful videos so that they can be filtered or deleted and the harm eliminated.
Current techniques for identifying harmful online video fall into two broad categories. The first is the traditional approach, which includes two kinds of methods. (1) Recognition based on single-modal features: these methods extract visual features of the video and build a classifier from them; in violent video recognition, for example, common features are video motion vectors, color, texture, and shape. (2) Recognition based on multi-modal feature fusion: these methods extract features from several modalities of the video and fuse them to build a classifier; in violent video recognition, for example, many methods extract audio features such as short-time energy and sudden sounds in addition to video features, and some also consider the text surrounding the online video and extract further features from it for fusion-based recognition. The second category is deep learning. (1) CNN: a convolutional neural network is trained on the sensitive and harmful images in a database to obtain the internal features of harmful video, and the learned model is used to judge whether the extracted video frames contain harmful information. (2) RNN: video sequences from the database are fed directly into a recurrent neural network, which learns a model of harmful video that is then used to judge whether a new video is harmful. (3) CNN+RNN: a CNN learns the spatial-domain information in the image frames and an RNN captures the temporal-domain information in the video sequence; the two are combined for recognition, and the learned model is used to identify videos.
Existing image processing techniques likewise fall into traditional methods and deep learning methods. The classic traditional method is the bag-of-words model, which consists of four parts: (1) low-level feature extraction, (2) feature encoding, (3) feature pooling, and (4) classification with a suitable classifier. Deep learning models are the other family of image processing models and mainly include autoencoders, restricted Boltzmann machines, deep belief networks, convolutional neural networks, and recurrent neural networks. With the continued progress of computer hardware and the improvement of databases, and although the computation of traditional methods is simpler than that of deep learning, deep learning methods can learn more meaningful representations of the data and keep adjusting parameters according to the task, so deep learning models have stronger feature expression capability for image processing.
Existing identification methods all fall short in recognition efficiency. As big data and artificial intelligence develop, how to identify harmful videos efficiently becomes a problem to be considered.
Summary
The present disclosure provides a method of identifying harmful video, including:
Step a): when it is determined that the page elements of a webpage include the URL path of a video, identifying the user's IP address or IP address segment recorded in the page content of the webpage, querying a first database for that IP address or for IP addresses in the same network segment, and outputting an IP-related first weighting factor according to the result of the user IP address query;
Step b): obtaining, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database for the IP address to which the URL points or for IP addresses in the same network segment; and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
Step c): obtaining, based on the URL path of the video and the lowest picture quality available in the video's online playback settings, the video file at the lowest picture quality; performing video copy detection on that video file against a preset harmful video database using content-based video copy detection technology; and outputting a third weighting factor according to the detection result;
Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
In addition, the present disclosure also discloses a system for identifying harmful video, including:
a first weighting factor generating module, configured to: when it is determined that the page elements of a webpage include the URL path of a video, identify the user's IP address or IP address segment recorded in the page content of the webpage, query a first database for that IP address or for IP addresses in the same network segment, and output an IP-related first weighting factor according to the result of the user IP address query;
a second weighting factor generating module, configured to: obtain, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; perform a whois query in a second database based on the domain name contained in the URL, and/or query the second database for the IP address to which the URL points or for IP addresses in the same network segment; and output, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
a third weighting factor generating module, configured to: obtain, based on the URL path of the video and the lowest picture quality available in the video's online playback settings, the video file at the lowest picture quality; perform video copy detection on that video file against a preset harmful video database using content-based video copy detection technology; and output a third weighting factor according to the detection result;
an identifying module, configured to combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
With this method and system, the present disclosure can draw on databases built from big data and, using as few image processing means as possible, provide a comparatively efficient scheme for identifying harmful video.
Brief description of the drawings
Figure 1 is a schematic diagram of the method according to one embodiment of the present disclosure;
Figure 2 is a schematic diagram of the system according to one embodiment of the present disclosure.
Detailed description
To help those skilled in the art understand the technical solutions disclosed herein, the technical solutions of the embodiments are described below with reference to the embodiments and the related drawings. The described embodiments are some, not all, of the embodiments of the present disclosure. The terms "first", "second", and so on are used to distinguish different objects and not to describe a particular order. In addition, "including" and "having", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units that are not listed, or other steps or units inherent to the process, method, system, product, or device.
A reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will understand that the embodiments described herein can be combined with other embodiments.
Referring to Figure 1, Figure 1 is a schematic flowchart of a method for identifying harmful video provided by one embodiment of the present disclosure. As shown, the method includes:
Step S100: when it is determined that the page elements of a webpage include the URL path of a video, identifying the user's IP address or IP address segment recorded in the page content of the webpage, querying a first database for that IP address or for IP addresses in the same network segment, and outputting an IP-related first weighting factor according to the result of the user IP address query;
It can be understood that the first database maintains a list of known IP addresses or IP address segments of users who have previously posted harmful videos on webpages.
For example, when the user IP address recorded in the page content of the webpage is identified as 192.168.10.3:
If the first database records this IP address, the first weighting factor may, by way of example, be 1.0;
If the database records only the IP address 192.168.10.4, then 192.168.10.3 is moderately suspected of being a backup address or a recently changed address of a user who has posted harmful videos, and the first weighting factor may, by way of example, be 0.6;
If the database records 192.168.10.4 and 192.168.10.5, or even all IP addresses of the 192.168.10.X segment, then 192.168.10.3 is highly suspected of being a backup address or a recently changed address of a user who has posted harmful videos, and the first weighting factor may, by way of example, be 0.9;
If the database records several segments within 192.168.X.X but not the 192.168.10.X segment, then 192.168.10.3 is cautiously suspected of being the address of a user who has posted harmful videos, and the first weighting factor may, by way of example, be 0.4.
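The exemplary cases for step S100 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `first_weight_factor`, the modelling of the first database as a plain collection of IPv4 address strings, the reading of "192.168.10.X" as a /24 prefix and "192.168.X.X" as a /16 prefix, and the fallback value 0.0 are all assumptions. Only the values 1.0, 0.9, 0.6, and 0.4 come from the examples above.

```python
import ipaddress


def first_weight_factor(user_ip, known_ips):
    """Return an exemplary first weighting factor for user_ip.

    known_ips stands in for the first database (hypothetical form):
    IPv4 address strings of users known to have posted harmful videos.
    """
    ip = ipaddress.ip_address(user_ip)
    known = {ipaddress.ip_address(k) for k in known_ips}
    if ip in known:
        return 1.0  # exact match in the database

    # Assumption: "same network segment" means the same /24 prefix.
    net24 = ipaddress.ip_network(f"{user_ip}/24", strict=False)
    same_24 = [k for k in known if k in net24]
    if len(same_24) >= 2:
        return 0.9  # several neighbours recorded: highly suspected
    if len(same_24) == 1:
        return 0.6  # a single neighbour recorded: moderately suspected

    # Assumption: the wider "192.168.X.X" range is the same /16 prefix.
    net16 = ipaddress.ip_network(f"{user_ip}/16", strict=False)
    sibling_24s = {ipaddress.ip_network(f"{k}/24", strict=False)
                   for k in known if k in net16}
    if len(sibling_24s) >= 2:
        return 0.4  # several sibling segments recorded: cautiously suspected

    # Assumption: the text gives no value for an unrelated address.
    return 0.0
```

With a database that contains only 192.168.10.4, the call `first_weight_factor("192.168.10.3", ["192.168.10.4"])` returns 0.6, matching the second case above.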
Step S200: obtaining, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database for the IP address to which the URL points or for IP addresses in the same network segment; and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
It can be understood that the second database maintains a list of domain names known to have published harmful videos, and/or a list of IP addresses and IP address segments of websites known to have published harmful videos.
The whois query is used to examine the association between the domain name registrant and harmful videos. The second database may maintain the following information: domain names; information on the registrants of domains that publish pornographic, violent, reactionary, or cult videos in large quantities on the Internet; and the identifiers of the corresponding harmful videos.
For example, when the domain name is www.a.com:
If the second database records this domain name, the identifiers of the corresponding harmful videos, and its whois information, the second weighting factor may, by way of example, be 1.0;
If the second database records no harmful video identifier for www.a.com, but the registrant of that domain and the domain names of other websites registered by the same registrant can be found, and the second database contains identifiers showing that those other websites publish harmful videos in large quantities on the Internet, then the website at www.a.com is still highly suspected of being a source of harmful videos even though no harmful video identifier is recorded for it, and the second weighting factor may, by way of example, be 0.9;
If the second database records no harmful video identifier for www.a.com, and the registrant of that domain and the domain names of other websites registered by the same registrant can be found, but the second database contains no identifier indicating that those other websites publish harmful videos, the second weighting factor may, by way of example, be 0;
Likewise, if the second database records no harmful video identifier for www.a.com and no other websites registered by the registrant of that domain can be found, the second weighting factor may also, by way of example, be 0.
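The whois-based cases above can be sketched as a small lookup. This is a minimal sketch, not the disclosed implementation: no real whois service is contacted, and the names `whois_weight_factor`, `flagged_domains`, and `registrant_of` are hypothetical stand-ins for the second database and for whois results. The values 1.0, 0.9, and 0 are the exemplary values given in the text.

```python
def whois_weight_factor(domain, flagged_domains, registrant_of):
    """Return an exemplary whois-based factor for `domain`.

    flagged_domains: domains for which the second database holds
        harmful-video identifiers (hypothetical representation).
    registrant_of: mapping domain -> registrant, standing in for
        whois query results (hypothetical representation).
    """
    if domain in flagged_domains:
        return 1.0  # the domain itself is recorded as a harmful source

    registrant = registrant_of.get(domain)
    if registrant is None:
        return 0.0  # no registrant information, so nothing to link

    # Other domains registered by the same registrant.
    siblings = [d for d, r in registrant_of.items()
                if r == registrant and d != domain]
    if any(d in flagged_domains for d in siblings):
        return 0.9  # the registrant also runs a flagged site
    return 0.0  # no flagged sibling sites, or no siblings at all
```

For instance, if www.a.com and www.b.com share a registrant and only www.b.com is flagged, the factor for www.a.com is 0.9.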
By way of example, the IP address to which the URL points may also be obtained from the URL path of the video, and an IP address/IP address segment query may be performed to output the second weighting factor.
For example, when the IP address is 192.168.20.3:
If the second database records this IP address, the second weighting factor may, by way of example, be 1.0;
If the second database records only the IP address 192.168.20.4, then 192.168.20.3 is moderately suspected of being a backup address or a recently changed address of the website hosting the video, and the second weighting factor may, by way of example, be 0.6;
If the second database records 192.168.20.4 and 192.168.20.5, or even all IP addresses of the 192.168.20.X segment, then 192.168.20.3 is highly suspected of being a backup address or a recently changed address of the website hosting the video, and the second weighting factor may, by way of example, be 0.9;
If the database records several segments within 192.168.X.X but not the 192.168.20.X segment, then 192.168.20.3 is cautiously suspected of being the address of a website hosting harmful videos, and the second weighting factor may, by way of example, be 0.4.
In particular, the above step may also consider the IP list and the domain name list together, that is, the second weighting factor may be determined jointly by the IP query on the video URL and the whois query on the domain name.
Let the IP query factor of the video URL be i, the domain whois query factor be j, and the second weighting factor be y, where 0≤i≤1, 0≤j≤1, and 0≤y≤1. The second weighting factor can then be determined by the following formula:
y=m×i+n×j, where m+n=1, and m and n are the weights of the IP query factor and the domain whois query factor, respectively.
For example, m=n=1/2;
As a further example, m and n may be unequal and can be adjusted according to the importance of each query factor and the actual circumstances in which the second weighting factor is determined.
It can be understood that the closer y is to 1, the heavier the second weighting factor and the more likely it is that the video concerned is a harmful video.
The above formula for y is linear; in practice, however, a nonlinear formula may also be used.
Furthermore, whether the formula is linear or nonlinear, the formula and its parameters may be determined by training or fitting.
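The linear combination y=m×i+n×j can be written directly. The sketch below is illustrative only; the function name `second_weight_factor` and the choice to pass only m (deriving n as 1−m so that m+n=1 always holds) are assumptions made for convenience.

```python
def second_weight_factor(i, j, m=0.5):
    """Linear combination y = m*i + n*j with n = 1 - m.

    i: IP query factor, j: domain whois query factor, both in [0, 1].
    m: weight of the IP query factor; n is derived so that m + n = 1.
    """
    if not (0.0 <= i <= 1.0 and 0.0 <= j <= 1.0 and 0.0 <= m <= 1.0):
        raise ValueError("i, j and m must all lie in [0, 1]")
    n = 1.0 - m
    return m * i + n * j
```

With the default m=n=1/2, an IP query factor of 0.6 and a whois query factor of 0.9 give y of about 0.75. Because y is a convex combination of i and j, it stays within [0, 1] without clipping.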
Step S300: obtaining, based on the URL path of the video and the lowest picture quality available in the video's online playback settings, the video file at the lowest picture quality; performing video copy detection on that video file against a preset harmful video database using content-based video copy detection technology; and outputting a third weighting factor according to the detection result;
Step S300 is content-based video copy detection, and the third weighting factor is output according to the detection result. It can be understood that the preset harmful video database contains, for example, pornographic images, violent images, reactionary figures, cult symbols, or other unhealthy content, and that it can be built with big data technology and continuously updated. If the detection identifies the lowest-picture-quality video file as a suspected copy of a video in the preset harmful video database, this is reflected in the third weighting factor. It can be understood that, when the corresponding threshold condition is satisfied, the third weighting factor may be 1.0, or it may be 0.8 or 0.4, depending on the specific threshold condition.
It should also be emphasized that the video file at the lowest picture quality is obtained, based on the URL path of the video and the lowest picture quality in the video's online playback settings, in order to reduce the computing resources and time cost required by this embodiment. The inventors thus make full use of the video content corresponding to the lowest picture quality offered by today's video playback settings to perform video copy detection efficiently. This does not mean, however, that the lowest-quality or low-quality picture must be obtained through the playback settings: video content of low picture quality can also be obtained by various kinds of sampling and then subjected to video copy detection.
It can be understood that step S300 may process the video with traditional methods or with a deep learning model in order to identify harmful videos.
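One way to picture the threshold conditions mentioned for step S300 is a lookup from a copy-detection similarity score to a weighting factor. The sketch below is purely hypothetical: the text does not say how the detection result is scored or which thresholds apply, so the similarity scale and the cut-offs 0.9, 0.7, and 0.5 are placeholders. Only the exemplary factors 1.0, 0.8, and 0.4 come from the text.

```python
def third_weight_factor(similarity,
                        thresholds=((0.9, 1.0), (0.7, 0.8), (0.5, 0.4))):
    """Map a copy-detection similarity in [0, 1] to a weighting factor.

    thresholds: (minimum similarity, factor) pairs in descending order.
    The cut-offs are placeholders and not values from the text.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    for minimum, factor in thresholds:
        if similarity >= minimum:
            return factor
    return 0.0  # assumption: below every threshold, no suspicion is added
```

Keeping the thresholds as a parameter reflects the statement that the factor depends on the specific threshold condition, which would in practice be tuned to the copy-detection technique in use.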
Step S400: combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
By way of example, let the first weighting factor be x, the second weighting factor be y, and the third weighting factor be z, where 0≤x≤1, 0≤y≤1, and 0≤z≤1. The harm coefficient W of the video can then be computed from these weighting factors by the following formula:
W=a×x+b×y+c×z, where a+b+c=1, and a, b, and c are the weights of the respective weighting factors.
For example, a=b=c=1/3;
As a further example, a, b, and c may be unequal and can be adjusted according to the individual weighting factors and the actual circumstances of identifying harmful content.
It can be understood that the closer W is to 1, the more likely it is that the video concerned is a harmful video.
The above formula for W is linear; in practice, however, a nonlinear formula may also be used.
Furthermore, whether the formula is linear or nonlinear, the formula and its parameters may be determined by training or fitting.
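The formula W=a×x+b×y+c×z from step S400 can be sketched as below. The function name `harm_coefficient` and the input checks are assumptions added for illustration. The text does not state a decision threshold on W, so none is included here.

```python
def harm_coefficient(x, y, z, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Compute W = a*x + b*y + c*z, where a + b + c = 1.

    x, y, z: first, second and third weighting factors, each in [0, 1].
    weights: the tuple (a, b, c); defaults to the example a = b = c = 1/3.
    """
    a, b, c = weights
    if abs(a + b + c - 1.0) > 1e-9:
        raise ValueError("a + b + c must equal 1")
    if not all(0.0 <= v <= 1.0 for v in (x, y, z)):
        raise ValueError("x, y and z must all lie in [0, 1]")
    return a * x + b * y + c * z
```

For example, with equal weights, the factors x=1.0, y=0.9, and z=0.8 give W of about 0.9, which by the reasoning above indicates a high likelihood that the video is harmful.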
In summary, in the above embodiment only step S300 performs image processing; the remaining steps take a different route, using queries to obtain the corresponding weighting factors. Step S400 then combines (or fuses) the weighting factors to identify harmful video. Those skilled in the art know that processing and recognizing every frame of a video is very time-consuming, whereas queries cost comparatively little time. The above embodiment therefore provides an efficient method of identifying harmful video. In addition, the above embodiment can further use big data and/or artificial intelligence to build and update the first database, the second database, and other databases.
In another embodiment, the second database is a third-party database.
Examples include the many websites that provide whois queries; third-party databases listing pornographic, violent, reactionary, or cult websites; and databases listing the IP addresses and IP address segments of websites recorded as hosting harmful pictures.
In another embodiment, for a video that has been identified as harmful, the IP address information of the publisher of that video, as recorded at the source address (for example, a forum or webpage), is collected and used to update the first database. This is because harmful videos tend to attract a body of loyal users, some of whom take part in spreading harmful videos and most of whom have relatively fixed IP addresses. If the source address itself records the IP address information of the publisher of the harmful video, the present disclosure collects that information to update the first database.
In another embodiment, step S200 further includes:
querying a third-party domain name security list for the security of the domain name in order to output a safety factor, and correcting the domain-name-related second weighting factor by means of the safety factor.
One example of such a list is virustotal.com, a third-party domain name security screening website. It will be understood that if the third-party information indicates that the domain name harbors a virus or Trojan, the second weighting factor should be increased, because the website is that much less safe.
It will be understood that this embodiment corrects the second weighting factor from a network security perspective, so as to protect users from other losses. Network security bears on users' privacy and property rights: if a website associated with harmful videos also poses network security risks, then in addition to the harm of the videos themselves, users are exposed to privacy leaks or property loss.
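One possible form of this correction is sketched below. The disclosure does not specify how the safety factor is computed or applied; the multiplicative rule, the [1.0, 2.0] range of the factor, the cap at 1.0, and the input counts (how many third-party reports flag the domain) are all assumptions made for illustration.

```python
def safety_factor(malicious_reports, total_reports):
    """Map third-party verdicts to a factor in [1.0, 2.0] (assumed range)."""
    if total_reports <= 0:
        return 1.0  # no information: leave the weighting factor unchanged
    ratio = min(max(malicious_reports / total_reports, 0.0), 1.0)
    return 1.0 + ratio


def correct_second_weight(second_weight, factor):
    """Scale the second weighting factor by the safety factor, capped at 1.0 (assumed)."""
    return min(1.0, second_weight * factor)
```

Under these assumptions, a domain flagged by half of the consulted sources raises a second weighting factor of 0.4 to about 0.6, while a domain with no adverse reports leaves it unchanged.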
In another embodiment, obtaining the video file at the lowest picture quality in step S300 includes obtaining the video content at the end of the video file.
In this embodiment, when video content is obtained, the content at the end of the video file is preferred so as to minimize the amount of video content fetched. The reason is that for harmful videos, whether pornographic, violent, or reactionary, the ending is often the climax of the plot, and those who spread such videos, whether out of personal proclivity or political or cult motives, are generally unlikely to cut the climax at the end. This embodiment therefore greatly reduces the workload of video copy detection. Note that this is a preferred embodiment; it does not mean that content cannot instead be taken from the first third or the middle third of the video's playing time.
Preferably, the ending content is the content within the final third of the video's playing time. More preferably, it is the content within the last few minutes of the video, for example 3, 5, or 10 minutes; whatever the number of minutes, if the final third of the playing time is shorter, the content within the final third is naturally preferred.
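The selection rule in the preceding paragraph (take the last few minutes, unless the final third of the video is shorter) can be written as a small calculation. The function name `tail_segment` and the 5-minute default are illustrative choices, not part of the disclosure.

```python
def tail_segment(duration_s, tail_minutes=5.0):
    """Return (start_s, end_s) of the ending segment to fetch for copy detection."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    last_third = duration_s / 3.0
    # Prefer the shorter of: the last N minutes, or the final third of the video.
    length = min(last_third, tail_minutes * 60.0)
    return (duration_s - length, duration_s)
```

For a one-hour video with the 5-minute setting this yields the last 300 seconds; for a 10-minute video the final third (200 seconds) is shorter, so that is used instead.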
In another embodiment, obtaining the video file at the lowest picture quality in step S300 further includes the following:
Step c1): extracting the audio from the video;
Step c2): identifying whether the audio includes harmful content and, if so, obtaining the video content within the start and end times of that audio.
In this embodiment, if the audio is found to contain pornographic content, violent content, reactionary political speech, inflammatory cult speech, or extremist speech promoting terror or hatred, its time is located and, based on the start and end times of that audio, the video content within the same interval is obtained. This makes it possible to find the relevant harmful frames in a more targeted way.
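Step c2) can be illustrated with the sketch below, which assumes that some audio classifier (not specified in the disclosure and not implemented here) has already labelled audio segments as harmful or not. The function `harmful_video_ranges` and the `gap_s` merging tolerance are hypothetical; the function only turns labelled audio intervals into the video time ranges to fetch.

```python
def harmful_video_ranges(segments, gap_s=1.0):
    """Merge flagged audio intervals into video time ranges.

    segments: iterable of (start_s, end_s, is_harmful) tuples from an
    audio classifier. Intervals closer than gap_s seconds are merged.
    """
    flagged = sorted((s, e) for s, e, harmful in segments if harmful and e > s)
    merged = []
    for start, end in flagged:
        if merged and start - merged[-1][1] <= gap_s:
            # Extend the previous range instead of fetching two near-adjacent clips.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Merging near-adjacent intervals keeps the number of separate video fetches small, in keeping with the stated aim of reducing the amount of content retrieved.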
As noted earlier, when combined with big data techniques, the present disclosure can productively combine multiple dimensions and multiple modalities, namely IP information, domain name information, video information, and audio information, to identify harmful videos quickly.
Furthermore, the above embodiments may be implemented on the router side or on the network provider side to filter the relevant videos in advance.
Corresponding to the method, and referring to FIG. 2, another embodiment of the present disclosure provides a system for identifying harmful videos, including:
a first weighting factor generating module, configured to: when it is determined that the page elements of a web page include the URL path of a video, identify the IP address or IP address range of a user recorded in the page content of the web page, query a first database for that IP address or for an IP address in the same network segment, and output an IP-related first weighting factor according to the result of the user IP address query;
a second weighting factor generating module, configured to: obtain, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; perform a whois query in a second database based on the domain name contained in the URL, and/or query the second database, based on the IP address to which the URL points, for the IP address contained in the URL or for an IP address in the same network segment; and output, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
a third weighting factor generating module, configured to: obtain the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings; perform video copy detection on that lowest-picture-quality video file against a preset harmful video database using content-based video copy detection; and output a third weighting factor according to the detection result;
an identification module, configured to combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
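The identification module is described only as combining the three weighting factors. One plausible reading, offered purely as an assumption, is a weighted sum compared against a threshold. The coefficients, the threshold, and the [0, 1] range of each factor below are illustrative and do not come from the disclosure.

```python
def identify_harmful(w1, w2, w3, coefficients=(0.2, 0.3, 0.5), threshold=0.5):
    """Combine three weighting factors into a score and a harmful/not-harmful verdict.

    Assumes each factor lies in [0, 1]; the coefficients and threshold are
    placeholder values that a real system would have to tune.
    """
    factors = (w1, w2, w3)
    for value in factors:
        if not 0.0 <= value <= 1.0:
            raise ValueError("weighting factors are assumed to lie in [0, 1]")
    score = sum(c * w for c, w in zip(coefficients, factors))
    return score >= threshold, score
```

With these placeholder values, a strong copy-detection match (third factor 0.9) combined with a weak domain signal (second factor 0.2) is enough to flag the video even when the user IP is unknown.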
Similarly to the method embodiments described above:
Preferably, the second database is a third-party database.
More preferably, the second weighting factor generating module further includes:
a correction unit, configured to: query a third-party domain name security list for the security of the domain name in order to output a safety factor, and correct the domain-name-related second weighting factor by means of the safety factor.
More preferably, obtaining the video file at the lowest picture quality in the third weighting factor generating module includes obtaining the video content at the end of the video file.
More preferably, the third weighting factor generating module further obtains the video file at the lowest picture quality by means of the following units:
an audio extraction unit, configured to extract the audio from the video;
an audio recognition unit, configured to identify whether the audio includes harmful content and, if so, to obtain the video content within the start and end times of that audio.
Another embodiment of the present disclosure provides a system for identifying harmful videos, including:
a processor and a memory, the memory storing executable instructions that the processor executes to perform the following operations:
Step a): when it is determined that the page elements of a web page include the URL path of a video, identifying the IP address or IP address range of a user recorded in the page content of the web page, querying a first database for that IP address or for an IP address in the same network segment, and outputting an IP-related first weighting factor according to the result of the user IP address query;
Step b): obtaining, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the IP address contained in the URL or for an IP address in the same network segment; and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
Step c): obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings; performing video copy detection on that lowest-picture-quality video file against a preset harmful video database using content-based video copy detection; and outputting a third weighting factor according to the detection result;
Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
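The lookup in step a) (exact IP address, or an address in the same network segment) might be sketched as follows using Python's standard `ipaddress` module. The /24 prefix used to define "same network segment", the returned values 1.0, 0.5, and 0.0, and the name `first_weight` are assumptions for illustration; the disclosure fixes none of them.

```python
import ipaddress


def first_weight(user_ip, known_ips, prefix_len=24):
    """Return an IP-related first weighting factor for a user address.

    known_ips stands in for the first database. An exact hit scores 1.0,
    a hit within the same /prefix_len network scores 0.5, otherwise 0.0
    (all three values are placeholders).
    """
    addr = ipaddress.ip_address(user_ip)
    known = [ipaddress.ip_address(ip) for ip in known_ips]
    if addr in known:
        return 1.0
    # strict=False lets a host address define its enclosing network.
    network = ipaddress.ip_network(f"{user_ip}/{prefix_len}", strict=False)
    if any(k in network for k in known):
        return 0.5
    return 0.0
```

Giving a same-segment match a lower value than an exact match reflects the weaker evidence it provides; how much lower is a tuning decision.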
Another embodiment of the present disclosure further provides a computer storage medium storing executable instructions for performing the following method for identifying harmful videos:
Step a): when it is determined that the page elements of a web page include the URL path of a video, identifying the IP address or IP address range of a user recorded in the page content of the web page, querying a first database for that IP address or for an IP address in the same network segment, and outputting an IP-related first weighting factor according to the result of the user IP address query;
Step b): obtaining, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the IP address contained in the URL or for an IP address in the same network segment; and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
Step c): obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings; performing video copy detection on that lowest-picture-quality video file against a preset harmful video database using content-based video copy detection; and outputting a third weighting factor according to the detection result;
Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
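The first part of step b), separating the domain name or IP address from the video URL, can be illustrated with the standard library. In the sketch below, plain set lookups stand in for the second database; no whois query is actually performed, and the names `host_from_video_url`, `second_weight`, `listed_domains`, and `listed_ips`, as well as the 1.0/0.0 return values, are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def host_from_video_url(video_url):
    """Return ("ip", address) or ("domain", name) for the host part of the URL."""
    host = urlsplit(video_url).hostname
    if not host:
        raise ValueError("URL has no host component")
    try:
        return ("ip", str(ipaddress.ip_address(host)))
    except ValueError:
        return ("domain", host.lower())


def second_weight(video_url, listed_domains, listed_ips):
    """Return 1.0 if the URL's host appears in the (stand-in) second database, else 0.0."""
    kind, value = host_from_video_url(video_url)
    if kind == "ip":
        return 1.0 if value in set(listed_ips) else 0.0
    # Check the host and its parent domains, e.g. cdn.bad.example -> bad.example.
    labels = value.split(".")
    candidates = {".".join(labels[i:]) for i in range(max(1, len(labels) - 1))}
    return 1.0 if candidates & set(listed_domains) else 0.0
```

Checking parent domains means a listed site is still matched when the video is served from a subdomain such as a CDN host.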
The above system may include: at least one processor (for example, a CPU), at least one sensor (for example, an accelerometer, gyroscope, GPS module, or other positioning module), at least one memory, and at least one communication bus, the communication bus providing connection and communication among the components. The device may further include at least one receiver and at least one transmitter, each of which may be a wired port or a wireless device (for example, one including an antenna arrangement), for exchanging signaling or data with other node devices. The memory may be high-speed RAM or non-volatile memory, such as at least one disk storage device. Optionally, the memory may be at least one storage device located remotely from the processor. The memory stores a set of program code, and the processor can call the code stored in the memory over the communication bus to perform the relevant functions.
Embodiments of the present disclosure further provide a computer storage medium that can store a program which, when executed, performs some or all of the steps of any of the methods for identifying harmful videos described in the method embodiments above.
The steps of the methods in the embodiments of the present disclosure may be reordered, merged, or omitted according to actual needs.
The modules and units of the systems in the embodiments of the present disclosure may be merged, divided, or omitted according to actual needs. Note that, for brevity, each of the foregoing method embodiments is described as a series of actions; those skilled in the art will appreciate, however, that the present invention is not limited by the order of actions described, because according to the present invention certain steps may be performed in a different order or concurrently. Those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions, modules, and units involved are not necessarily required by the present invention.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, refer to the related description of the other embodiments.
In the several embodiments provided by the present disclosure, it should be understood that the disclosed system may be implemented in other ways. The embodiments described above are merely illustrative: the division into units, for example, is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between units or components may be an indirect coupling or communication connection through interfaces, devices, or units, and may be electrical or take other forms.
Units described as separate components may or may not be physically separate; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, may each exist separately, or two or more of them may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. On that understanding, the technical solution of the present disclosure in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a smartphone, personal digital assistant, wearable device, laptop, or tablet) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disc.
The above embodiments serve only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present disclosure.

Claims (12)

  1. A method for identifying harmful videos, comprising:
    Step a): when it is determined that the page elements of a web page include the URL path of a video, identifying the IP address or IP address range of a user recorded in the page content of the web page, querying a first database for that IP address or for an IP address in the same network segment, and outputting an IP-related first weighting factor according to the result of the user IP address query;
    Step b): obtaining, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the IP address contained in the URL or for an IP address in the same network segment; and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
    Step c): obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings; performing video copy detection on that lowest-picture-quality video file against a preset harmful video database using content-based video copy detection; and outputting a third weighting factor according to the detection result;
    Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  2. The method according to claim 1, wherein the second database is a third-party database.
  3. The method according to claim 1, wherein step b) further comprises:
    querying a third-party domain name security list for the security of the domain name in order to output a safety factor, and correcting the second weighting factor by means of the safety factor.
  4. The method according to claim 1, wherein obtaining the video file at the lowest picture quality in step c) comprises obtaining the video content at the end of the video file.
  5. The method according to claim 1, wherein obtaining the video file at the lowest picture quality in step c) further comprises:
    Step c1): extracting the audio from the video;
    Step c2): identifying whether the audio includes harmful content and, if so, obtaining the video content within the start and end times of that audio.
  6. A system for identifying harmful videos, comprising:
    a first weighting factor generating module, configured to: when it is determined that the page elements of a web page include the URL path of a video, identify the IP address or IP address range of a user recorded in the page content of the web page, query a first database for that IP address or for an IP address in the same network segment, and output an IP-related first weighting factor according to the result of the user IP address query;
    a second weighting factor generating module, configured to: obtain, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; perform a whois query in a second database based on the domain name contained in the URL, and/or query the second database, based on the IP address to which the URL points, for the IP address contained in the URL or for an IP address in the same network segment; and output, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
    a third weighting factor generating module, configured to: obtain the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings; perform video copy detection on that lowest-picture-quality video file against a preset harmful video database using content-based video copy detection; and output a third weighting factor according to the detection result;
    an identification module, configured to combine the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  7. The system according to claim 6, wherein, preferably, the second database is a third-party database.
  8. The system according to claim 6, wherein the second weighting factor generating module further comprises:
    a correction unit, configured to: query a third-party domain name security list for the security of the domain name in order to output a safety factor, and correct the second weighting factor by means of the safety factor.
  9. The system according to claim 6, wherein obtaining the video file at the lowest picture quality in the third weighting factor generating module comprises obtaining the video content at the end of the video file.
  10. The system according to claim 6, wherein the third weighting factor generating module further obtains the video file at the lowest picture quality by means of the following units:
    an audio extraction unit, configured to extract the audio from the video;
    an audio recognition unit, configured to identify whether the audio includes harmful content and, if so, to obtain the video content within the start and end times of that audio.
  11. A system for identifying harmful videos, comprising:
    a processor and a memory, the memory storing executable instructions that the processor executes to perform the following operations:
    Step a): when it is determined that the page elements of a web page include the URL path of a video, identifying the IP address or IP address range of a user recorded in the page content of the web page, querying a first database for that IP address or for an IP address in the same network segment, and outputting an IP-related first weighting factor according to the result of the user IP address query;
    Step b): obtaining, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the IP address contained in the URL or for an IP address in the same network segment; and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
    Step c): obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings; performing video copy detection on that lowest-picture-quality video file against a preset harmful video database using content-based video copy detection; and outputting a third weighting factor according to the detection result;
    Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
  12. A computer storage medium storing executable instructions for performing the following method for identifying harmful videos:
    Step a): when it is determined that the page elements of a web page include the URL path of a video, identifying the IP address or IP address range of a user recorded in the page content of the web page, querying a first database for that IP address or for an IP address in the same network segment, and outputting an IP-related first weighting factor according to the result of the user IP address query;
    Step b): obtaining, from the URL path of the video, the domain name contained in the URL or the IP address to which the URL points; performing a whois query in a second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address to which the URL points, for the IP address contained in the URL or for an IP address in the same network segment; and outputting, according to the whois query result and/or the IP address query result, a second weighting factor related to the URL path of the video;
    Step c): obtaining the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings; performing video copy detection on that lowest-picture-quality video file against a preset harmful video database using content-based video copy detection; and outputting a third weighting factor according to the detection result;
    Step d): combining the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
PCT/CN2018/072240 2017-12-30 2018-01-11 User ip and video copy-based harmful video identification method and system WO2019127656A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711499943.3A CN110020254A (en) 2017-12-30 2017-12-30 The method and system of the harmful video of identification based on User IP and video copy
CN201711499943.3 2017-12-30

Publications (1)

Publication Number Publication Date
WO2019127656A1 true WO2019127656A1 (en) 2019-07-04

Family

ID=67064475

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/072240 WO2019127656A1 (en) 2017-12-30 2018-01-11 User ip and video copy-based harmful video identification method and system

Country Status (2)

Country Link
CN (1) CN110020254A (en)
WO (1) WO2019127656A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347244B (en) * 2019-08-08 2023-07-25 四川大学 Yellow-based and gambling-based website detection method based on mixed feature analysis

Citations (8)

Publication number Priority date Publication date Assignee Title
CN101853377A (en) * 2010-05-13 2010-10-06 复旦大学 Method for identifying content of digital video
CN102547794A (en) * 2012-01-12 2012-07-04 郑州金惠计算机系统工程有限公司 Identification and supervision platform for pornographic images and videos and inappropriate contents on wireless application protocol (WAP)-based mobile media
CN103634306A (en) * 2013-11-18 2014-03-12 北京奇虎科技有限公司 Security detection method and security detection server for network data
CN105591997A (en) * 2014-10-20 2016-05-18 杭州迪普科技有限公司 URL (uniform resource locator) classification and filtering method and device
CN105631015A (en) * 2015-12-31 2016-06-01 宁波领视信息科技有限公司 Intelligent multimedia player
CN106354800A (en) * 2016-08-26 2017-01-25 中国互联网络信息中心 Undesirable website detection method based on multi-dimensional feature
CN106973305A (en) * 2017-03-20 2017-07-21 广东小天才科技有限公司 The detection method and device of harmful content in a kind of video
US20170289624A1 (en) * 2016-04-01 2017-10-05 Samsung Electrônica da Amazônia Ltda. Multimodal and real-time method for filtering sensitive media

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US9143840B2 (en) * 2013-05-20 2015-09-22 Veenome, Inc. Systems and methods for evaluating online videos
CN103634317A (en) * 2013-11-28 2014-03-12 北京奇虎科技有限公司 Method and system of performing safety appraisal on malicious web site information on basis of cloud safety
CN104615760B (en) * 2015-02-13 2018-04-13 北京瑞星网安技术股份有限公司 Fishing website recognition methods and system
CN106055574B (en) * 2016-05-19 2019-12-24 微梦创科网络科技(中国)有限公司 Method and device for identifying illegal uniform resource identifier (URL)

Also Published As

Publication number Publication date
CN110020254A (en) 2019-07-16

Similar Documents

Publication Publication Date Title
US11483268B2 (en) Content navigation with automated curation
US10867221B2 (en) Computerized method and system for automated determination of high quality digital content
US10296534B2 (en) Storing and searching fingerprints derived from media content based on a classification of the media content
US11423126B2 (en) Computerized system and method for modifying a media file by automatically applying security features to select portions of media file content
JP2019527444A (en) System and method for identifying matching content
US20200349385A1 (en) Multimedia resource matching method and apparatus, storage medium, and electronic apparatus
KR20160074500A (en) Mobile video search
US20220345435A1 (en) Automated image processing and insight presentation
CN107766399A (en) For the method and system and machine readable media for image is matched with content item
TW201931163A (en) Image search and index building
WO2019127660A1 (en) Method and system for identifying harmful pictures based on user id
WO2019127652A1 (en) Method for identifying harmful video on basis of user id and credits content and system therefor
WO2019127656A1 (en) User ip and video copy-based harmful video identification method and system
WO2019127653A1 (en) Method for identifying harmful video on basis of credits content and system therefor
US8971645B1 (en) Video categorization using heterogeneous signals
WO2019127659A1 (en) Method and system for identifying harmful video based on user id
WO2019127651A1 (en) Method and system thereof for identifying malicious video
WO2019127655A1 (en) Method and system for identifying harmful video on basis of user id and video copy
WO2019127654A1 (en) Method and system for identifying harmful videos on basis of user ip and credits content
WO2019127657A1 (en) Method and system for identifying harmful video through content-based video copy
WO2019127661A1 (en) User ip-based harmful video identification method and system thereof
WO2019127662A1 (en) Method and system for identifying harmful picture on basis of user ip
WO2019127658A1 (en) Method and system for identifying malicious image on the basis of url paths of similar images
WO2019127663A1 (en) Harmful picture identification method and system therefor
CN110069649B (en) Graphic file retrieval method, graphic file retrieval device, graphic file retrieval equipment and computer readable storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase Ref country code: DE
122 Ep: pct application non-entry in european phase Ref document number: 18894479; Country of ref document: EP; Kind code of ref document: A1