WO2019127656A1

WO2019127656A1 - User IP and video copy-based harmful video identification method and system

Info

Publication number: WO2019127656A1
Application number: PCT/CN2018/072240
Authority: WO
Inventors: 蔡昭权; 胡松; 胡辉; 蔡映雪; 陈伽; 黄翰; 梁椅辉; 罗伟; 黄思博
Original assignee: Huizhou University (惠州学院)
Priority date: 2017-12-30
Filing date: 2018-01-11
Publication date: 2019-07-04
Also published as: CN110020254A

Abstract

A harmful video identification method and system. The method comprises: when it is determined that page elements of a webpage include the URL path of a video, identifying a user's IP address or IP address segment contained in the page content of the webpage; acquiring, according to the URL path of the video, the domain name contained in the URL or the IP address indicated by the URL; outputting a first weight factor and a second weight factor on the basis of queries related to the IP addresses and the domain name; acquiring a video file at the lowest picture quality, performing video copy detection on it against a predetermined harmful video database, and outputting a third weight factor according to the detection result; and integrating the first, second and third weight factors to identify whether the video is harmful. In conjunction with databases built on big data, the present disclosure can provide a multi-modal harmful video identification solution that uses as few image processing operations as possible.

Description

Method and system for identifying harmful video based on user IP and video copy

Technical field

The present disclosure pertains to the field of information security and relates, for example, to a method of identifying harmful videos and a system therefor.

Background art

In the information society, information flows everywhere, in forms including but not limited to text, video, audio and pictures. Video files often combine auditory and visual information and are therefore more expressive. However, with the popularity of the mobile Internet, the Internet is flooded with harmful video content, such as videos involving illegal content like drugs, pornography or violence, or videos that induce viewers to join cults, suicide groups or criminal groups. Because of their visual immediacy and impact, such videos are more harmful than harmful text, pictures or audio. It is therefore necessary to identify these harmful videos so that they can be filtered or deleted and their harm eliminated.

Current techniques for identifying harmful video on the network fall into two major categories. The first is traditional methods, which include two types: (1) recognition based on single-modal features, which mainly extracts visual features of the video and builds a classifier on them; in violent video recognition, for example, common features are motion vectors, color, texture and shape; (2) recognition based on multi-modal feature fusion, which extracts features from multiple modalities of the video and fuses them to build a classifier; in violent video recognition, for example, many methods extract audio features such as short-term energy and bursty sounds in addition to video features, and some methods also extract features from the text surrounding a web video for fusion recognition. The second category is deep learning methods: (1) CNN: a convolutional neural network is trained on sensitive and harmful images in a database to learn the internal features of harmful videos, and the learned harmful video frames are used to judge whether frames of a new video contain harmful information; (2) RNN: video sequences from the database are fed directly into a recurrent neural network, which learns the frames of harmful videos and uses them to judge whether a new video is harmful; (3) CNN+RNN: a CNN learns the spatial information within each video frame, an RNN captures the temporal information of the video sequence, and the two are combined so that the learned framework identifies and judges videos.

Existing image processing methods fall mainly into two kinds: traditional methods and deep learning methods. Among traditional methods, the classic bag-of-words model consists of four parts: (1) low-level feature extraction, (2) feature encoding, (3) feature aggregation, and (4) classification with a suitable classifier. Deep learning models are the other kind of image processing model, mainly including autoencoders, restricted Boltzmann machines, deep belief networks, convolutional neural networks and recurrent neural networks. Traditional methods are simpler than deep learning; however, with continual advances in computer hardware and improving databases, deep learning methods can learn more meaningful representations from data and continuously adjust their parameters to the task. For image processing, deep learning models have more powerful feature expression capabilities.

Existing identification methods fall short in recognition efficiency. With the development of big data and artificial intelligence, how to identify harmful videos effectively has become a problem that needs to be addressed.

Summary of the invention

The present disclosure provides a method of identifying harmful videos, including:

Step a): when it is determined that a page element of a webpage includes the URL path of a video, identifying the IP address or IP address segment of the user recorded in the page content of the webpage, querying the first database for that IP address or an IP address in the same network segment, and outputting a first weighting factor related to the IP according to the query result for the user's IP address;

Step b): obtaining, from the URL path of the video, the domain name contained in the URL or the IP address that the URL points to; performing a whois query in the second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address that the URL points to, for that IP address or an IP address in the same network segment; and outputting a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;

Step c): acquiring a video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings, performing video copy detection on that video file against a preset harmful video database using content-based video copy detection technology, and outputting a third weighting factor according to the detection result;

Step d): combining the first, second and third weighting factors to identify whether the video is harmful.

In addition, the present disclosure also discloses a system for identifying harmful videos, including:

a first weighting factor generating module, configured to: when it is determined that a page element of a webpage includes the URL path of a video, identify the IP address or IP address segment of the user recorded in the page content of the webpage, query the first database for that IP address or an IP address in the same network segment, and output a first weighting factor related to the IP according to the query result for the user's IP address;

a second weighting factor generating module, configured to: obtain, from the URL path of the video, the domain name contained in the URL or the IP address that the URL points to; perform a whois query in the second database based on the domain name contained in the URL, and/or query the second database, based on the IP address that the URL points to, for that IP address or an IP address in the same network segment; and output a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;

a third weighting factor generating module, configured to: acquire a video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings, perform video copy detection on that video file against a preset harmful video database using content-based video copy detection technology, and output a third weighting factor according to the detection result;

and an identification module, configured to combine the first, second and third weighting factors to identify whether the video is harmful.

Through the above method and system, the present disclosure combines databases built with big data with as few image processing operations as possible, providing a more efficient scheme for identifying harmful videos.

DRAWINGS

Figure 1 is a schematic illustration of the method of one embodiment of the present disclosure;

Figure 2 is a schematic diagram of a system according to an embodiment of the present disclosure.

Detailed description

To help those skilled in the art understand the technical solutions of the present disclosure, the technical solutions of the various embodiments are described below with reference to the embodiments and the related drawings; the described embodiments are some, not all, of the embodiments of the present disclosure. The terms "first", "second", etc., as used in this disclosure, distinguish different objects and do not describe a particular order. Moreover, "including" and "having" and any variations thereof are intended to be inclusive rather than exclusive. For example, a process, method, system, product or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units not listed, or optionally also includes other steps or units inherent to the process, method, system, product or device.

References to "an embodiment" herein mean that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present disclosure. Occurrences of this phrase in various places in the specification do not necessarily refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art will appreciate that the embodiments described herein can be combined with other embodiments.

Referring to FIG. 1, which is a schematic flowchart of a method for identifying a harmful video according to an embodiment of the present disclosure, the method includes:

Step S100: when it is determined that a page element of the webpage includes the URL path of a video, identifying the IP address or IP address segment of the user recorded in the page content of the webpage, querying the first database for that IP address or an IP address in the same network segment, and outputting a first weighting factor related to the IP according to the query result for the user's IP address;
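Step S100 is triggered only when the page contains a video URL path. As a minimal sketch, assuming the page HTML is available as a string, Python's standard `html.parser` can collect the `src` attributes of `<video>`/`<source>` tags and links ending in common video extensions. The extension list and the names `VideoURLFinder` and `find_video_urls` are hypothetical, not part of the disclosure.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical list of file extensions treated as "a video URL path"
VIDEO_EXTENSIONS = (".mp4", ".webm", ".m3u8", ".flv", ".mov")


class VideoURLFinder(HTMLParser):
    """Collects candidate video URLs from <video>, <source> and <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        values = dict(attrs)
        src = values.get("src")
        href = values.get("href") or ""  # bare attributes arrive as None
        if tag in ("video", "source") and src:
            self._add(src)
        elif tag == "a" and href.lower().split("?")[0].endswith(VIDEO_EXTENSIONS):
            self._add(href)

    def _add(self, url: str) -> None:
        absolute = urljoin(self.base_url, url)  # resolve relative paths against the page URL
        if absolute not in self.urls:
            self.urls.append(absolute)


def find_video_urls(html: str, base_url: str) -> List[str]:
    finder = VideoURLFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.urls
```

An empty result would mean the page carries no detectable video path, so the remaining steps need not run for it.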

It can be understood that the first database maintains a list of known IP addresses or IP address segments of users who have posted harmful videos on web pages.
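Step S100 also needs the user's IP address or IP address segment as it appears in the page content. Pages that display a poster's address sometimes mask the last octet (for example 192.168.10.*). A minimal sketch, assuming IPv4 and masking with `*`, extracts both forms with a regular expression and validates them with Python's `ipaddress` module; a masked address is treated as its /24 segment. The function name and the masking convention are assumptions.

```python
import ipaddress
import re
from typing import List

# Dotted IPv4 address; the last octet may be masked with '*' (assumed convention)
_IP_PATTERN = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}(?!\d)|\*)")


def extract_user_ips(page_text: str) -> List[ipaddress.IPv4Network]:
    """Return recorded user addresses as /32 networks, masked ones as /24 segments."""
    found: List[ipaddress.IPv4Network] = []
    for a, b, c, d in _IP_PATTERN.findall(page_text):
        try:
            if d == "*":
                net = ipaddress.ip_network(f"{a}.{b}.{c}.0/24")
            else:
                net = ipaddress.ip_network(f"{a}.{b}.{c}.{d}/32")
        except ValueError:
            continue  # not a valid address, e.g. an octet above 255
        if net not in found:
            found.append(net)
    return found
```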

For example, when the IP address of the user recorded in the content of the web page is recognized as 192.168.10.3:

If this IP address is recorded in the first database, the first weighting factor may be, for example, 1.0;

If the only IP address recorded in the first database is 192.168.10.4, then 192.168.10.3 is moderately suspected of being an alternate or newly changed address of a user who has posted harmful videos, and the first weighting factor may be, for example, 0.6;

If the first database records 192.168.10.4 and 192.168.10.5, or even every address in the 192.168.10.X segment, then 192.168.10.3 is highly suspected of being an alternate or newly changed address of a user who has posted harmful videos, and the first weighting factor may be, for example, 0.9;

If the first database records addresses in multiple 192.168.X.X segments but none in the 192.168.10.X segment, then 192.168.10.3 is cautiously suspected of being the address of a user who has posted harmful videos, and the first weighting factor may be, for example, 0.4.
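The graded examples above can be expressed as a small scoring function. This is a minimal sketch under several assumptions: IPv4 addresses; "network segment" means a /24 block (as in 192.168.10.X); "multiple 192.168.X.X segments" means at least two other /24 blocks under the same /16; and an address with no nearby record scores 0.0. The function name and these cut-offs are illustrative, not prescribed by the disclosure.

```python
import ipaddress
from typing import Iterable


def first_weighting_factor(user_ip: str, recorded: Iterable[str]) -> float:
    """Score a user's IPv4 address against the first database.

    `recorded` holds single addresses ("192.168.10.4") or segments
    ("192.168.10.0/24"). Return values mirror the examples in the text.
    """
    ip = ipaddress.ip_address(user_ip)
    nets = [ipaddress.ip_network(r, strict=False) for r in recorded]

    # Exact hit: the address itself is recorded.
    if any(n.num_addresses == 1 and ip in n for n in nets):
        return 1.0

    # Entries inside (or covering) the user's own /24 segment.
    own_seg = ipaddress.ip_network(f"{user_ip}/24", strict=False)
    same_seg = [n for n in nets if n.overlaps(own_seg)]
    if len(same_seg) >= 2 or any(n.prefixlen <= 24 and ip in n for n in same_seg):
        return 0.9  # several neighbours, or the whole segment, are recorded
    if same_seg:
        return 0.6  # a single neighbour in the same segment is recorded

    # No entry in the user's /24, but several other /24 segments under the same /16.
    own_16 = ipaddress.ip_network(f"{user_ip}/16", strict=False)
    other_segs = {
        ipaddress.ip_network(f"{n.network_address}/24", strict=False)
        for n in nets
        if n.overlaps(own_16)
    }
    if len(other_segs) >= 2:
        return 0.4
    return 0.0  # assumed fallback: nothing nearby is recorded
```

The same function could score the IP address that a video URL points to against the second database, since the examples given for that query follow the same pattern.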

Step S200: obtaining, from the URL path of the video, the domain name contained in the URL or the IP address that the URL points to; performing a whois query in the second database based on the domain name contained in the URL, and/or querying the second database, based on the IP address that the URL points to, for that IP address or an IP address in the same network segment; and outputting a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;

It can be understood that the second database maintains a list of known domain names that have posted harmful videos, and/or a list of known IP addresses and IP address segments of websites that have posted harmful videos.

The whois query investigates whether the registrant of a domain name is associated with harmful videos. The second database may maintain information such as domain names that have published a large number of pornographic, violent, reactionary or cult videos on the Internet, together with the identifiers of the corresponding harmful videos.

For example, if the domain name is www.a.com:

If the second database records this domain name together with the identifiers of its corresponding harmful videos and its whois information, the second weighting factor may be, for example, 1.0;

If the second database records no identifier of any harmful video for www.a.com, but a whois query reveals the registrant of the domain name and the domain names of other websites registered by that registrant, and the second database includes identifiers showing that those other websites have published a large number of harmful videos on the Internet, then the website corresponding to www.a.com is still highly suspected of being a source of harmful videos, and the second weighting factor may be, for example, 0.9;

If the second database records no identifier of any harmful video for www.a.com, and a whois query reveals the registrant and the domain names of other websites registered by that registrant, but the second database contains no identifier of harmful videos published by those other websites, the second weighting factor may be, for example, 0;

Likewise, if the second database records no identifier of any harmful video for www.a.com and no other websites registered by the same registrant can be found, the second weighting factor may also be, for example, 0.
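The domain-based rules above reduce to lookups over three mappings. In this sketch, plain dictionaries stand in for the second database and for whois results; `harmful_videos`, `registrant_of` and `domains_of` are hypothetical names, and a real deployment would populate them from an actual whois service and the maintained harmful-video records.

```python
from typing import Dict, List, Set


def domain_whois_factor(domain: str,
                        harmful_videos: Dict[str, List[str]],
                        registrant_of: Dict[str, str],
                        domains_of: Dict[str, Set[str]]) -> float:
    """Second weighting factor from the domain name alone (example values 1.0 / 0.9 / 0)."""
    # Harmful-video identifiers are recorded for this very domain.
    if harmful_videos.get(domain):
        return 1.0
    # Otherwise look up the registrant and the other domains it registered.
    registrant = registrant_of.get(domain)
    if registrant is None:
        return 0.0  # no other websites of the same registrant can be found
    siblings = domains_of.get(registrant, set()) - {domain}
    if any(harmful_videos.get(d) for d in siblings):
        return 0.9  # a sibling site has published harmful videos
    return 0.0
```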

Alternatively, the IP address that the URL points to may be obtained from the URL path of the video, and an IP address/IP address segment query may be performed to output the second weighting factor.
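Obtaining the domain name or IP address from the video URL is a parsing step. A minimal sketch using the standard library: `urllib.parse.urlsplit` extracts the host, `ipaddress.ip_address` distinguishes a literal IP from a domain name, and a domain is resolved with `socket.gethostbyname` by default (IPv4 only). The `resolver` parameter is an assumption added so the lookup can be swapped out, for example in tests or for a different DNS client.

```python
import ipaddress
import socket
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit


def host_of_video_url(url: str,
                      resolver: Callable[[str], str] = socket.gethostbyname
                      ) -> Tuple[Optional[str], str]:
    """Return (domain name or None, IP address) for the host of a video URL."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # A domain name: keep it for the whois query and resolve it for the IP query.
        return host, resolver(host)
    return None, host  # the URL already contains a literal IP address
```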

For example, if the IP address is 192.168.20.3:

If this IP address is recorded in the second database, the second weighting factor may be, for example, 1.0;

If the only IP address recorded in the second database is 192.168.20.4, then 192.168.20.3 is moderately suspected of being an alternate or newly changed address of the website hosting the video, and the second weighting factor may be, for example, 0.6;

If the second database records 192.168.20.4 and 192.168.20.5, or even every address in the 192.168.20.X segment, then 192.168.20.3 is highly suspected of being an alternate or newly changed address of the website hosting the video, and the second weighting factor may be, for example, 0.9;

If the second database records addresses in multiple 192.168.X.X segments but none in the 192.168.20.X segment, then 192.168.20.3 is cautiously suspected of being the address of a harmful-video website, and the second weighting factor may be, for example, 0.4.

In particular, the IP list and the domain name list may also be considered together, that is, the second weighting factor may be jointly determined by the IP query on the video URL and the whois query on the domain name.

Let the IP query factor of the video URL be i, the domain name whois query factor be j, and the second weighting factor be y, where 0≤i≤1, 0≤j≤1, 0≤y≤1. The second weighting factor can then be determined by the following formula:

y=m×i+n×j, where m+n=1, m and n represent the weights of the IP query factor and the domain name whois query factor, respectively.

For example, m=n=1/2;

Alternatively, m and n may differ, and may be adjusted according to the importance of each query factor and the practical requirements of determining the second weighting factor.

It can be understood that the closer y is to 1, the greater the second weighting factor and the greater the probability that the related video is harmful.

The above formula for y is linear, but in practical applications a nonlinear formula may also be used.

Further, whether the formula is linear or nonlinear, the formula and its parameters may be determined by training or fitting.
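The combination y = m×i + n×j, together with the suggestion to fit its parameters, can be sketched as follows. Substituting n = 1 - m gives y - j = m(i - j), so m has a closed-form least-squares estimate from labelled examples, clipped to [0, 1]. The labelled triples (i, j, y) are assumed to come from manually reviewed cases; the disclosure does not specify a training procedure, and this is only one possible choice.

```python
from typing import Sequence, Tuple


def combine_ip_and_whois(i: float, j: float, m: float = 0.5) -> float:
    """y = m*i + n*j with n = 1 - m, so y stays in [0, 1] when i, j and m do."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    return m * i + (1.0 - m) * j


def fit_m(samples: Sequence[Tuple[float, float, float]]) -> float:
    """Least-squares estimate of m from (i, j, y) triples, using y - j = m * (i - j)."""
    num = sum((i - j) * (y - j) for i, j, y in samples)
    den = sum((i - j) ** 2 for i, j, _ in samples)
    if den == 0:
        return 0.5  # i == j everywhere: m is not identifiable, fall back to equal weights
    return min(1.0, max(0.0, num / den))
```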

Step S300: acquiring a video file at the lowest picture quality based on the URL path of the video and the lowest picture quality in the video's online playback settings, performing video copy detection on that video file against a preset harmful video database using content-based video copy detection technology, and outputting a third weighting factor according to the detection result;

Step S300 performs content-based video copy detection and outputs the third weighting factor according to the detection result. It can be understood that the preset harmful video database contains content such as pornographic imagery, violent scenes, reactionary text, cult insignia or other unhealthy content; it can be built with big data technology and continuously updated. If the detection result determines that the video file at the lowest picture quality is a suspected copy of a video in the preset harmful video database, this is reflected in the third weighting factor. It can be understood that when the corresponding threshold condition is met, the third weighting factor may be, for example, 1.0, 0.8 or 0.4, depending on the specific threshold condition.
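Content-based video copy detection can be implemented in many ways. As a deliberately simple stand-in (not the disclosure's specific technique), the sketch below fingerprints each sampled frame with an average hash and measures what fraction of the query frames have a near-duplicate in a reference video from the harmful video database. Frames are assumed to be already decoded and downscaled to small grayscale grids (e.g. 8×8); the 5-bit Hamming tolerance and the ratio thresholds that map onto the example values 1.0, 0.8 and 0.4 are hypothetical.

```python
from typing import Dict, List, Sequence

Frame = Sequence[Sequence[int]]  # grayscale frame, already downscaled (e.g. 8x8)


def average_hash(frame: Frame) -> int:
    """One bit per pixel, set when the pixel is brighter than the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def match_ratio(query: List[int], reference: List[int], max_bits: int = 5) -> float:
    """Fraction of query frame hashes with a near-duplicate in the reference video."""
    if not query or not reference:
        return 0.0
    hits = sum(1 for q in query if min(hamming(q, r) for r in reference) <= max_bits)
    return hits / len(query)


def third_weighting_factor(query_frames: List[Frame], harmful_db: Dict[str, List[int]]) -> float:
    """harmful_db maps a harmful video identifier to its precomputed frame hashes."""
    query = [average_hash(f) for f in query_frames]
    best = max((match_ratio(query, ref) for ref in harmful_db.values()), default=0.0)
    # Hypothetical thresholds mapped onto the example values in the text
    if best >= 0.8:
        return 1.0
    if best >= 0.5:
        return 0.8
    if best >= 0.2:
        return 0.4
    return 0.0
```

Average hashing is cheap but sensitive to cropping and overlays; a production system would more likely use more robust fingerprints or a learned model, as step S300 allows.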

It should also be emphasized that, to reduce the computational resources and time required by this embodiment, the video file is obtained at the lowest picture quality, based on the URL path of the video and the lowest picture quality in the video's online playback settings. The inventors thus make full use of the lowest-quality option offered by today's video players to perform video copy detection efficiently. However, this does not mean that the lowest-quality or low-quality video must be obtained through the playback settings, because low-quality video content can also be obtained by various sampling methods before video copy detection is performed.

It can be understood that step S300 may process the video using traditional methods or a deep learning model to identify harmful videos.

Step S400: combining the first, second and third weighting factors to identify whether the video is harmful.
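One way to honour "the lowest picture quality in the online play setting" is to read the player's stream manifest. Assuming the video is delivered over HTTP Live Streaming (HLS), whose master playlist lists each rendition in an `#EXT-X-STREAM-INF` tag with a `BANDWIDTH` and an optional `RESOLUTION` attribute, the sketch below picks the variant with the smallest resolution, breaking ties by bandwidth and placing variants without a resolution (e.g. audio-only) last. Other delivery formats would need their own parsing, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# ATTRIBUTE=value pairs; quoted values may contain commas (e.g. CODECS)
_ATTR = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')
_TAG = "#EXT-X-STREAM-INF:"


def parse_variants(master: str) -> List[Tuple[int, int, str]]:
    """Return (height, bandwidth, uri) for every variant stream in an HLS master playlist."""
    variants: List[Tuple[int, int, str]] = []
    pending = None
    for line in master.splitlines():
        line = line.strip()
        if line.startswith(_TAG):
            attrs = dict(_ATTR.findall(line[len(_TAG):]))
            height = int(attrs["RESOLUTION"].split("x")[1]) if "RESOLUTION" in attrs else 0
            pending = (height, int(attrs.get("BANDWIDTH", "0")))
        elif line and not line.startswith("#") and pending is not None:
            variants.append((pending[0], pending[1], line))  # the URI follows its tag
            pending = None
    return variants


def lowest_quality_uri(master: str) -> str:
    variants = parse_variants(master)
    if not variants:
        raise ValueError("no variant streams found")
    # Variants without RESOLUTION (height 0) sort after all video variants.
    return min(variants, key=lambda v: (v[0] == 0, v[0], v[1]))[2]
```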

For example, let the first weighting factor be x, the second weighting factor be y, and the third weighting factor be z, where 0≤x≤1, 0≤y≤1, 0≤z≤1. The harmfulness coefficient W of the video can be calculated by combining these weighting factors according to the following formula:

W = a × x + b × y + c × z, where a + b + c = 1, a, b, c respectively represent the weight of each weighting factor.

For example, a=b=c=1/3;

Alternatively, a, b and c may differ, and may be adjusted according to each weighting factor and the practical requirements of identifying harmful content.

It can be understood that the closer W is to 1, the greater the probability that the related video belongs to harmful video.

The above formula for W is linear, but in practice a nonlinear formula may also be used.

In summary, in the above embodiment only step S300 performs image processing; the remaining steps obtain their weighting factors through database queries of various kinds. Step S400 combines (also referred to as fuses) the multiple weighting factors to identify harmful videos. Those skilled in the art know that processing and recognizing every frame of a video is very time-consuming, whereas queries are comparatively fast. The above embodiment therefore provides an efficient method of identifying harmful videos. In addition, the above embodiments can further integrate and update the first database, the second database and other databases in combination with big data and/or artificial intelligence.
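The combination W = a×x + b×y + c×z can be written directly. In this sketch the weights default to a = b = c = 1/3 as in the example above, and the decision threshold of 0.5 is a hypothetical value: the disclosure only states that W closer to 1 means a higher probability of harm, so any cut-off would have to be tuned on real data.

```python
import math
from typing import Tuple


def harmfulness(x: float, y: float, z: float,
                weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """W = a*x + b*y + c*z with a + b + c = 1 and every factor in [0, 1]."""
    a, b, c = weights
    if not math.isclose(a + b + c, 1.0):
        raise ValueError("weights must sum to 1")
    for f in (x, y, z, a, b, c):
        if not 0.0 <= f <= 1.0:
            raise ValueError("factors and weights must lie in [0, 1]")
    return a * x + b * y + c * z


def is_harmful(w: float, threshold: float = 0.5) -> bool:
    """Hypothetical decision rule: flag the video when W reaches the threshold."""
    return w >= threshold
```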

In another embodiment, the second database is a third party database.

For example, websites that provide whois queries, or third-party databases such as lists of pornographic, violent, reactionary or cult websites, or databases of the IP addresses and IP address segments of websites known to publish harmful content.

In another embodiment, for a web address (such as a forum or web page) identified as a source of harmful videos, the IP address information of harmful-video publishers recorded at that address is collected and used to update the first database. This is because harmful videos generally attract sticky users, some of whom take part in spreading harmful videos, and most of their IP addresses remain relatively fixed. If the web address itself records the IP addresses of harmful-video publishers, the present disclosure collects this information to update the first database.

In another embodiment, step S200 further includes:

the security of the domain name is further queried in a third-party domain name security list to output a security factor, and the domain-related second weighting factor is corrected by the security factor.

For example, virustotal.com is a third-party website that screens domain names for security threats. It can be understood that if the third-party information indicates that the domain hosts viruses or Trojans, the second weighting factor should be raised, because the related website is even less secure.

It can be appreciated that this embodiment corrects the second weighting factor from a network security perspective, so as to protect users from further losses. Network security bears on users' privacy and property: if websites related to harmful videos carry network security risks, then beyond the harm of the videos themselves they may also cause privacy leaks or property damage to users.

In another embodiment, acquiring the video file at the lowest picture quality in step S300 includes acquiring the video content at the end of the video file.

In this embodiment, to minimize the amount of video content acquired, the content at the end of the video file is preferentially selected. This is because, for harmful videos, whether pornographic, violent or reactionary, the ending is often the climax, and those who spread such videos, whether motivated by profit, politics or cult activity, generally do not cut the climax at the end. This embodiment therefore greatly reduces the workload of video copy detection. It should be added that this is a preferred embodiment and does not exclude selecting content from the first third or the middle third of the video's playing time.
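The disclosure says only that a bad security verdict should raise the second weighting factor, not how. As one hypothetical choice, the sketch below turns a third-party verdict into a security factor s in [0, 1] (here, the fraction of scanning engines that flag the domain, a count that services such as VirusTotal report) and moves y toward 1 by that fraction, y' = y + (1 - y)·s, which leaves y unchanged when s = 0 and never exceeds 1. Fetching the counts from any particular service is left out and would need that service's own API.

```python
def security_factor(flagged_engines: int, total_engines: int) -> float:
    """Fraction of scanning engines that flag the domain; 0.0 if nothing was scanned."""
    if total_engines <= 0:
        return 0.0
    return min(1.0, max(0.0, flagged_engines / total_engines))


def corrected_second_factor(y: float, s: float) -> float:
    """Raise y toward 1 in proportion to the security factor s (hypothetical rule)."""
    if not (0.0 <= y <= 1.0 and 0.0 <= s <= 1.0):
        raise ValueError("y and s must lie in [0, 1]")
    return y + (1.0 - y) * s
```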

Preferably, the video content at the end of the video may be the content within the last third of the video's playing time. More preferably, it may be the content within the last few minutes of the video, for example 3, 5 or 10 minutes; whatever the number of minutes, if the last third of the playing time is shorter, the content within that last third is naturally preferred.
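The tail-selection rule above amounts to taking the shorter of the last third of the playing time and a fixed number of final minutes. A minimal sketch, with durations in seconds and a 5-minute default picked from the examples in the text (3 or 10 minutes would fit the description equally well):

```python
from typing import Tuple


def tail_window(duration_s: float, tail_minutes: float = 5.0) -> Tuple[float, float]:
    """Return (start, end) in seconds of the end-of-video segment to fetch.

    The window is the shorter of the last third of the video and the last
    `tail_minutes` minutes.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    length = min(duration_s / 3.0, tail_minutes * 60.0)
    return duration_s - length, duration_s
```

For a 30-minute video this yields the last 5 minutes; for a 9-minute video, the last 3 minutes (one third). The start time could then be passed to whatever downloader or decoder fetches the segment, for example as a seek offset.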

In another embodiment, acquiring the video file at the lowest picture quality in step S300 further includes the following:

Step c1): extracting the audio from the video;

Step c2): identifying whether the audio contains harmful content, and if so, acquiring the video content within the start and end times of that audio.

In this embodiment, if the audio is recognized as containing extreme content such as pornographic content, violent content, reactionary political speech, inflammatory cult speech, or terror and hatred, the corresponding time span is located from the start and end times of that audio, and the video content within that span is acquired. This allows the relevant harmful images to be found in a more targeted way.

As described above, when combined with big data technology, the present disclosure can effectively combine multiple dimensions and modalities, namely IP information, domain name information, video information and audio information, to identify harmful videos quickly.
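Once an audio classifier (not specified in the disclosure and assumed here) has returned the start and end times of harmful audio, locating the matching video content is an interval problem. The sketch below pads each detected span by a hypothetical margin, clamps it to the video length, and merges overlapping spans so that each stretch of video is fetched only once.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def video_spans_from_audio(audio_hits: List[Span], duration_s: float,
                           pad_s: float = 0.0) -> List[Span]:
    """Turn harmful-audio (start, end) times into merged video time windows."""
    padded = sorted(
        (max(0.0, s - pad_s), min(duration_s, e + pad_s))
        for s, e in audio_hits
        if e > s  # ignore empty or inverted spans
    )
    merged: List[Span] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```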

Further, the above embodiment may be implemented on the router side or the network provider side to filter related videos in advance.

Corresponding to the method, referring to FIG. 2, the present disclosure discloses, in another embodiment, a system for identifying harmful videos, including:

a second weighting factor generating module, configured to: obtain, from the URL path of the video, the domain name contained in the URL or the IP address that the URL points to; perform a whois query in the second database based on the domain name contained in the URL, and/or query the second database, based on the IP address that the URL points to, for that IP address or an IP address in the same network segment; and output a second weighting factor related to the URL path of the video according to the whois query result and/or the IP address query result;

As with the method embodiments described above:

Preferably, the second database is a third party database.

More preferably, the second weighting factor generating module further includes:

a correction unit, configured to further query the security of the domain name in a third-party domain name security list to output a security factor, and to correct the domain-related second weighting factor by the security factor.

More preferably, acquiring the video file at the lowest picture quality in the third weighting factor generating module includes acquiring video content at the end of the video file.

More preferably, the third weighting factor generating module further acquires the video file at the lowest picture quality by means of the following units:

an audio extraction unit, configured to extract the audio from the video;

an audio recognition unit, configured to identify whether the audio contains harmful content and, if so, acquire the video content within the start and end times of that audio.

The present disclosure, in another embodiment, discloses a system for identifying harmful videos, including:

A processor and a memory having executable instructions stored therein, the processor executing the instructions to perform the following operations:

The present disclosure, in another embodiment, also discloses a computer storage medium storing executable instructions for performing the following method of identifying harmful video:

The above system may include at least one processor (e.g., a CPU), at least one sensor (e.g., an accelerometer, gyroscope, GPS module or other positioning module), at least one memory, and at least one communication bus, where the communication bus provides connection and communication among the components. The device may further include at least one receiver and at least one transmitter, which may be wired transmission ports or wireless devices (for example, including antenna devices) for transmitting signaling or data to and from other node devices. The memory may be high-speed RAM or non-volatile memory, such as at least one disk memory, and may optionally be at least one storage device located remotely from the processor. A set of program code is stored in the memory, and the processor can call the code stored in the memory via the communication bus to perform the related functions.

Embodiments of the present disclosure also provide a computer storage medium, wherein the computer storage medium can store a program that, when executed, includes some or all of the steps of any of the methods of identifying a harmful video as recited in the above method embodiments.

The steps in the methods of the embodiments of the present disclosure may be reordered, merged or deleted according to actual needs.

Modules and units in the system of the embodiments of the present disclosure may be combined, divided or deleted according to actual needs. It should be noted that, for simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions, but those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention certain steps may be performed in another order or concurrently. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions, modules and units involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the related descriptions of other embodiments.

In the several embodiments provided by the present disclosure, it should be understood that the disclosed system may be implemented in other ways. For example, the embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection of the units or components may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, may be located in one place, or may be distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present disclosure may be embodied in the form of a software product stored in a storage medium. This applies to the solution in essence, to the part of it that contributes to the prior art, or to all or part of the solution. The software product includes a number of instructions that cause a computer device (which may be a smart phone, a personal digital assistant, a wearable device, a laptop, or a tablet) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure. The foregoing storage medium includes a USB flash drive (U disk), a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or the like.

The above embodiments are only used to illustrate the technical solutions of the present disclosure and are not intended to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that they may still modify the technical solutions described in those embodiments or replace some of the technical features with equivalents. Such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present disclosure.

Claims

A method of identifying harmful videos, comprising:

Step a): when it is determined that page elements of a webpage include a URL path of a video, identifying an IP address or IP address segment of a user recorded in the page content of the webpage, querying a first database for whether that IP address or an IP address in the same network segment exists, and outputting a first weighting factor related to the IP according to the result of the query on the user IP address;

Step b): obtaining, according to the URL path of the video, a domain name included in the URL or an IP address pointed to by the URL; performing a whois query in a second database based on the domain name included in the URL, and/or querying the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and outputting a second weighting factor related to the URL path of the video according to the result of the whois query and/or the IP address query;

Step c): acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality available in the online playback settings of the video; performing, by content-based video copy detection technology, video copy detection of the lowest-picture-quality video file against a preset harmful video database; and outputting a third weighting factor according to the detection result;

Step d): integrating the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
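Steps a) and b) both reduce to looking up an address (or a domain) in a database and checking for an exact or same-network-segment match. The following is a minimal sketch, not the disclosed implementation. It assumes that "same network segment" means the same IPv4 /24 network and that the factor values are 1.0, 0.5 and 0.0. It also assumes the second database's whois records can be represented as a hypothetical mapping from domain name to a risk score in [0, 1]. None of these values or names comes from the disclosure. Resolving a domain to the IP address it points to is omitted to keep the sketch offline.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical factor values; the disclosure does not fix concrete numbers.
EXACT_MATCH = 1.0
SAME_SEGMENT = 0.5
NO_MATCH = 0.0


def segment_match_weight(ip_text, known_ips, prefix=24):
    """Score an IPv4 address against a collection of recorded addresses.

    Assumption: "same network segment" means the same /prefix network.
    """
    ip = ipaddress.ip_address(ip_text)
    known = [ipaddress.ip_address(k) for k in known_ips]
    if ip in known:
        return EXACT_MATCH
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    if any(k in net for k in known):
        return SAME_SEGMENT
    return NO_MATCH


def first_weighting_factor(user_ip, first_db_ips):
    """Step a): weight derived from the user IP recorded on the page."""
    return segment_match_weight(user_ip, first_db_ips)


def second_weighting_factor(video_url, whois_risk, second_db_ips):
    """Step b): weight derived from the host part of the video URL.

    whois_risk is a hypothetical stand-in for the second database's whois
    records (domain name -> risk score in [0, 1]). Only IP-literal hosts
    are checked against second_db_ips; DNS resolution is not performed.
    """
    host = urlsplit(video_url).hostname or ""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # The host is a domain name: use the whois-derived score.
        return whois_risk.get(host.lower(), NO_MATCH)
    return segment_match_weight(host, second_db_ips)
```

For example, `first_weighting_factor("198.51.100.23", {"198.51.100.7"})` yields 0.5 because both addresses fall in 198.51.100.0/24. A real deployment would likely also accept an IP address segment recorded on the page (for example a CIDR string) and treat IPv6 prefixes separately.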
The method of claim 1, wherein the second database is a third-party database.
The method of claim 1, wherein step b) further comprises:

further querying the security of the domain name in a third-party domain name security list to output a security factor, and correcting the second weighting factor with the security factor.
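The claim leaves open how the security factor is derived and how it "corrects" the second weighting factor. Below is one possible reading, sketched under stated assumptions. The third-party list is a hypothetical mapping from domain to a rating ("safe", "unknown", "malicious"), each rating maps to a multiplier, and the corrected factor is clamped back into [0, 1]. The ratings and multipliers are illustrative only.

```python
# Hypothetical multipliers; the disclosure does not specify how the
# security factor is derived or applied to the second weighting factor.
SECURITY_FACTORS = {"safe": 0.5, "unknown": 1.0, "malicious": 1.5}


def security_factor(domain, security_list):
    """Look up a domain in a (hypothetical) third-party security list."""
    rating = security_list.get(domain.lower(), "unknown")
    return SECURITY_FACTORS.get(rating, 1.0)


def correct_second_factor(w2, domain, security_list):
    """Scale the second weighting factor and clamp the result into [0, 1]."""
    corrected = w2 * security_factor(domain, security_list)
    return max(0.0, min(1.0, corrected))
```

A multiplicative correction keeps an unlisted domain's factor unchanged. An additive offset would be an equally plausible reading of the claim.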
The method according to claim 1, wherein acquiring the video file at the lowest picture quality in step c) comprises acquiring the video content at the end of the video file.
The method according to claim 1, wherein acquiring the video file at the lowest picture quality in step c) further comprises:

Step c1): extracting the audio of the video;

Step c2): identifying whether the audio contains harmful content and, if so, acquiring the video content between the start and end times of that audio.
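Claims 4 and 5 narrow which part of the low-quality video is fetched for copy detection. The sketch below combines them under an assumption of its own. If a separate (hypothetical) audio recognizer has flagged harmful intervals, those intervals are clipped to the video and merged. Otherwise it falls back to a tail window at the end of the video. The recognizer itself and the 30-second tail length are not specified by the disclosure.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) pairs given in seconds."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def segments_to_check(duration, harmful_audio_intervals, tail_seconds=30.0):
    """Choose which parts of the low-quality video to fetch and analyse.

    harmful_audio_intervals: (start, end) times at which a hypothetical
    audio recognizer flagged harmful content (steps c1/c2).
    With no flagged audio, fall back to the tail of the video (claim 4).
    """
    clipped = [
        (max(0.0, s), min(duration, e))
        for s, e in harmful_audio_intervals
        if e > 0.0 and s < duration
    ]
    clipped = [(s, e) for s, e in clipped if e > s]
    if clipped:
        return merge_intervals(clipped)
    return [(max(0.0, duration - tail_seconds), duration)]
```

For a 120-second video with flagged audio at (10, 20), (15, 30) and (100, 130), the sketch returns the two windows (10, 30) and (100, 120).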
A system for identifying harmful videos, comprising:

a first weighting factor generating module, configured to: when it is determined that page elements of a webpage include a URL path of a video, identify an IP address or IP address segment of a user recorded in the page content of the webpage, query a first database for whether that IP address or an IP address in the same network segment exists, and output a first weighting factor related to the IP according to the result of the query on the user IP address;

a second weighting factor generating module, configured to: obtain, according to the URL path of the video, a domain name included in the URL or an IP address pointed to by the URL; perform a whois query in a second database based on the domain name included in the URL, and/or query the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and output a second weighting factor related to the URL path of the video according to the result of the whois query and/or the IP address query;

a third weighting factor generating module, configured to: acquire the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality available in the online playback settings of the video; perform, by content-based video copy detection technology, video copy detection of the lowest-picture-quality video file against a preset harmful video database; and output a third weighting factor according to the detection result;

and an identifying module, configured to integrate the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
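The third weighting factor generating module selects the lowest-quality rendition and then runs content-based copy detection. The disclosure does not name a detection algorithm. The sketch below swaps in a simple, well-known technique, the average hash (aHash) over 8x8 grayscale frames compared by Hamming distance, purely to show the data flow. Several things are assumptions: the rendition schema (dicts with a `height` key), frames already decoded and downscaled to 8x8, the harmful database as a mapping from video id to frame hashes, the distance threshold, and scoring the factor as the best per-video fraction of matched frames.

```python
def lowest_quality(renditions):
    """Pick the rendition with the smallest frame height (hypothetical schema)."""
    return min(renditions, key=lambda r: r["height"])


def average_hash(frame):
    """64-bit average hash of an 8x8 grayscale frame (list of 8 rows)."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p >= mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def third_weighting_factor(frames, harmful_db, max_distance=5):
    """Best fraction of sampled frames matching one harmful video.

    harmful_db: mapping video_id -> list of frame hashes (hypothetical).
    Returns a value in [0, 1]; 0.0 when no frames are supplied.
    """
    if not frames:
        return 0.0
    hashes = [average_hash(f) for f in frames]
    best = 0.0
    for fingerprints in harmful_db.values():
        matched = sum(
            1 for h in hashes
            if any(hamming(h, g) <= max_distance for g in fingerprints)
        )
        best = max(best, matched / len(hashes))
    return best
```

Analysing the lowest-quality rendition is what keeps this cheap. Perceptual hashes like aHash are computed on heavily downscaled frames, so little is lost by skipping the high-resolution stream. A production system would use a more robust copy-detection method.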
The system of claim 6, wherein the second database is a third-party database.
The system of claim 6, wherein the second weighting factor generating module further comprises:

a correction unit, configured to further query the security of the domain name in a third-party domain name security list to output a security factor, and to correct the second weighting factor with the security factor.
The system according to claim 6, wherein acquiring the video file at the lowest picture quality in the third weighting factor generating module comprises acquiring the video content at the end of the video file.
The system according to claim 6, wherein the third weighting factor generating module further acquires the video file at the lowest picture quality by means of the following units:

an audio extraction unit, configured to extract the audio of the video;

an audio recognition unit, configured to identify whether the audio contains harmful content and, if so, acquire the video content between the start and end times of that audio.
A system for identifying harmful videos, comprising:

a processor and a memory storing executable instructions, wherein the processor executes the instructions to perform the following operations:

Step a): when it is determined that page elements of a webpage include a URL path of a video, identifying an IP address or IP address segment of a user recorded in the page content of the webpage, querying a first database for whether that IP address or an IP address in the same network segment exists, and outputting a first weighting factor related to the IP according to the result of the query on the user IP address;

Step b): obtaining, according to the URL path of the video, a domain name included in the URL or an IP address pointed to by the URL; performing a whois query in a second database based on the domain name included in the URL, and/or querying the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and outputting a second weighting factor related to the URL path of the video according to the result of the whois query and/or the IP address query;

Step c): acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality available in the online playback settings of the video; performing, by content-based video copy detection technology, video copy detection of the lowest-picture-quality video file against a preset harmful video database; and outputting a third weighting factor according to the detection result;

Step d): integrating the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
A computer storage medium storing executable instructions for performing the following method of identifying harmful videos:

Step a): when it is determined that page elements of a webpage include a URL path of a video, identifying an IP address or IP address segment of a user recorded in the page content of the webpage, querying a first database for whether that IP address or an IP address in the same network segment exists, and outputting a first weighting factor related to the IP according to the result of the query on the user IP address;

Step b): obtaining, according to the URL path of the video, a domain name included in the URL or an IP address pointed to by the URL; performing a whois query in a second database based on the domain name included in the URL, and/or querying the second database, based on the IP address pointed to by the URL, for whether that IP address or an IP address in the same network segment exists; and outputting a second weighting factor related to the URL path of the video according to the result of the whois query and/or the IP address query;

Step c): acquiring the video file at the lowest picture quality based on the URL path of the video and the lowest picture quality available in the online playback settings of the video; performing, by content-based video copy detection technology, video copy detection of the lowest-picture-quality video file against a preset harmful video database; and outputting a third weighting factor according to the detection result;

Step d): integrating the first weighting factor, the second weighting factor, and the third weighting factor to identify whether the video is a harmful video.
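Step d) only states that the three weighting factors are "integrated". One simple reading is a weighted sum compared against a threshold, sketched below. The coefficients (0.2, 0.3, 0.5) and the 0.5 threshold are hypothetical; they give the content-based third factor the largest say but are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    ip: float       # first weighting factor (step a)
    url: float      # second weighting factor (step b)
    content: float  # third weighting factor (step c)


# Hypothetical coefficients and threshold; the disclosure does not say how
# the three factors are combined.
COEFFICIENTS = (0.2, 0.3, 0.5)
THRESHOLD = 0.5


def harm_score(f: Factors, coefficients=COEFFICIENTS) -> float:
    """Weighted sum of the three factors."""
    a, b, c = coefficients
    return a * f.ip + b * f.url + c * f.content


def is_harmful(f: Factors, threshold: float = THRESHOLD) -> bool:
    """Classify the video as harmful when the score reaches the threshold."""
    return harm_score(f) >= threshold
```

With these illustrative values, a strong copy-detection match alone (`Factors(0, 0, 1)`) reaches the threshold, whereas a suspicious user IP alone (`Factors(1, 0, 0)`) does not. Other fusion rules, such as a learned classifier, would fit the claim equally well.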