CN111382383A - Method, device, medium and computer equipment for determining sensitive type of webpage content - Google Patents

Publication number
CN111382383A
Authority
CN
China
Prior art keywords
content
webpage
evaluated
type
sensitivity
Prior art date
Legal status
Pending
Application number
CN201811629231.3A
Other languages
Chinese (zh)
Inventor
梅小伟
罗伟汛
Current Assignee
Guangzhou Baiguoyuan Information Technology Co Ltd
Original Assignee
Guangzhou Baiguoyuan Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Baiguoyuan Information Technology Co Ltd
Priority to CN201811629231.3A
Publication of CN111382383A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method and a device for determining sensitive types of webpage contents, a computer storage medium and computer equipment; the method comprises the following steps: acquiring the content of the webpage to be evaluated according to the URL of the webpage to be evaluated; evaluating the webpage content to be evaluated by using a sensitivity evaluation model corresponding to the content type of the webpage content to be evaluated to obtain a corresponding model evaluation value; judging whether the model evaluation value is in a preset threshold interval or not; if so, sending the webpage content to be evaluated to a manual checking client; and receiving an artificial evaluation value corresponding to the webpage content to be evaluated, and determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value. By the technical scheme, the sensitive type of the webpage content can be quickly and preliminarily evaluated, the webpage content belonging to the preset condition is further evaluated manually, the identification accuracy of the webpage content is ensured, and the identification efficiency of the webpage content is obviously improved.

Description

Method, device, medium and computer equipment for determining sensitive type of webpage content
Technical Field
The invention relates to the field of content identification, in particular to a method, a device, a medium and computer equipment for determining sensitive types of webpage content.
Background
With the rapid development of the internet, network information has become an essential part of people's lives. At present there are hundreds of millions of web pages on the internet, their content is extremely varied, and a growing number of websites provide sensitive content, such as pornographic videos and pictures. This makes the network environment very complex and may seriously affect the healthy growth of teenagers. Therefore, it is very important to monitor and identify the content of web pages on the internet.
At present, web page content is generally identified and monitored in one of two ways. In the first, a sensitive word bank is preset, and the text data of the web page is then matched against that word bank; however, websites that provide sensitive content apply technical processing to the text in their web pages so that it is not matched by the sensitive words. In the second, all pictures, audio or videos in a web page to be checked are browsed manually for identification and screening. Although the second means is more accurate than the first, the number of web pages to be identified is huge, so it consumes a large amount of human resources, manual identification takes a long time, and content identification efficiency is low.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method, a device, a medium and a computer device for determining the sensitive type of the webpage content.
An embodiment of the present invention provides a method for determining a sensitive type of web page content according to a first aspect, including:
acquiring the content of the webpage to be evaluated according to the URL of the webpage to be evaluated;
evaluating the webpage content to be evaluated by using a sensitivity evaluation model corresponding to the content type of the webpage content to be evaluated to obtain a corresponding model evaluation value;
judging whether the model evaluation value is in a preset threshold interval or not;
if so, sending the webpage content to be evaluated to a manual checking client;
and receiving an artificial evaluation value corresponding to the webpage content to be evaluated, and determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value.
Further, the obtaining of the content of the web page to be evaluated according to the URL of the web page to be evaluated includes:
acquiring webpage data corresponding to the URL of the webpage to be evaluated;
extracting a link matched with a preset matching rule from the webpage data;
acquiring webpage content according to the link;
and taking the webpage content as the webpage content to be evaluated.
Further, the taking the web page content as the web page content to be evaluated includes:
if the content type of the webpage content is an image, acquiring the resolution of the webpage content;
judging whether the resolution is larger than a preset resolution threshold or not;
and if so, taking the webpage content as the webpage content to be evaluated.
Further, the taking the web page content as the web page content to be evaluated includes:
if the content type of the webpage content is an image, processing the webpage content by using a preset algorithm to obtain a picture characteristic value;
comparing the picture characteristic value with a prestored picture characteristic value corresponding to the URL of the webpage to be evaluated to obtain a picture characteristic value difference value;
and if the picture characteristic value difference is larger than a preset characteristic value threshold, taking the webpage content as the webpage content to be evaluated.
Further, the taking the web page content as the web page content to be evaluated includes:
if the content type of the webpage content is an image, determining the image quantity of the webpage content;
judging whether the number of the images is smaller than a preset number threshold value or not;
if the number of the images is smaller than a preset number threshold, acquiring a webpage screenshot of the webpage to be evaluated according to the URL of the webpage to be evaluated;
and taking the webpage screenshot as the webpage content to be evaluated.
Further, after the determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value, the method further includes:
if the sensitivity type is a primary sensitivity type, determining that the webpage sensitivity type corresponding to the webpage to be evaluated is a first sensitivity, and stopping determining sensitivity types for webpage content of the webpage to be evaluated whose sensitivity type has not yet been determined;
if the sensitivity type is not the primary sensitivity type, judging whether all the webpage contents in the webpage to be evaluated have a determined sensitivity type;
if all have been determined, determining that the webpage sensitivity type corresponding to the webpage to be evaluated is a second sensitivity;
if not all have been determined, continuing to determine the sensitivity type of the webpage content whose sensitivity type has not yet been determined.
Further, after the determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value, the method further includes:
comparing the model evaluation value with the artificial evaluation value to obtain an evaluation difference value;
judging whether the evaluation difference value is in a preset evaluation value difference value interval or not;
if so, taking the webpage content to be evaluated and the artificial evaluation value as sample webpage content and the evaluation value thereof;
and training a sensitivity evaluation model corresponding to the content type by using the sample webpage content and the evaluation value thereof to obtain the trained sensitivity evaluation model.
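The sample selection step above can be illustrated with a minimal Python sketch. The function name, the record layout, the use of an absolute difference, and the interval (0.2, 1.0) are all assumptions made for illustration; the text does not specify the preset evaluation value difference interval or how the selected samples are fed to training.

```python
def select_training_samples(records, diff_interval=(0.2, 1.0)):
    """Pick (content, manual value) pairs whose model/manual gap is in the interval.

    records: iterable of (content, model_value, manual_value) tuples.
    diff_interval: hypothetical preset evaluation value difference interval.
    """
    low, high = diff_interval
    samples = []
    for content, model_value, manual_value in records:
        # Assumption: the evaluation difference is taken as an absolute value.
        diff = abs(model_value - manual_value)
        if low <= diff <= high:
            # The manual (artificial) value becomes the label of the sample.
            samples.append((content, manual_value))
    return samples


records = [("img_a", 0.60, 0.90), ("img_b", 0.60, 0.65)]
samples = select_training_samples(records)
```

Here only `img_a` would be kept, because the model and the reviewer disagree on it by about 0.3, whereas they nearly agree on `img_b`.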
An embodiment of the present invention provides, according to a second aspect, an apparatus for determining a sensitive type of web page content, including:
the content obtaining module is used for obtaining the webpage content to be evaluated according to the URL of the webpage to be evaluated;
the model evaluation value obtaining module is used for evaluating the webpage content to be evaluated by using a sensitivity evaluation model corresponding to the content type of the webpage content to be evaluated to obtain a corresponding model evaluation value;
the judging module is used for judging whether the model evaluation value is in a preset threshold interval or not;
the content sending module is used for sending the webpage content to be evaluated to a manual checking client when the model evaluation value is within a preset threshold interval;
and the sensitivity type determining module is used for receiving the artificial evaluation value corresponding to the webpage content to be evaluated and determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value.
Embodiments of the present invention provide a computer-readable storage medium according to a third aspect, on which a computer program is stored, which when executed by a processor implements the method for determining a sensitive type of web page content described above.
An embodiment of the present invention provides a computer device according to a fourth aspect, the computer device including:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the web page content sensitivity type determination method described above.
In the embodiment of the invention, the webpage content to be evaluated is obtained according to the URL of the webpage to be evaluated, a sensitivity evaluation model corresponding to the content type of the webpage content to be evaluated is firstly used for evaluating the webpage content to be evaluated to obtain a corresponding model evaluation value, and then whether the model evaluation value is positioned in a preset threshold interval is judged; if the model evaluation value is within a preset threshold interval, sending the webpage content to be evaluated to a manual review client; and after receiving the manual evaluation value corresponding to the webpage content to be evaluated, determining the sensitivity type of the webpage content to be evaluated according to the manual evaluation value. According to the technical scheme, the sensitivity evaluation model is used for quickly performing preliminary evaluation on the sensitivity type of the webpage content, manual work is used for further evaluation on the webpage content in the preset condition, the identification accuracy of the webpage content is guaranteed, and the identification efficiency of the webpage content is remarkably improved by combining the evaluation model and the manual identification.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart illustrating a method for determining a sensitive type of web page content according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a method for obtaining content of a web page to be evaluated according to a URL of the web page to be evaluated according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for determining content of a web page to be evaluated according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for determining content of a web page to be evaluated according to another embodiment of the present invention;
FIG. 5 is a flowchart illustrating a method for determining content of a web page to be evaluated according to another embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus for determining a sensitive type of web page content according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative only and should not be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise specified, the singular forms "a", "an" and "the" may include the plural forms as well, and the "first" and "second" used herein are only used to distinguish one technical feature from another and are not intended to limit the order, number, etc. of the technical features. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The embodiment of the invention provides a method for determining a sensitive type of webpage content, and specific implementations of the invention are described in detail below with reference to the accompanying drawings. As shown in FIG. 1, a method for determining a sensitive type of web page content according to an embodiment of the present invention includes the following steps:
S110: Obtaining the content of the webpage to be evaluated according to the URL of the webpage to be evaluated.
In this embodiment, a browser is operated through a JavaScript script, so that the environment in which a real user accesses a web page can be simulated, and the webpage content acquisition operation is performed on the webpage to be evaluated in this simulated environment. Specifically, the Uniform Resource Locator (URL) of the webpage to be evaluated is obtained, the webpage source code corresponding to the URL is then obtained, and the webpage content to be evaluated is obtained from the webpage source code.
S120: Evaluating the webpage content to be evaluated by using a sensitivity evaluation model corresponding to the content type of the webpage content to be evaluated to obtain a corresponding model evaluation value.
The content types of the webpage content to be evaluated include text, image, audio and video, and the corresponding sensitivity evaluation models are respectively a text type sensitivity evaluation model, an image type sensitivity evaluation model, an audio type sensitivity evaluation model and a video type sensitivity evaluation model. After the webpage content to be evaluated is obtained, it is input into the trained sensitivity evaluation model corresponding to its content type. The sensitivity evaluation model evaluates the webpage content and outputs a corresponding pornographic index evaluation value; this value is the model evaluation value.
S130: Judging whether the model evaluation value is in a preset threshold interval.
In this embodiment, the preset threshold interval determines how much webpage content needs to be sent to the manual review client: the wider the preset threshold interval, the more webpage content is sent for manual review, and the narrower the interval, the less. The specific range may be set according to the specific application scenario. For example, in an application scenario where it is desired to determine whether image-type webpage content in a webpage relates to pornography, a pornographic index evaluation value interval corresponding to the pornographic sensitivity type may be preset as the preset threshold interval. The webpage content is then evaluated by the sensitivity evaluation model to obtain a model evaluation value, the model evaluation value is compared with the preset threshold interval, and whether the corresponding webpage content belongs to the pornographic sensitivity type is known from the comparison result.
For example, the image type sensitivity evaluation model presets 4 categories of sensitivity types for image-type webpage content, and the evaluation value range is [0, 1]. When the model evaluation value of the webpage content to be evaluated is in the interval [0, 0.049], the corresponding sensitivity type is the normal class; when it is in the interval [0.050, 0.092], the corresponding sensitivity type is the light class; when it is in the interval [0.093, 0.70], the corresponding sensitivity type is the sexy class; and when it is in the interval [0.71, 1], the corresponding sensitivity type is the pornographic class. If it is desired to determine whether certain image-type webpage content is pornographic, the preset threshold interval may be set to [0.71, 1], and the model evaluation value of that webpage content is then compared with [0.71, 1].
Further, the preset threshold interval may be set as a transition threshold interval between the pornographic sensitivity type and the category adjacent to it, that is, the range of evaluation values in which the webpage content may or may not be pornographic. When the model evaluation value lies in the transition threshold interval, a slight deviation of the sensitivity evaluation model in determining the sensitivity type can have a great influence on the final result, for example, webpage content that is actually of the sexy class is determined to be pornographic, or webpage content that is actually pornographic is determined to be of the sexy class. The specific range of the transition threshold interval may be determined according to the threshold intervals corresponding to the sensitivity type categories of the sensitivity evaluation model.
For example, based on the above example, the threshold interval corresponding to the pornographic sensitivity type is [0.71, 1], the adjacent class is the sexy class, and the threshold interval corresponding to the sexy class is [0.093, 0.70], so the preset threshold interval may be set to [0.55, 0.85].
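The example intervals above can be expressed as a short Python sketch. The names `classify` and `needs_manual_review` are hypothetical, and values that fall in the small gaps between the listed intervals (for example 0.0495) are assigned here to the next class up, which is an assumption the text does not address.

```python
from bisect import bisect_left

# Inclusive upper bounds of the four example classes in the text.
UPPER_BOUNDS = [0.049, 0.092, 0.70, 1.0]
CLASSES = ["normal", "light", "sexy", "pornographic"]

# Example transition interval between the sexy and pornographic classes.
REVIEW_INTERVAL = (0.55, 0.85)


def classify(score):
    """Map an evaluation value in [0, 1] to one of the four example classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("evaluation value must lie in [0, 1]")
    # bisect_left returns the index of the first upper bound >= score.
    return CLASSES[bisect_left(UPPER_BOUNDS, score)]


def needs_manual_review(score, interval=REVIEW_INTERVAL):
    """Return True when the value lies inside the preset threshold interval."""
    low, high = interval
    return low <= score <= high
```

With these definitions a value of 0.53 is classified directly as sexy without review, while a value of 0.60 falls inside the transition interval and would be routed to a reviewer.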
S140: If so, sending the webpage content to be evaluated to a manual checking client.
If the model evaluation value is within the preset threshold interval, the corresponding webpage content to be evaluated is sent to the manual review client and a reviewer makes the final evaluation, which ensures the identification accuracy of the webpage content. After identifying the webpage content, the reviewer scores it according to a preset scoring rule to determine the pornographic index evaluation value corresponding to the webpage content, namely the artificial evaluation value, and the artificial evaluation value is fed back.
If the model evaluation value is not within the preset threshold interval, the sensitivity type of the webpage content to be evaluated is determined according to the model evaluation value.
For example, based on the above example, if the model evaluation value corresponding to picture A of the webpage content is 0.53, which lies outside the preset threshold interval [0.55, 0.85] and inside [0.093, 0.70], the corresponding sensitivity type can be determined to be the sexy class.
S150: Receiving an artificial evaluation value corresponding to the webpage content to be evaluated, and determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value.
When the artificial evaluation value is received, the sensitivity type of the webpage content to be evaluated is determined directly according to the artificial evaluation value. For example, based on the above example, if the artificial evaluation value corresponding to picture B of the webpage content is 0.95, the corresponding sensitivity type can be determined to be the pornographic class.
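Steps S120 to S150 can be summarized in a minimal Python sketch. Everything here is hypothetical scaffolding: `model`, `classify` and `request_manual_review` are placeholder callables supplied by the caller, and the stub values are chosen only to mirror pictures A and B from the example.

```python
def determine_sensitivity_type(content, model, classify, request_manual_review,
                               review_interval=(0.55, 0.85)):
    """Return (sensitivity type, source) for one piece of webpage content."""
    model_value = model(content)                        # S120: model evaluation
    low, high = review_interval
    if low <= model_value <= high:                      # S130: in the interval?
        manual_value = request_manual_review(content)   # S140: ask a reviewer
        return classify(manual_value), "manual"         # S150: manual value decides
    # Outside the interval the model evaluation value decides directly.
    return classify(model_value), "model"


# Stubs standing in for a trained model, a reviewer and the class mapping.
def stub_classify(value):
    return "pornographic" if value >= 0.71 else "sexy"

def stub_model(content):
    return {"picture A": 0.53, "picture B": 0.80}[content]

def stub_reviewer(content):
    return 0.95

result_a = determine_sensitivity_type("picture A", stub_model, stub_classify, stub_reviewer)
result_b = determine_sensitivity_type("picture B", stub_model, stub_classify, stub_reviewer)
```

In this sketch picture A is decided by the model alone, while picture B is sent for review because its assumed model value of 0.80 lies in the transition interval.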
The embodiment of the invention utilizes the sensitivity evaluation model to carry out the preliminary evaluation of the sensitivity type for the webpage content, can quickly obtain the evaluation value, effectively reduces the workload of manual identification, greatly reduces the investment of human resources, further evaluates the webpage content belonging to the preset condition by using manpower, ensures the identification accuracy of the webpage content, and obviously improves the identification efficiency of the webpage content by combining the evaluation model and the manual identification.
In one embodiment, as shown in FIG. 2, S110, obtaining the content of the webpage to be evaluated according to the URL of the webpage to be evaluated, includes the following steps:
S111: Acquiring webpage data corresponding to the URL of the webpage to be evaluated.
S112: Extracting a link matched with a preset matching rule from the webpage data.
S113: Obtaining the webpage content according to the link.
S114: Taking the webpage content as the webpage content to be evaluated.
In this embodiment, the preset matching rule may be a regular expression matching algorithm, by which the webpage content can be found in the webpage data, that is, the webpage source code. For example, if the content type of the webpage content to be found is an image, the regular expression matching algorithm may be configured to match links in the webpage data whose suffix is a common picture format name such as ".png" or ".gif"; the corresponding pictures are then downloaded through the links and used as the webpage content to be evaluated.
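A matching rule of this kind might look like the following Python sketch. The pattern, the function name and the inclusion of ".jpg"/".jpeg" are assumptions for illustration; the text names only ".png" and ".gif" as examples and does not give the actual rule. Downloading is omitted.

```python
import re
from urllib.parse import urljoin

# Hypothetical rule: src/href attribute values ending in a common picture suffix.
IMAGE_LINK_RE = re.compile(
    r"""(?:src|href)\s*=\s*["']([^"']+\.(?:png|gif|jpe?g))["']""",
    re.IGNORECASE,
)


def extract_image_links(page_source, base_url):
    """Return absolute picture links found in the webpage source code."""
    # urljoin resolves relative links against the URL of the webpage.
    return [urljoin(base_url, link) for link in IMAGE_LINK_RE.findall(page_source)]


html = (
    '<img src="/a/logo.png">'
    '<a href="pic.GIF">x</a>'
    '<img src="https://img-blog.abc.com/xxx">'
)
links = extract_image_links(html, "https://example.com/page/")
```

Note that the third link in the sample is not matched because it has no picture suffix, which is the kind of miss a suffix-based rule cannot avoid.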
In one embodiment, as shown in FIG. 3, S114, taking the webpage content as the webpage content to be evaluated, includes the following steps:
S210: If the content type of the webpage content is an image, acquiring the resolution of the webpage content.
S220: Judging whether the resolution is larger than a preset resolution threshold.
S230: If so, taking the webpage content as the webpage content to be evaluated.
In this embodiment, it is considered that a web page may contain many pictures that are not its main content, such as the website logo and small icons. If these pictures were also evaluated during identification, the evaluation service pressure on the sensitivity evaluation model would increase greatly; since the sensitivity type of such pictures is usually normal, evaluating them wastes computer resources. Therefore, pictures that are not main content of the web page need to be filtered out.
Specifically, since such pictures are usually small, a preset resolution threshold may be configured in advance. Only the resolution of the webpage content then needs to be acquired and compared with the preset resolution threshold; the main content of the web page is identified from the comparison result and is used as the webpage content to be evaluated.
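The resolution filter can be sketched in a few lines of Python. Treating resolution as a width and height in pixels, and the 200 x 200 threshold, are assumptions for illustration; the text does not state how resolution is measured or what the preset threshold is.

```python
# Hypothetical preset resolution threshold, in pixels.
MIN_WIDTH = 200
MIN_HEIGHT = 200


def is_main_content(width, height, min_width=MIN_WIDTH, min_height=MIN_HEIGHT):
    """Keep a picture only if it is larger than the preset resolution threshold."""
    return width > min_width and height > min_height


# (name, width, height) of pictures already downloaded from one webpage.
pictures = [("logo.png", 64, 64), ("photo.png", 1280, 720)]
to_evaluate = [name for name, w, h in pictures if is_main_content(w, h)]
```

Only `photo.png` would be passed on to the sensitivity evaluation model in this example; the small logo is skipped.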
In another embodiment, as shown in FIG. 4, S114, taking the webpage content as the webpage content to be evaluated, includes the following steps:
S310: If the content type of the webpage content is an image, processing the webpage content by using a preset algorithm to obtain a picture characteristic value.
S320: Comparing the picture characteristic value with a pre-stored picture characteristic value corresponding to the URL of the webpage to be evaluated to obtain a picture characteristic value difference.
S330: If the picture characteristic value difference is larger than a preset characteristic value threshold, taking the webpage content as the webpage content to be evaluated.
In this embodiment, it is considered that a web page may contain pictures that are highly similar to one another, whose sensitivity types are generally the same. Evaluating all of them would also increase the evaluation service pressure on the sensitivity evaluation model, so pictures that are highly similar to already identified pictures need to be filtered out.
The preset algorithm may be a perceptual hash algorithm. Commonly used perceptual hash algorithms include the cosine transform perceptual hash algorithm, the average hash algorithm and the difference hash algorithm, and one or more of them may be selected according to the specific application scenario to process the picture-type webpage content to be evaluated. Specifically, each webpage content to be identified is processed by the preset algorithm to obtain a corresponding fingerprint character string, namely the picture characteristic value, and the obtained picture characteristic value is stored in association with the URL of the web page to be identified.
Before a new webpage content to be identified is identified, whether its content type is the image type is judged. If so, the webpage content is processed by the preset algorithm to obtain a picture characteristic value, and the currently obtained picture characteristic value is compared one by one with the pre-stored picture characteristic values corresponding to the URL of the webpage to be evaluated to obtain a Hamming distance (i.e., the picture characteristic value difference). The smaller the picture characteristic value difference, the higher the similarity between the pictures.
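As an illustration of the average hash variant and the Hamming distance comparison, here is a minimal pure-Python sketch. It assumes the picture has already been reduced to a small grayscale grid (8 x 8 here), which is the usual preprocessing for an average hash but is not spelled out in the text; the function names and the threshold of 5 are hypothetical.

```python
def average_hash(pixels):
    """Compute an average-hash fingerprint from a grid of grayscale values."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel: 1 if brighter than the mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def is_new_picture(fingerprint, stored_fingerprints, threshold=5):
    """True if the picture differs enough from every stored fingerprint."""
    return all(hamming_distance(fingerprint, s) > threshold
               for s in stored_fingerprints)


gradient = [[r * 8 + c for c in range(8)] for r in range(8)]
inverted = [[63 - v for v in row] for row in gradient]
h_gradient = average_hash(gradient)
h_inverted = average_hash(inverted)
```

A picture for which `is_new_picture` returns True would be taken as webpage content to be evaluated, and its fingerprint would then be stored against the URL of the webpage.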
In another embodiment, as shown in FIG. 5, S114, taking the webpage content as the webpage content to be evaluated, includes the following steps:
S410: If the content type of the webpage content is an image, determining the image number of the webpage content.
S420: Judging whether the image number is smaller than a preset number threshold.
S430: If the image number is smaller than the preset number threshold, acquiring a webpage screenshot of the webpage to be evaluated according to the URL of the webpage to be evaluated.
S440: Taking the webpage screenshot as the webpage content to be evaluated.
In this embodiment, it is considered that in practical applications the link suffix of some image-type webpage content is not a common picture format. For example, the link of a currently acquired image-type webpage content to be evaluated is https://img-blog.abc.com/xxx; the link points to a picture, but its suffix is not a common picture format, so the webpage content is missed when matching with the regular expression matching algorithm.
Therefore, the number of image-type webpage contents in a webpage to be evaluated, namely the image number, may be detected. If the image number is smaller than a preset number threshold, a webpage screenshot of the webpage to be evaluated is obtained and sent to the manual review client as the webpage content to be evaluated for manual identification. This avoids missing the sensitivity type of part of the webpage content because some images cannot be detected.
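The screenshot fallback can be sketched as follows in Python. The function name, the `take_screenshot` callable and the threshold of 3 are hypothetical; how the screenshot is actually captured is outside the scope of this sketch.

```python
def select_content_for_evaluation(image_links, take_screenshot, min_image_count=3):
    """Choose what to evaluate for a webpage and where to route it.

    Returns ("manual_review", [screenshot]) when too few images were detected,
    otherwise ("model", list of image links).
    """
    if len(image_links) < min_image_count:
        # Too few images matched: fall back to a screenshot of the whole page.
        return "manual_review", [take_screenshot()]
    return "model", list(image_links)


route_few, content_few = select_content_for_evaluation(
    ["a.png"], take_screenshot=lambda: "screenshot-bytes")
route_many, content_many = select_content_for_evaluation(
    ["a.png", "b.png", "c.gif"], take_screenshot=lambda: "screenshot-bytes")
```

Passing the screenshot function in as a parameter keeps the decision logic separate from the browser automation that would produce the image.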
In one embodiment, after S150, determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value, the method further includes:
If the sensitivity type is the primary sensitivity type, determining that the webpage sensitivity type corresponding to the webpage to be evaluated is the first sensitivity, and stopping determining sensitivity types for the webpage content in the webpage to be evaluated whose sensitivity type has not yet been determined.
If the sensitivity type is not the primary sensitivity type, judging whether all the webpage contents in the webpage to be evaluated have a determined sensitivity type.
If all have a determined sensitivity type, determining that the webpage sensitivity type corresponding to the webpage to be evaluated is the second sensitivity.
If not all have a determined sensitivity type, continuing to determine the sensitivity type of the webpage content whose sensitivity type has not yet been determined.
In this embodiment, the sensitivity type level and the web page sensitivity type may include several levels, and the sensitivity type level of the web page content and the web page sensitivity type are in one-to-one correspondence, and may be preset according to a specific application scenario.
For example, in a certain application scenario, the webpage sensitivity types are set as a first sensitivity, the yellow-related (that is, pornography-related) type, and a second sensitivity, the non-yellow-related type. The sensitivity type level corresponding to the yellow-related type is the first-level sensitivity type, and the level corresponding to the non-yellow-related type is the second-level sensitivity type. Among the sensitivity types output by the sensitivity evaluation model, the pornographic type is taken as the first-level sensitivity type, and the sexuality type, the mini type and the normal type are taken as second-level sensitivity types. If the sensitivity type of webpage content C is determined to be pornographic, it can further be determined that webpage content C belongs to the first-level sensitivity type and that the webpage sensitivity type of the corresponding webpage is yellow-related.
If the sensitivity type of a piece of webpage content is determined to be the first-level sensitivity type, the other webpage content in the same webpage is not evaluated, and the webpage sensitivity type of that webpage can be directly determined to be the first sensitivity (the yellow-related type), which reduces the evaluation workload.
If the sensitivity type of the webpage content is not the first-level sensitivity type, the other webpage content of the webpage continues to be evaluated to determine its sensitivity type; evaluation stops only when webpage content of the first-level sensitivity type is found or all the webpage content in the webpage has been evaluated.
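The early-stop rule described above can be sketched as follows. This is only an illustration: the function name `page_sensitivity`, the label strings, and the stand-in classifier `fake_classify` are hypothetical and do not come from the patent text.

```python
from typing import Callable, Iterable, Tuple

FIRST_LEVEL = "first-level"    # e.g. the pornographic type
SECOND_LEVEL = "second-level"  # e.g. the normal type


def page_sensitivity(contents: Iterable[str],
                     classify: Callable[[str], str]) -> Tuple[str, int]:
    """Return (page sensitivity, number of items actually evaluated)."""
    evaluated = 0
    for item in contents:
        evaluated += 1
        if classify(item) == FIRST_LEVEL:
            # One first-level item decides the page; the rest is skipped.
            return "first sensitivity", evaluated
    # Every item was evaluated and none was first-level.
    return "second sensitivity", evaluated


def fake_classify(item: str) -> str:
    """Hypothetical stand-in for the model plus manual review."""
    return FIRST_LEVEL if item == "C" else SECOND_LEVEL


flagged = page_sensitivity(["A", "B", "C", "D"], fake_classify)
clean = page_sensitivity(["A", "B"], fake_classify)
```

In the first call, item "D" is never evaluated because "C" already decides the page, which is the workload saving the paragraph describes.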
Preferably, the text-type and image-type sensitivity evaluation models evaluate faster than the audio-type and video-type sensitivity evaluation models, and the webpage sensitivity types of a webpage are usually few, for example only the yellow-related type and the non-yellow-related type. Therefore, if the content types of the acquired webpage content also include audio and/or video, the text-type and image-type webpage content is first evaluated using the text-type and image-type sensitivity evaluation models respectively. Only when these two models find no webpage content whose sensitivity type is the first-level sensitivity type is the audio-type and/or video-type webpage content evaluated using the audio-type and/or video-type sensitivity evaluation model, so as to reduce the evaluation workload.
Furthermore, when the webpage sensitivity type of the webpage is determined to be the first sensitivity, a screenshot of the webpage and the webpage content whose sensitivity type is the first-level sensitivity type are obtained and stored. This avoids the situation in which the webpage owner changes or deletes the related content and it then becomes difficult to prove that the webpage was yellow-related.
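A minimal sketch of this ordering, assuming a simple cost ranking in which text and image content is evaluated before audio and video content. The ranking values, the function `evaluate_cheapest_first`, and the predicate `is_first_level` are illustrative assumptions, not part of the patent.

```python
from typing import Callable, List, Tuple

# Assumed relative cost ranking: lower values are evaluated first.
COST_ORDER = {"text": 0, "image": 1, "audio": 2, "video": 3}


def evaluate_cheapest_first(
    items: List[Tuple[str, str]],
    is_first_level: Callable[[str, str], bool],
) -> Tuple[bool, List[str]]:
    """items holds (content_type, payload) pairs.

    Returns (first-level content found?, content types actually evaluated).
    """
    ordered = sorted(items, key=lambda it: COST_ORDER[it[0]])
    seen: List[str] = []
    for content_type, payload in ordered:
        seen.append(content_type)
        if is_first_level(content_type, payload):
            # Stop before the slower audio/video models are needed.
            return True, seen
    return False, seen


page_items = [("video", "v1"), ("image", "img-bad"), ("text", "t1")]
hit, evaluated_types = evaluate_cheapest_first(
    page_items, lambda ctype, payload: payload.endswith("bad")
)
```

Here the video item is never evaluated because the image item already yields a first-level result.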
In one embodiment, after S150 (determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value), the method further includes:
Comparing the model evaluation value with the artificial evaluation value to obtain an evaluation difference value.
And judging whether the evaluation difference value is in a preset evaluation value difference value interval.
And if so, taking the webpage content to be evaluated and the artificial evaluation value as sample webpage content and the evaluation value thereof.
And training a sensitivity evaluation model corresponding to the content type by using the sample webpage content and the evaluation value thereof to obtain the trained sensitivity evaluation model.
In this embodiment, before the sensitivity evaluation model is used, sample webpage content of each content type and the corresponding label data need to be obtained to train the sensitivity evaluation model, so as to obtain a trained sensitivity evaluation model. The label data is the pornographic index evaluation value labeled for the sample webpage content by a labeling person or a labeling model.
The number of webpages to be identified is huge and the styles of webpage content of each content type are varied, so if more sample webpage content in more styles can be provided to each sensitivity evaluation model for learning, the evaluation accuracy of the model can be noticeably improved. Although the trained sensitivity evaluation model has learned a large amount of sample webpage content, in actual operation it still encounters a large amount of webpage content that it has not learned and whose style differs from that of the learned samples; in this case its evaluation accuracy is not very high. Therefore, when the evaluation difference value between the model evaluation value and the artificial evaluation value is within the preset evaluation value difference interval, the corresponding webpage content to be evaluated and its artificial evaluation value are taken as sample webpage content and its evaluation value, and the sensitivity evaluation model is trained with them, so that the model continuously learns webpage content of more styles and its evaluation accuracy is continuously improved.
Specifically, the larger the evaluation difference value is, the lower the evaluation accuracy of the sensitivity evaluation model in the current evaluation is, and conversely, the smaller the evaluation difference value is, the higher the evaluation accuracy of the sensitivity evaluation model in the current evaluation is. When the preset evaluation value difference interval is set, the setting may be performed according to a specific application scenario.
In one possible implementation, the preset evaluation value difference interval is set to its maximum, for example [0, 1], so that all the webpage content that needs to be sent to the manual review client is taken as sample webpage content.
In another possible implementation, a small evaluation difference between the model evaluation value given by the sensitivity evaluation model and the artificial evaluation value given by the reviewer indicates that the model already evaluates this webpage content accurately. Using such content as sample webpage content contributes little to improving the evaluation accuracy of the model, so to improve efficiency, webpage content that the model already evaluates accurately may be excluded from the samples. For example, the preset evaluation value difference interval is set to [0.2, 1].
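The sample selection just described might look like the sketch below. It assumes that the evaluation difference value is the absolute difference between the two values, which the text does not state explicitly; the function name `select_training_samples` and the example data are hypothetical.

```python
from typing import List, Tuple

Reviewed = Tuple[str, float, float]  # (content id, model value, manual value)


def select_training_samples(
    reviewed: List[Reviewed],
    diff_interval: Tuple[float, float] = (0.2, 1.0),
) -> List[Tuple[str, float]]:
    """Keep items whose model/manual difference lies in diff_interval.

    Assumption: the difference is taken as an absolute value.
    The manual (artificial) value becomes the training label.
    """
    low, high = diff_interval
    samples = []
    for content, model_value, manual_value in reviewed:
        diff = abs(model_value - manual_value)
        if low <= diff <= high:
            samples.append((content, manual_value))
    return samples


reviewed_items = [
    ("img-1", 0.55, 0.90),  # large disagreement: useful as a new sample
    ("img-2", 0.60, 0.55),  # model was already close: skipped
]
hard_samples = select_training_samples(reviewed_items)
all_samples = select_training_samples(reviewed_items, (0.0, 1.0))
```

With the interval [0.2, 1] only the item the model misjudged is kept; with [0, 1] every reviewed item becomes a sample.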
In order to better understand the technical solution of the present invention, the present invention further provides a web content sensitivity type determining apparatus, as shown in fig. 6, including the following modules:
a content obtaining module 110, configured to obtain content of a web page to be evaluated according to a URL of the web page to be evaluated;
a model evaluation value obtaining module 120, configured to evaluate the to-be-evaluated web page content by using a sensitivity evaluation model corresponding to the content type of the to-be-evaluated web page content, so as to obtain a corresponding model evaluation value;
a judging module 130, configured to judge whether the model evaluation value is within a preset threshold interval;
the content sending module 140 is configured to send the web content to be evaluated to a manual review client when the model evaluation value is within a preset threshold interval;
and a sensitivity type determining module 150, configured to receive an artificial evaluation value corresponding to the web content to be evaluated, and determine a sensitivity type of the web content to be evaluated according to the artificial evaluation value.
The content type of the webpage content to be evaluated comprises text, images, audio or video. The model evaluation value is a pornographic index evaluation value obtained by evaluating the webpage content to be evaluated by the sensitivity evaluation model.
The preset threshold interval limits the amount of webpage content that needs to be sent to the manual review client: the wider the preset threshold interval, the more webpage content is sent to the manual review client, and the narrower the interval, the less. The specific range can be set according to the specific application scenario. For example, if in an application scenario it is desired to determine whether image-type webpage content in a webpage involves pornography, a pornographic index evaluation value interval corresponding to the pornographic sensitivity type, that is, the preset threshold interval, can be preset. The webpage content is then evaluated using the sensitivity evaluation model to obtain a model evaluation value, the model evaluation value is compared with the preset threshold interval, and whether the corresponding webpage content belongs to the pornographic sensitivity type is known from the comparison result.
Further, the preset threshold interval may be set as a transition threshold interval between the pornographic sensitivity type and the category adjacent to it, that is, an interval in which the sensitivity evaluation model can only determine that the webpage content may be pornographic. The specific range of this interval may be determined according to the threshold intervals corresponding to the sensitivity type categories of the sensitivity evaluation model.
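As a rough illustration of how the preset threshold interval controls the manual review load, the sketch below routes model evaluation values either to manual review or not. The interval bounds and the names `REVIEW_INTERVAL`, `route`, and `count_manual` are assumptions chosen for the example; the patent leaves the actual range to the application scenario.

```python
from typing import List, Tuple

# Hypothetical transition interval; the actual range is scenario-specific.
REVIEW_INTERVAL: Tuple[float, float] = (0.4, 0.7)


def route(model_value: float,
          interval: Tuple[float, float] = REVIEW_INTERVAL) -> str:
    """Send content to manual review when its value falls in the interval."""
    low, high = interval
    if low <= model_value <= high:
        return "manual_review"
    return "model_only"


def count_manual(values: List[float], interval: Tuple[float, float]) -> int:
    """Count how many values would be sent to the manual review client."""
    return sum(route(v, interval) == "manual_review" for v in values)


scores = [0.10, 0.45, 0.65, 0.80, 0.95]
narrow_load = count_manual(scores, (0.4, 0.7))
wide_load = count_manual(scores, (0.3, 0.9))
```

Widening the interval from (0.4, 0.7) to (0.3, 0.9) sends one more item to the reviewers, matching the statement that a wider interval means more manual work.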
The embodiment of the invention utilizes the sensitivity evaluation model to carry out the preliminary evaluation of the sensitivity type for the webpage content, can quickly obtain the evaluation value, effectively reduces the workload of manual identification, greatly reduces the investment of human resources, further evaluates the webpage content belonging to the preset condition by using manpower, ensures the identification accuracy of the webpage content, and obviously improves the identification efficiency of the webpage content by combining the evaluation model and the manual identification.
In one embodiment, the content obtaining module 110 includes:
the webpage data obtaining submodule 111 is configured to obtain webpage data corresponding to the URL of the webpage to be evaluated;
the link acquisition submodule 112 is configured to extract a link matched with a preset matching rule from the webpage data;
a web content obtaining sub-module 113, configured to obtain web content according to the link;
and the content to be evaluated determining submodule 114 is used for taking the webpage content as the webpage content to be evaluated.
In this embodiment, the preset matching rule may be a regular matching algorithm.
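One way a regular-expression matching rule could extract image links is sketched below. The pattern `IMG_SRC`, the function `extract_image_links`, and the example URL are illustrative assumptions; the patent does not specify the rule, and a production crawler would more likely use an HTML parser.

```python
import re
from typing import List
from urllib.parse import urljoin

# Assumed matching rule: the src attribute of <img> tags.
IMG_SRC = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_image_links(html: str, base_url: str) -> List[str]:
    """Return absolute image URLs found in the webpage data."""
    return [urljoin(base_url, src) for src in IMG_SRC.findall(html)]


page_html = '<p>hi</p><img src="/a.jpg"><IMG alt="x" src=\'b.png\'>'
links = extract_image_links(page_html, "http://example.com/page/index.html")
```

Relative links are resolved against the URL of the webpage to be evaluated so that the content can be fetched afterwards.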
In one embodiment, the content to be evaluated determining sub-module 114 includes:
a resolution obtaining unit 210 configured to obtain the resolution of the web content when a content type of the web content is an image;
a resolution determining unit 220, configured to determine whether the resolution is greater than a preset resolution threshold;
a first to-be-evaluated content determining unit 230, configured to, when the resolution is greater than the preset resolution threshold, take the web content as a to-be-evaluated web content.
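The resolution filter could be sketched as below. How the resolution is compared is not specified in the text; this example assumes both width and height must exceed hypothetical minimums, and the names `passes_resolution` and `candidates` are invented for illustration.

```python
from typing import Dict, List, Tuple


def passes_resolution(width: int, height: int,
                      min_width: int = 100, min_height: int = 100) -> bool:
    """Assumption: both dimensions must exceed the preset threshold."""
    return width > min_width and height > min_height


# Hypothetical images keyed by link, with (width, height) in pixels.
candidates: Dict[str, Tuple[int, int]] = {
    "icon.png": (16, 16),       # small icon, not worth evaluating
    "banner.jpg": (728, 90),    # too short in one dimension
    "photo.jpg": (1024, 768),   # large enough to evaluate
}

to_evaluate: List[str] = [
    link for link, (w, h) in candidates.items() if passes_resolution(w, h)
]
```

Filtering out small images such as icons keeps them away from the sensitivity evaluation model.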
In one embodiment, the content to be evaluated determining sub-module 114 includes:
a feature value obtaining unit 310, configured to, when the content type of the web content is an image, process the web content using a preset algorithm to obtain a picture feature value;
a difference obtaining unit 320, configured to compare the picture feature value with a pre-stored picture feature value corresponding to the URL of the web page to be evaluated, so as to obtain a picture feature value difference;
the second to-be-evaluated content determining unit 330 is configured to, when the picture feature value difference is greater than a preset feature value threshold, take the webpage content as a to-be-evaluated webpage content.
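The patent does not name the preset algorithm that produces the picture feature value. As one possible reading, the sketch below uses a simple average hash over a grayscale pixel grid and treats the Hamming distance between hashes as the feature value difference. The functions `average_hash`, `hamming`, and `should_reevaluate`, and the threshold of 1, are all assumptions.

```python
from typing import List

Pixels = List[List[int]]  # grayscale values, 0-255, row by row


def average_hash(pixels: Pixels) -> int:
    """One bit per pixel: 1 if brighter than the mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def should_reevaluate(pixels: Pixels, stored_hash: int,
                      threshold: int = 1) -> bool:
    """Re-evaluate only if the picture changed more than the threshold."""
    return hamming(average_hash(pixels), stored_hash) > threshold


old_picture = [[10, 10], [200, 200]]
new_picture = [[200, 200], [10, 10]]
stored = average_hash(old_picture)  # value saved for this URL earlier
unchanged = should_reevaluate(old_picture, stored)
changed = should_reevaluate(new_picture, stored)
```

An unchanged picture at the same URL is skipped, so only pictures that differ from the stored feature value are sent to the model again.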
In one embodiment, the content to be evaluated determining sub-module 114 includes:
a number determination unit 410 for determining the number of images of the web content when the content type of the web content is an image;
a number judgment unit 420 configured to judge whether the number of images is smaller than a preset number threshold;
a screenshot obtaining unit 430, configured to obtain a webpage screenshot of the webpage to be evaluated according to the URL of the webpage to be evaluated when the number of the images is smaller than a preset number threshold;
and a third content to be evaluated determining unit 440, configured to use the screenshot as content of a web page to be evaluated.
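The screenshot fallback can be sketched as follows. The callable `take_screenshot` is a hypothetical placeholder for whatever rendering tool captures the page, and the threshold of 3 is arbitrary. The text does not say whether the few images found are still evaluated alongside the screenshot; this sketch returns only the screenshot in that case.

```python
from typing import Callable, List


def choose_content(image_links: List[str], page_url: str,
                   take_screenshot: Callable[[str], str],
                   min_images: int = 3) -> List[str]:
    """Use a page screenshot when the page has too few images."""
    if len(image_links) < min_images:
        # Too little image content: evaluate a screenshot of the page.
        return [take_screenshot(page_url)]
    return list(image_links)


def fake_screenshot(url: str) -> str:
    """Hypothetical stand-in that returns a screenshot file name."""
    return "screenshot-of-" + url


sparse = choose_content(["a.jpg"], "example.com", fake_screenshot)
rich = choose_content(["a.jpg", "b.jpg", "c.jpg"], "example.com",
                      fake_screenshot)
```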
In an embodiment, after executing the function corresponding to the sensitivity type determining module 150, the web content sensitivity type determining apparatus provided by the present invention further executes the functions corresponding to the following modules:
the first sensitivity determining module is used for determining that the webpage sensitivity type corresponding to the webpage to be evaluated is a first sensitivity when the sensitivity type is a first-level (primary) sensitivity type, and stopping the evaluation of the webpage content of the webpage to be evaluated whose sensitivity type has not yet been determined;
the webpage sensitivity type judging module is used for judging whether the sensitivity type of every piece of webpage content in the webpage to be evaluated has been determined when the sensitivity type is not the first-level sensitivity type;
the second sensitivity determining module is used for determining that the webpage sensitivity type corresponding to the webpage to be evaluated is the second sensitivity when the sensitivity type of every piece of webpage content in the webpage to be evaluated has been determined;
and the execution module is used for continuing to determine the sensitivity type of the webpage content whose sensitivity type has not yet been determined when not all the webpage content in the webpage to be evaluated has a determined sensitivity type.
In this embodiment, the sensitivity type level and the web page sensitivity type may include several levels, and the sensitivity type level of the web page content and the web page sensitivity type are in one-to-one correspondence, and may be preset according to a specific application scenario.
Preferably, the text-type and image-type sensitivity evaluation models evaluate faster than the audio-type and video-type sensitivity evaluation models, and the webpage sensitivity types of a webpage are usually few, for example only the yellow-related type and the non-yellow-related type. Therefore, if the content types of the acquired webpage content also include audio and/or video, the text-type and image-type webpage content is first evaluated using the text-type and image-type sensitivity evaluation models respectively. Only when these two models find no webpage content whose sensitivity type is the first-level sensitivity type is the audio-type and/or video-type webpage content evaluated using the audio-type and/or video-type sensitivity evaluation model, so as to reduce the evaluation workload.
Furthermore, when the webpage sensitivity type of the webpage is determined to be the first sensitivity, a screenshot of the webpage and the webpage content whose sensitivity type is the first-level sensitivity type are obtained and stored. This avoids the situation in which the webpage owner changes or deletes the related content and it then becomes difficult to prove that the webpage was yellow-related.
In an embodiment, after executing the function corresponding to the sensitivity type determining module 150, the web content sensitivity type determining apparatus provided by the present invention further executes the functions corresponding to the following modules:
an evaluation difference obtaining module, configured to compare the model evaluation value with the artificial evaluation value to obtain an evaluation difference;
the evaluation difference value judging module is used for judging whether the evaluation difference value is positioned in a preset evaluation value difference value interval or not;
the sample data determining module is used for taking the webpage content to be evaluated and the artificial evaluation value as a sample webpage content and an evaluation value thereof when the evaluation difference value is within a preset evaluation value difference value interval;
and the model training module is used for training the sensitivity evaluation model corresponding to the content type by using the sample webpage content and the evaluation value thereof to obtain the trained sensitivity evaluation model.
In this embodiment, before the sensitivity evaluation model is used, sample webpage content of each content type and the corresponding label data need to be obtained to train the sensitivity evaluation model, so as to obtain a trained sensitivity evaluation model. The label data is the pornographic index evaluation value labeled for the sample webpage content by a labeling person or a labeling model.
In one possible implementation, the preset evaluation value difference interval is set to its maximum, for example [0, 1], so that all the webpage content that needs to be sent to the manual review client is taken as sample webpage content.
In another possible implementation, a small evaluation difference between the model evaluation value given by the sensitivity evaluation model and the artificial evaluation value given by the reviewer indicates that the model already evaluates this webpage content accurately. Using such content as sample webpage content contributes little to improving the evaluation accuracy of the model, so to improve efficiency, webpage content that the model already evaluates accurately may be excluded from the samples. For example, the preset evaluation value difference interval is set to [0.2, 1].
It should be noted that the device for determining a sensitive type of web page content provided in the embodiment of the present invention can implement the functions implemented by the method for determining a sensitive type of web page content, and specific implementation of the functions refers to the description in the method for determining a sensitive type of web page content, which is not described herein again.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for determining the sensitive type of the web page content is implemented. The storage medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical disks, CD-ROMs, and magneto-optical disks), ROMs (Read-Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read-Only Memories), EEPROMs (Electrically Erasable Programmable Read-Only Memories), flash memories, magnetic cards, or optical cards. That is, a storage medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer). It may be a read-only memory, a magnetic disk, an optical disk, or the like.
An embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors 510;
a storage 520 to store one or more programs 500,
when executed by the one or more processors 510, the one or more programs 500 cause the one or more processors 510 to implement the web page content sensitivity type determination method described above.
Fig. 7 is a schematic structural diagram of the computer device of the present invention, which includes a processor 510, a storage 520, an input unit 530, a display unit 540, and other components. Those skilled in the art will appreciate that the structure shown in Fig. 7 does not limit all computer devices, which may include more or fewer components than those shown, or may combine some of the components. The storage 520 may be used to store the application 500 and various functional modules, and the processor 510 executes the application 500 stored in the storage 520, thereby performing the various functional applications and data processing of the device. The storage 520 may be an internal memory or an external memory, or include both. The internal memory may comprise read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, or random access memory. The external memory may include a hard disk, a floppy disk, a ZIP disk, a USB flash drive, a magnetic tape, etc. The storage disclosed herein includes, but is not limited to, these types of memory. The storage 520 is described by way of example only and not by way of limitation.
The input unit 530 is used for receiving input of signals and receiving related requests of selecting voice files and the like input by users. The input unit 530 may include a touch panel and other input devices. The touch panel can collect touch operations of a user on or near the touch panel (for example, operations of the user on or near the touch panel by using any suitable object or accessory such as a finger, a stylus and the like) and drive the corresponding connecting device according to a preset program; other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., play control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like. The display unit 540 may be used to display information input by a user or information provided to a user and various menus of the computer device. The display unit 540 may take the form of a liquid crystal display, an organic light emitting diode, or the like. The processor 510 is a control center of the computer device, connects various parts of the entire computer using various interfaces and lines, and performs various functions and processes data by operating or executing software programs and/or modules stored in the storage device 520 and calling data stored in the storage device.
In one embodiment, a computer device includes one or more processors 510, and one or more storage 520, one or more applications 500, wherein the one or more applications 500 are stored in the storage 520 and configured to be executed by the one or more processors 510, and the one or more applications 500 are configured to perform the web page content sensitivity type determination method described in the above embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
It should be understood that each functional unit in the embodiments of the present invention may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for determining sensitive types of web page contents, comprising:
acquiring the content of the webpage to be evaluated according to the URL of the webpage to be evaluated;
evaluating the webpage content to be evaluated by using a sensitivity evaluation model corresponding to the content type of the webpage content to be evaluated to obtain a corresponding model evaluation value;
judging whether the model evaluation value is in a preset threshold interval or not;
if so, sending the webpage content to be evaluated to a manual checking client;
and receiving an artificial evaluation value corresponding to the webpage content to be evaluated, and determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value.
2. The method for determining the sensitive type of web page content according to claim 1,
the obtaining of the content of the webpage to be evaluated according to the URL of the webpage to be evaluated comprises the following steps:
acquiring webpage data corresponding to the URL of the webpage to be evaluated;
extracting a link matched with a preset matching rule from the webpage data;
acquiring webpage content according to the link;
and taking the webpage content as the webpage content to be evaluated.
3. The method for determining the sensitive type of web page content according to claim 2,
the step of taking the webpage content as the webpage content to be evaluated comprises the following steps:
if the content type of the webpage content is an image, acquiring the resolution of the webpage content;
judging whether the resolution is larger than a preset resolution threshold or not;
and if so, taking the webpage content as the webpage content to be evaluated.
4. The method for determining the sensitive type of web page content according to claim 2,
the step of taking the webpage content as the webpage content to be evaluated comprises the following steps:
if the content type of the webpage content is an image, processing the webpage content by using a preset algorithm to obtain a picture characteristic value;
comparing the picture characteristic value with a prestored picture characteristic value corresponding to the URL of the webpage to be evaluated to obtain a picture characteristic value difference value;
and if the picture characteristic value difference is larger than a preset characteristic value threshold, taking the webpage content as the webpage content to be evaluated.
5. The method for determining the sensitive type of web page content according to claim 2,
the step of taking the webpage content as the webpage content to be evaluated comprises the following steps:
if the content type of the webpage content is an image, determining the image quantity of the webpage content;
judging whether the number of the images is smaller than a preset number threshold value or not;
if the number of the images is smaller than a preset number threshold, acquiring a webpage screenshot of the webpage to be evaluated according to the URL of the webpage to be evaluated;
and taking the webpage screenshot as the webpage content to be evaluated.
6. The method for determining the sensitive type of web page content according to claim 1,
after determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value, the method further comprises:
if the sensitivity type is a primary sensitivity type, determining that the webpage sensitivity type corresponding to the webpage to be evaluated is a first sensitivity, and stopping determining webpage content of the webpage to be evaluated whose corresponding sensitivity type has not been determined;
if the sensitivity type is not the primary sensitivity type, judging whether all the webpage contents in the webpage to be evaluated have a determined corresponding sensitivity type;
if all of them have a determined corresponding sensitivity type, determining that the webpage sensitivity type corresponding to the webpage to be evaluated is a second sensitivity;
if not all of them have a determined corresponding sensitivity type, continuing to determine the webpage content whose corresponding sensitivity type has not been determined.
7. The method for determining the sensitive type of web page content according to claim 1,
after determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value, the method further comprises:
comparing the model evaluation value with the artificial evaluation value to obtain an evaluation difference value;
judging whether the evaluation difference value is in a preset evaluation value difference value interval or not;
if so, taking the webpage content to be evaluated and the artificial evaluation value as sample webpage content and the evaluation value thereof;
and training a sensitivity evaluation model corresponding to the content type by using the sample webpage content and the evaluation value thereof to obtain the trained sensitivity evaluation model.
8. A web page content sensitivity type determination apparatus, comprising:
the content obtaining module is used for obtaining the webpage content to be evaluated according to the URL of the webpage to be evaluated;
the model evaluation value obtaining module is used for evaluating the webpage content to be evaluated by using a sensitivity evaluation model corresponding to the content type of the webpage content to be evaluated to obtain a corresponding model evaluation value;
the judging module is used for judging whether the model evaluation value is in a preset threshold interval or not;
the content sending module is used for sending the webpage content to be evaluated to a manual checking client when the model evaluation value is within a preset threshold interval;
and the sensitivity type determining module is used for receiving the artificial evaluation value corresponding to the webpage content to be evaluated and determining the sensitivity type of the webpage content to be evaluated according to the artificial evaluation value.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method for determining a sensitive type of web page content according to any one of claims 1 to 7.
10. A computer device, characterized in that the computer device comprises:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the web page content sensitivity type determination method as claimed in any one of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811629231.3A CN111382383A (en) 2018-12-28 2018-12-28 Method, device, medium and computer equipment for determining sensitive type of webpage content

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811629231.3A CN111382383A (en) 2018-12-28 2018-12-28 Method, device, medium and computer equipment for determining sensitive type of webpage content

Publications (1)

Publication Number Publication Date
CN111382383A true CN111382383A (en) 2020-07-07

Family

ID=71220512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811629231.3A Pending CN111382383A (en) 2018-12-28 2018-12-28 Method, device, medium and computer equipment for determining sensitive type of webpage content

Country Status (1)

Country Link
CN (1) CN111382383A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984891A (en) * 2020-08-07 2020-11-24 游艺星际(北京)科技有限公司 Page display method and device, electronic equipment and storage medium
CN112036412A (en) * 2020-08-28 2020-12-04 绿盟科技集团股份有限公司 Webpage identification method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080320010A1 (en) * 2007-05-14 2008-12-25 Microsoft Corporation Sensitive webpage content detection
CN106776842A (en) * 2016-11-28 2017-05-31 腾讯科技(上海)有限公司 Multi-medium data detection method and device
CN108965916A (en) * 2017-05-25 2018-12-07 腾讯科技(深圳)有限公司 A kind of method, the method, device and equipment of model foundation of live video assessment
CN109086785A (en) * 2017-06-14 2018-12-25 北京图森未来科技有限公司 A kind of training method and device of image calibration model

Similar Documents

Publication Publication Date Title
CN110275958B (en) Website information identification method and device and electronic equipment
CN104765874B (en) For detecting the method and device for clicking cheating
CN108566399B (en) Phishing website identification method and system
CN109408821B (en) Corpus generation method and device, computing equipment and storage medium
CN109063456B (en) Security detection method and system for image type verification code
US20180174288A1 (en) SCORE WEIGHTS FOR USER INTERFACE (ui) ELEMENTS
CN105989268A (en) Safety access method and system for human-computer identification
CN105338001A (en) Method and device for recognizing phishing website
CN112541476B (en) Malicious webpage identification method based on semantic feature extraction
CN110069724A (en) The quick jump method of application program, device, electronic equipment and storage medium
CN111400586A (en) Group display method, terminal, server, system and storage medium
CN112347327A (en) Website detection method and device, readable storage medium and computer equipment
CN106407316B (en) Software question and answer recommendation method and device based on topic model
JP7182764B2 (en) Fraudulent web page detection device, control method and control program for fraudulent web page detection device
CN113190646A (en) User name sample labeling method and device, electronic equipment and storage medium
CN104881428A (en) Information graph extracting and retrieving method and device for information graph webpages
CN111382383A (en) Method, device, medium and computer equipment for determining sensitive type of webpage content
CN107786529B (en) Website detection method, device and system
CN108920955B (en) Webpage backdoor detection method, device, equipment and storage medium
CN114285641A (en) Network attack detection method and device, electronic equipment and storage medium
CN112597828B (en) Webpage recognition model training method and device and webpage recognition method
CN108985059B (en) Webpage backdoor detection method, device, equipment and storage medium
CN108171074B (en) Web tracking automatic detection method based on content association
US20220253503A1 (en) Generating interactive screenshot based on a static screenshot
CN116318974A (en) Site risk identification method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination