CN101055621A - Content based sensitive web page identification method - Google Patents


Info

Publication number
CN101055621A
Authority
CN
China
Prior art keywords
text
image
identification
sensitive
webpage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200610073172
Other languages
Chinese (zh)
Other versions
CN100412888C (en)
Inventor
胡卫明
吴偶
陈周耀
朱明亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CNB2006100731727A priority Critical patent/CN100412888C/en
Publication of CN101055621A publication Critical patent/CN101055621A/en
Application granted granted Critical
Publication of CN100412888C publication Critical patent/CN100412888C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Character Input (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a content-based method for identifying sensitive web pages, comprising the steps of: given the uniform resource locator (URL) of a web page, acquiring the source code of the page, separating and preprocessing the data, and obtaining the text information and the effective image information; processing the text information with a continuous sensitive text classifier, the processing being complete if the classifier output is greater than a preset threshold. Otherwise, the text information is processed with a discrete sensitive text classifier; if the output of that classifier is greater than a preset threshold, the page is identified as sensitive and the processing is complete. Otherwise, the images are identified with an image classifier, and the identification results are fused with the output of the discrete text classifier. By combining the continuous sensitive text classifier, the discrete sensitive text classifier and the sensitive image classifier, the invention addresses the problems of the existing techniques. By using the structural information of the web page and formulating an image-set identification problem, the invention fuses information to improve the recognition rate for sensitive web pages.

Description

Content-based sensitive web page identification method
Technical field
The present invention relates to the technical field of information filtering, and in particular to a method for identifying web pages that contain sensitive information.
Background art
Sensitive information on the Internet has caused great harm to Internet users, especially teenagers, and has therefore attracted wide attention from researchers and industry.
Many methods for filtering sensitive information exist at present, including black and white lists, IP filtering and keyword matching. Generally speaking, on the one hand, these techniques work in a very mechanical way: they can reach 100% filtering efficiency for some sensitive web pages with very short response times, but their filtering parameters can only be updated after actual sensitive pages have appeared, so they cannot keep up with the rapid change of real sensitive websites. On the other hand, because the content of the web page is used little or not at all, the false filtering rate is very high, which interferes with users' normal browsing.
Content-based intelligent identification of sensitive information has been a development direction of filtering technology in recent years. Several content-based sensitive information identification methods already exist.
Current sensitive web page identification methods are generally built mainly on sensitive text identification. Their core is therefore text processing: the text in the web page is extracted first, features are then extracted, and a machine learning classification algorithm is trained on the features and used to classify them. Feature extraction usually proceeds as follows: (1) a keyword list is specified manually; (2) text matching is used to count the number of occurrences of each keyword; (3) the occurrence counts form a vector which, after processing such as normalization, serves as the feature vector of the text. The number of keywords is generally less than 100. A classifier is then chosen for training and prediction. Pui Y. Lee et al. of Singapore used a Kohonen self-organizing neural network as the classifier and obtained good practical results. There are also sensitive image identification methods; for example, our institution proposed a content-based sensitive image identification method that achieved a recognition rate above 80% on the CAMPAQ database.
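The three feature extraction steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the cited work: the function name `keyword_feature_vector`, the placeholder keyword list, and the choice of normalizing the counts so that they sum to 1 are all assumptions made for the example.

```python
from typing import List


def keyword_feature_vector(text: str, keywords: List[str]) -> List[float]:
    """Build a normalized keyword-count vector for one text.

    Assumption: normalization divides each count by the total count,
    so a text with at least one match yields a vector summing to 1.
    """
    # Step (2): count occurrences of each keyword by plain text matching.
    counts = [text.count(kw) for kw in keywords]
    total = sum(counts)
    if total == 0:
        # No keyword present: return the zero vector.
        return [0.0] * len(keywords)
    # Step (3): normalize the counts into the feature vector.
    return [c / total for c in counts]


# Step (1): a manually specified keyword list (hypothetical placeholders).
KEYWORDS = ["alpha", "beta", "gamma"]
vec = keyword_feature_vector("alpha beta alpha delta", KEYWORDS)
```

The resulting vector has one component per keyword, so with fewer than 100 keywords the feature space stays small enough for most standard classifiers.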
Like the mechanical filtering methods, the above methods make poor use of web features and cannot yet achieve satisfactory results. For example, text-based sensitive web page identification cannot reliably recognize normal pages that are related to sensitive topics, and image-based sensitive web page identification has a very high false recognition rate. Existing fusion algorithms merely combine results with AND/OR operations and cannot fundamentally improve the recognition rate.
To overcome the deficiencies of the prior art, the object of the present invention is to identify sensitive information by exploiting the characteristics of web pages and to further improve the recognition rate for sensitive web pages. To this end, the present invention proposes a content-based sensitive web page identification method.
To achieve the above object, the content-based sensitive web page identification method of the present invention comprises a preprocessing step and a sensitive information identification step;
The preprocessing step comprises:
Given the uniform resource locator of a web page, obtaining the source code of the page, separating and preprocessing the data, and obtaining the text information;
Obtaining the structural information of the image part of the web page, and selecting the important images to form the effective image set;
The sensitive information identification step comprises:
Identifying the text information with a continuous sensitive text recognizer;
Identifying the text information with a discrete text recognizer;
Identifying the images in the image set with a sensitive image recognizer.
The sensitive information identification step proceeds as follows:
The text information is identified with the continuous sensitive text recognizer; if the recognition result is sensitive, the processing ends; if the recognition result is not sensitive, then:
The discrete text recognizer identifies the text information; if the recognizer output is greater than a threshold, the recognition result is sensitive and the processing ends; if the recognition result is not sensitive, then:
The sensitive image recognizer identifies the images in the image set, the result is fused with the result of the discrete sensitive text recognizer, and whether the web page is sensitive is judged from the fusion result.
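The three-stage cascade just described can be sketched as a single function. This is only an illustration of the control flow under stated assumptions: the function `identify_page`, its parameter names, the default threshold value and the returned stage label are hypothetical, and the four classifiers are passed in as plain callables rather than being the trained models of the invention.

```python
from typing import Callable, Sequence, Tuple


def identify_page(
    text: str,
    images: Sequence[object],
    continuous_clf: Callable[[str], int],      # returns 1 for sensitive text
    discrete_clf: Callable[[str], float],      # returns a sensitivity probability
    image_clf: Callable[[object], bool],       # True if the image looks sensitive
    fuse: Callable[[int, int, float], bool],   # combines (N_1, N_2, P_s)
    tau: float = 0.5,                          # hypothetical threshold value
) -> Tuple[bool, str]:
    """Return (is_sensitive, name of the stage that made the decision)."""
    # Stage 1: continuous sensitive text recognizer; stop early if sensitive.
    if continuous_clf(text) == 1:
        return True, "continuous"
    # Stage 2: discrete text recognizer; stop early if above the threshold.
    p_s = discrete_clf(text)
    if p_s > tau:
        return True, "discrete"
    # Stage 3: classify every image, then fuse with the discrete text result.
    flags = [bool(image_clf(img)) for img in images]
    n1 = sum(flags)            # images recognized as sensitive
    n2 = len(flags) - n1       # images recognized as normal
    return fuse(n1, n2, p_s), "fusion"
```

Ordering the stages this way means the image classifier, typically the most expensive stage, runs only on pages that neither text recognizer has already flagged.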
In the prior art, text-based sensitive web page identification cannot reliably recognize normal pages that are related to sensitive topics, and image-based sensitive web page identification fuses results with AND/OR operations, which cannot fundamentally improve the recognition rate. The present invention solves these problems by combining a continuous sensitive text recognizer, a discrete text recognizer and a sensitive image recognizer. It uses the structural information of the web page and formulates an image-set identification problem to perform information fusion, thereby improving the recognition rate for sensitive web pages.
Description of drawings
The above and other aspects, features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawing, in which:
Fig. 1 is a schematic diagram of the system framework of the present invention.
Embodiment
The present invention is described in detail below with reference to the accompanying drawing. It should be noted that the described embodiments are for illustrative purposes only and do not limit the present invention.
Fig. 1 is a schematic diagram of the system framework of the present invention; the specific steps are as follows:
Step S1: obtain the source code of the given web page URL;
Step S2: separate the Chinese text from the source code;
Step S3: obtain the size information of the images in the source code, and discard some of the images according to rules;
Step S4: identify the separated Chinese text with the continuous text classifier; if the recognition result is 1, the web page is sensitive and processing exits;
Step S5: identify the Chinese text with the discrete text classifier; if the recognition result is greater than the set threshold, the web page is sensitive and processing exits;
Step S6: identify the images with the image classifier;
Step S7: fuse the image recognition result with the result of discrete text identification.
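Steps S1 and S2 can be sketched with the Python standard library. This is a minimal sketch under several assumptions that the patent does not state: text inside `script` and `style` elements is ignored, "Chinese text" is approximated by runs of characters in the CJK Unified Ideographs range U+4E00 to U+9FFF, and the helper names `fetch_source` and `extract_chinese_text` are hypothetical.

```python
import re
import urllib.request
from html.parser import HTMLParser
from typing import List


def fetch_source(url: str, timeout: float = 10.0) -> str:
    """Step S1: download the page source (charset fallback is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


# Assumption: one or more consecutive CJK Unified Ideographs form a text run.
_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def extract_chinese_text(html_source: str) -> List[str]:
    """Step S2: return the runs of Chinese characters found in visible text."""
    parser = _TextCollector()
    parser.feed(html_source)
    parser.close()
    # Join with newlines so runs from different elements stay separate.
    return _CJK_RUN.findall("\n".join(parser.chunks))
```

A real system would also need word segmentation before keyword statistics can be computed on the extracted runs; that step is outside the scope of this sketch.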
According to step S3, the step of selecting important images comprises:
Obtaining the size information of every image contained in the web page;
If the size of an image conforms to rules derived from statistics gathered in advance, the image is regarded as an important image and is placed in the effective image set.
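The image selection of step S3 can be sketched as below. The patent does not disclose the statistical size rules, so the rule used here (a minimum side length and a maximum aspect ratio) and its threshold values are purely hypothetical, as are the names `select_effective_images`, `min_side` and `max_aspect`. The sketch also assumes that sizes are read from the `width` and `height` attributes of `img` tags in the page source.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class _ImgSizeCollector(HTMLParser):
    """Collects (src, width, height) for <img> tags with integer sizes."""

    def __init__(self) -> None:
        super().__init__()
        self.images: List[Tuple[str, int, int]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)
        try:
            width = int(a.get("width", ""))
            height = int(a.get("height", ""))
        except (TypeError, ValueError):
            return  # size missing or not a plain integer: skip the image
        self.images.append((a.get("src") or "", width, height))


def select_effective_images(
    html_source: str,
    min_side: int = 100,      # hypothetical: drops icons and spacers
    max_aspect: float = 3.0,  # hypothetical: drops banners and rulers
) -> List[str]:
    """Return the src of every image that passes the (assumed) size rule."""
    parser = _ImgSizeCollector()
    parser.feed(html_source)
    parser.close()
    kept = []
    for src, w, h in parser.images:
        if min(w, h) < min_side:
            continue
        if max(w, h) / min(w, h) > max_aspect:
            continue
        kept.append(src)
    return kept
```

Filtering on declared size before any pixel data is downloaded keeps the later image classification step from spending time on navigation icons and advertising banners.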
According to step S4, the step of identifying text with the continuous sensitive text recognizer comprises:
Extracting the features of the text;
Inputting the text features into a support vector machine (SVM) trained in advance; if the output is 1, the text is sensitive and the processing ends; otherwise processing continues.
According to step S5, the step of identifying text with the discrete sensitive text recognizer comprises:
Extracting the features of the text with a vector space model (VSM);
Inputting the text features into a trained Bayesian network (Bayes network, abbreviated BNS); the output is the probability that the input text is sensitive; if the probability is greater than a threshold τ, the text is sensitive and the processing ends; otherwise processing continues.
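To illustrate how step S5 turns keyword statistics into a sensitivity probability that can be compared with τ, the sketch below uses a multinomial naive Bayes classifier. This is a deliberate simplification and not the Bayesian network of the invention, whose structure is not disclosed; the class name `KeywordNaiveBayes`, the Laplace smoothing and the token-list input format are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class KeywordNaiveBayes:
    """Multinomial naive Bayes over a fixed keyword vocabulary.

    Stand-in for the trained Bayes network; labels are 1 (sensitive)
    and 0 (normal). Training data must contain both labels.
    """

    def __init__(self, vocabulary: Sequence[str]) -> None:
        self.vocab = list(vocabulary)
        self.log_prior: Dict[int, float] = {}
        self.log_like: Dict[int, Dict[str, float]] = {}

    def fit(self, docs: List[List[str]], labels: List[int]) -> "KeywordNaiveBayes":
        for c in (0, 1):
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d if t in self.vocab)
            # Laplace smoothing: add one count per vocabulary entry.
            total = sum(counts.values()) + len(self.vocab)
            self.log_like[c] = {
                t: math.log((counts[t] + 1) / total) for t in self.vocab
            }
        return self

    def prob_sensitive(self, doc: List[str]) -> float:
        """Posterior probability that the tokenized text is sensitive."""
        scores = {}
        for c in (0, 1):
            scores[c] = self.log_prior[c] + sum(
                self.log_like[c][t] for t in doc if t in self.vocab
            )
        # Subtract the maximum before exponentiating for numerical stability.
        m = max(scores.values())
        e0, e1 = math.exp(scores[0] - m), math.exp(scores[1] - m)
        return e1 / (e0 + e1)
```

In the flow of step S5, the returned probability would be compared with τ; it is also the value that the later fusion step uses as the text-based prior.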
According to step S6, the image identification step comprises:
Identifying each image with the image recognizer; the number of images recognized as sensitive is N_1, and the number of images recognized as normal is N_2.
According to step S7, the information fusion step comprises:
Fusing the result of discrete text identification with the image recognition result of step S6: the recognition results are substituted into formula (1-1); if the value is greater than 1, the web page is sensitive; otherwise it is normal; the processing ends.
In steps S1 and S2 of the method, web pages are divided into three classes based on an analysis of the web. The first class is pages dominated by continuous text, where continuous text is defined as article-like text: there are strong semantic associations within the context, so rich semantic information is available. Pages of this type usually contain one or several articles. The second class is pages dominated by discrete text, where discrete text means any text other than continuous text, for example the text on a home page or the captions around pictures, which mainly serves as links or explanation. The third class is pages dominated by images: the page mainly presents image information, accompanied by a small amount of discrete text.
Specifically, for pages of the first class, which are dominated by continuous text, the present invention uses a filtering method that combines semantics and statistics. Three classes of keywords are defined, with the following descriptive definitions:
The first class is explicit keywords. Such keywords appear essentially only in sensitive text; statistically, the probability that they appear in sensitive text is very high (close to 1) and the probability that they appear in normal text is very low (close to 0). Semantically, these words themselves carry sensitive information.
The second class is implicit keywords. Such keywords do not themselves carry any sensitive information, but for some reason they have formed a fixed association with sensitive text; that is, they also appear with high probability in sensitive text, although they can of course appear in other text as well.
The third class is logic keywords, which fall into two kinds. One kind is polysemous words: such a keyword has a normal meaning in normal text but carries sensitive information in sensitive text. The other kind carries sensitive information only jointly, after being combined with certain other words. These combinations can in turn be divided into two kinds: explicit-plus-logic and logic-plus-logic.
Based on the above definitions, a keyword set is chosen, and semantic rules are constructed to describe the semantic associations between the words, which helps to extract the feature information correctly. After normalization, the extracted features serve as the feature vector of the continuous text. In step S4, a support vector machine (SVM) is used as the classifier to train on and classify the features, and whether the web page is sensitive is decided from the SVM output.
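One possible reading of the three keyword classes is sketched below. The patent does not spell out its semantic rules, so everything here is an assumption made for illustration: the function name `semantic_features`, the fixed-size token window, the rule that a logic keyword counts only when an explicit or another logic keyword occurs nearby, and the normalization by total count. The input is assumed to be text that has already been segmented into tokens.

```python
from typing import List, Sequence, Set


def semantic_features(
    tokens: Sequence[str],
    explicit: Set[str],
    implicit: Set[str],
    logic: Set[str],
    window: int = 3,  # hypothetical collocation distance, in tokens
) -> List[float]:
    """Return normalized counts [explicit, implicit, supported logic]."""
    tokens = list(tokens)
    n_explicit = sum(t in explicit for t in tokens)
    n_implicit = sum(t in implicit for t in tokens)
    n_logic = 0
    for i, t in enumerate(tokens):
        if t not in logic:
            continue
        # Neighbours within the window on both sides, excluding the token itself.
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        # Assumed rule: count a logic keyword only in an explicit-plus-logic
        # or logic-plus-logic collocation.
        if any(n in explicit or n in logic for n in neighbours):
            n_logic += 1
    raw = [n_explicit, n_implicit, n_logic]
    total = sum(raw)
    return [c / total for c in raw] if total else [0.0, 0.0, 0.0]
```

Counting an isolated logic keyword as zero is what lets such a feature distinguish the normal sense of a polysemous word from its sensitive sense, which plain keyword counting cannot do.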
Specifically, for web pages of the second type, according to step S5, a keyword list is constructed manually; the keywords are counted in the text of the web page, and the normalized counts are input as the feature vector into the trained Bayes network; whether the web page is sensitive is decided from the output of the network.
Specifically, for web pages of the third type, step S3 obtains from the page the subset of images whose sizes meet the requirements; step S6 identifies the images one by one with the image classifier, giving the result (N_1, N_2), where N_1 is the number of images recognized as sensitive and N_2 is the number of images recognized as normal. Meanwhile, as prior knowledge of whether the images are sensitive, according to step S5 the Bayes classifier for discrete text is applied to the text in the page, and its output is P_s. According to step S7, two parameters describe the image classifier: p_1 is the probability that a normal image is misclassified as sensitive, and p_2 is the probability that a sensitive image is misclassified as normal. These parameters are substituted into the following formula for fusion:
$$\frac{(1-p_2)^{N_1}\,p_2^{N_2}}{p_1^{N_1}\,(1-p_1)^{N_2}} \cdot \frac{P_s}{1-P_s} \tag{1-1}$$
The output values of the classifiers are substituted into the above formula, and the computed result is compared with the threshold to judge whether the web page is sensitive.
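Formula (1-1) can be read as posterior odds: the ratio of the likelihood of observing N_1 sensitive and N_2 normal image results on a sensitive page to the likelihood of the same observation on a normal page, multiplied by the prior odds P_s / (1 - P_s) from the discrete text classifier. The sketch below evaluates it under that reading. The function names are hypothetical, the computation is done in log space as an implementation choice, and all of p_1, p_2 and P_s are assumed to lie strictly between 0 and 1.

```python
import math


def fusion_odds(n1: int, n2: int, p1: float, p2: float, p_s: float) -> float:
    """Evaluate formula (1-1).

    n1, n2: numbers of images recognized as sensitive / normal.
    p1: probability a normal image is misclassified as sensitive.
    p2: probability a sensitive image is misclassified as normal.
    p_s: sensitivity probability from the discrete text classifier.
    Requires 0 < p1, p2, p_s < 1.
    """
    log_odds = (
        n1 * math.log(1 - p2) + n2 * math.log(p2)      # numerator
        - n1 * math.log(p1) - n2 * math.log(1 - p1)    # denominator
        + math.log(p_s) - math.log(1 - p_s)            # prior odds
    )
    return math.exp(log_odds)


def page_is_sensitive(n1: int, n2: int, p1: float, p2: float, p_s: float) -> bool:
    """The page is judged sensitive when the fused value is greater than 1."""
    return fusion_odds(n1, n2, p1, p2, p_s) > 1.0
```

Unlike an AND/OR combination, this rule lets many weakly suspicious images outweigh a low text probability, and lets a few false alarms from the image classifier be discounted when most images on the page look normal.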
In the above embodiment, each step is an example; those of ordinary skill in the art can determine the steps actually used according to the actual situation, and each step can be implemented in several ways, all of which fall within the scope of the present invention.
Finally, the above description is intended to enable the present invention and its embodiments, and the scope of the present invention should not be limited by this description. Those skilled in the art should understand that any modification or partial replacement that does not depart from the scope of the present invention falls within the scope defined by the claims of the present invention.

Claims (6)

1. A content-based sensitive web page identification method, comprising the steps of:
A preprocessing step, comprising:
Given the uniform resource locator of a web page, obtaining the source code of the page, separating and preprocessing the data, and obtaining the text information;
Obtaining the structural information of the image part of the web page, and selecting the important images to form the effective image set;
A web page sensitive information identification step, comprising:
Identifying the text information with a continuous sensitive text recognizer;
Identifying the text information with a discrete text recognizer;
Identifying the images in the image set with a sensitive image recognizer.
2. The content-based sensitive web page identification method according to claim 1, characterized in that the sensitive information identification step proceeds as follows:
The text information is identified with the continuous sensitive text recognizer; if the recognition result is sensitive, the processing ends; if the recognition result is not sensitive, then:
The discrete text recognizer identifies the text information; if the recognizer output is greater than a threshold, the recognition result is sensitive and the processing ends; if the recognition result is not sensitive, then:
The sensitive image recognizer identifies the images in the image set, the result is fused with the result of the discrete sensitive text recognizer, and whether the web page is sensitive is judged from the fusion result.
3. The content-based sensitive web page identification method according to claim 1, characterized in that the step of selecting important images comprises:
Obtaining the size information of every image contained in the web page;
If the size of an image conforms to rules derived from statistics gathered in advance, the image is regarded as an important image and is placed in the effective image set.
4. The content-based sensitive web page identification method according to claim 1, characterized in that the step of identifying text with the continuous sensitive text recognizer comprises:
Extracting the features of the text;
Inputting the text features into a support vector machine trained in advance; if the output is 1, the text is sensitive and the processing ends; otherwise processing continues.
5. The content-based sensitive web page identification method according to claim 1, characterized in that the step of identifying text with the discrete sensitive text recognizer comprises:
Extracting the features of the text with a vector space model;
Inputting the text features into a trained Bayesian network; the output is the probability that the input text is sensitive; if the probability is greater than a threshold τ, the text is sensitive and the processing ends; otherwise processing continues.
6. The content-based sensitive web page identification method according to claim 1, characterized in that the image identification and information fusion steps comprise:
Identifying each image with the image recognizer; the number of images recognized as sensitive is N_1, and the number of images recognized as normal is N_2;
Fusing the result of discrete text identification with the above image recognition result; if the fused value is greater than 1, the web page is sensitive; otherwise it is normal; the processing ends.
CNB2006100731727A 2006-04-10 2006-04-10 Content based sensitive web page identification method Active CN100412888C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006100731727A CN100412888C (en) 2006-04-10 2006-04-10 Content based sensitive web page identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006100731727A CN100412888C (en) 2006-04-10 2006-04-10 Content based sensitive web page identification method

Publications (2)

Publication Number Publication Date
CN101055621A true CN101055621A (en) 2007-10-17
CN100412888C CN100412888C (en) 2008-08-20

Family

ID=38795454

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006100731727A Active CN100412888C (en) 2006-04-10 2006-04-10 Content based sensitive web page identification method

Country Status (1)

Country Link
CN (1) CN100412888C (en)

Also Published As

Publication number Publication date
CN100412888C (en) 2008-08-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant