CN106383862A - Violation short message detection method and system - Google Patents

Violation short message detection method and system

Info

Publication number
CN106383862A
Authority
CN
China
Prior art keywords
violation
webpage
link
phrase
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610799866.2A
Other languages
Chinese (zh)
Other versions
CN106383862B (en)
Inventor
肖耿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Clouds Network Technology Co Ltd
Original Assignee
Hangzhou Clouds Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Clouds Network Technology Co Ltd
Priority to CN201610799866.2A
Publication of CN106383862A
Application granted
Publication of CN106383862B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention discloses a violation short message detection method. The method comprises the steps of obtaining a link in short message contents, and obtaining a webpage which the link points to; judging whether the link is a violation link or not according to a violation keyword filtering result of text contents in the webpage; and if a short message contains the violation link, judging that the short message is a violation short message. Meanwhile, the invention provides a violation short message detection system. The system comprises a link obtaining module used for obtaining the link in the short message contents and obtaining the webpage which the link points to, a violation keyword filtering module used for judging whether the link is the violation link or not according to the violation keyword filtering result of the text contents in the webpage obtained by the link obtaining module, and a judgment module used for judging that the short message is the violation short message if it is judged that the short message contains the violation link according to a judgment result of the violation keyword filtering module. Through the technical scheme disclosed by the invention, the short message can be subjected to link content detection, so that the success rate of violation short message interception is effectively increased.

Description

A violation short message detection method and system
Technical field
The present invention relates to the field of communication technology, and more particularly to a violation short message detection method and a system implementing the method.
Background
Short message service is an important part of mobile communication services. Although the share of person-to-person short messaging has declined under the influence of mobile social networking applications, promotion based on bulk short messages retains its particular advantages and remains in use. As a promotional medium, a bulk short message carries the information the sender wants to convey, such as a product name, or it contains a link in the hope that recipients will follow the link to view the sender's products, which benefits the sender.
As the service provider, a short message sending platform is responsible for reviewing the content of bulk short messages and ensuring that they do not contain content that violates laws and regulations, such as gambling or pornography. Existing approaches to detecting and monitoring violation short messages fall roughly into two classes. The first is detection by the operator that sends the messages, either through manual review or through violation keyword filtering of the message content, so that violation short messages are filtered out and their sending is intercepted. Operator-side detection can stop violation short messages at the source, but a sender who wants to avoid interception can add a link that points directly to the promotional webpage and keep violation words out of the message text, which easily evades interception. The second is detection on the mobile terminal, where application software and a violation dictionary are used to filter the keywords of received messages and block messages that contain violation content. Because mobile terminals and servers differ greatly in performance and in message volume, this method is difficult to apply to violation short message detection on a sending platform.
Summary of the invention
The present invention addresses the shortcoming that prior-art short message sending platforms have difficulty detecting violation content behind the links in short messages and therefore cannot fully block the sending of violation short messages. It provides a violation short message detection method and system that can inspect the linked content of outgoing short messages and effectively improve the success rate of intercepting violation short messages.
To achieve the above object, the present invention adopts the following technical solutions:
The violation short message detection method of the present invention includes the following steps: obtaining the link in the short message content and obtaining the webpage the link points to; judging whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage; and, if the short message contains a violation link, judging the short message to be a violation short message.
Preferably, the step of obtaining the link in the short message content further includes: obtaining the full content of the short message and extracting the link in the short message content by regular expression matching.
Preferably, the step of judging whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage further includes: parsing the webpage elements and extracting the text content while marking the webpage element source of each part of the text content; performing word segmentation on the text content to obtain segmented phrases, matching the segmented phrases against the violation keywords in a preset violation keyword library, and identifying the violation phrases among the segmented phrases; assigning each violation phrase a preset weight coefficient according to its webpage element source, and computing the weighted term frequency of the violation phrases in the text content of the webpage; when the weighted term frequency of the violation phrases exceeds a preset threshold, judging the webpage to be a violation webpage; and, if the webpage the link points to is a violation webpage, judging the link to be a violation link.
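The word segmentation and keyword matching step can be illustrated with a minimal sketch. The patent does not name a segmentation algorithm, so forward maximum matching against a small dictionary is used here purely as an assumption; the function names `segment` and `find_violation_phrases`, the vocabulary, and the keyword set are all hypothetical.

```python
def segment(text: str, vocabulary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position, take the longest
    substring found in the vocabulary, falling back to one character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def find_violation_phrases(tokens: list[str], violation_keywords: set[str]) -> list[str]:
    """Keep only the segmented phrases that appear in the keyword library."""
    return [t for t in tokens if t in violation_keywords]


# Hypothetical dictionary and keyword library, for illustration only.
vocabulary = {"在线", "赌场", "注册"}
tokens = segment("在线赌场注册送礼", vocabulary)
print(tokens)                                    # ['在线', '赌场', '注册', '送', '礼']
print(find_violation_phrases(tokens, {"赌场"}))  # ['赌场']
```

A production system would more likely rely on an established segmentation library; the sketch only shows where the violation keyword library enters the pipeline.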
Preferably, the webpage elements include text without hyperlinks and text with hyperlinks, and the weight coefficient of a violation phrase whose source is text without hyperlinks is less than the weight coefficient of a violation phrase whose source is text with hyperlinks.
Preferably, the webpage elements include images without hyperlinks and images with hyperlinks, and the weight coefficient of a violation phrase whose source is an image without a hyperlink is less than the weight coefficient of a violation phrase whose source is an image with a hyperlink. The step of parsing the webpage elements and extracting the text content while marking the webpage element source of each part of the text content further includes: obtaining the images in the webpage and distinguishing images without hyperlinks from images with hyperlinks; using optical character recognition to identify and extract the text content in images without hyperlinks, and marking the webpage element source of this part of the text content as image without hyperlink; and using optical character recognition to identify and extract the text content in images with hyperlinks, and marking the webpage element source of this part of the text content as image with hyperlink.
The present invention also provides a violation short message detection system, the system including:
a link acquisition module, for obtaining the link in the short message content and obtaining the webpage the link points to;
a violation keyword filtering module, for judging whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage obtained by the link acquisition module;
a determination module, for judging the short message to be a violation short message if, according to the judgment result of the violation keyword filtering module, the short message contains a violation link.
Preferably, the violation keyword filtering module specifically includes:
a text parsing unit, for parsing the webpage elements and extracting the text content;
a source marking unit, for marking the webpage element source of each part of the text content extracted by the text parsing unit;
a word segmentation unit, for performing word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases;
a violation phrase recognition unit, for matching the segmented phrases obtained by the word segmentation unit against the violation keywords in the preset violation keyword library and identifying the violation phrases among the segmented phrases;
a computing unit, for assigning each violation phrase a preset weight coefficient according to its webpage element source and computing the weighted term frequency of the violation phrases in the text content of the webpage;
a link judging unit, for judging the webpage to be a violation webpage when the weighted term frequency of the violation phrases exceeds the preset threshold, and for judging the link to be a violation link if the webpage the link points to is a violation webpage.
Preferably, the webpage elements include text without hyperlinks and text with hyperlinks, and the weight coefficient of a violation phrase whose source is text without hyperlinks is less than the weight coefficient of a violation phrase whose source is text with hyperlinks.
Preferably, the webpage elements include images without hyperlinks and images with hyperlinks, and the weight coefficient of a violation phrase whose source is an image without a hyperlink is less than the weight coefficient of a violation phrase whose source is an image with a hyperlink; the text parsing unit includes an optical character recognition subunit, for identifying and extracting the text content in the images without hyperlinks and the images with hyperlinks in the webpage.
The invention discloses a violation short message detection method. The method extracts the link in a short message and accesses the webpage the link points to, performs violation keyword filtering on the text content of the webpage to judge whether the webpage contains violation content, and thereby judges whether the link is a violation link. If the short message contains a violation link, the short message is judged to be a violation short message and handled accordingly, for example by interception. The objects of the violation keyword filtering in this method include both the plain text content of the webpage and the characters in its images, and the term frequency of violation phrases is computed with different weight coefficients depending on whether the content carries a link, so that the legitimacy of the linked webpage is judged more reasonably in light of user habits. The invention also discloses a violation short message detection system, in which a link acquisition module obtains the link in the short message content and the webpage the link points to, and a violation keyword filtering module performs violation keyword filtering on the webpage content to judge whether the webpage is a violation webpage, so that short messages containing violation links are detected and intercepted. Unlike the prior art, this technical solution can inspect the linked content of a short message, which secures the accuracy of violation short message interception and prevents senders from evading interception and seeking improper profit by adding links.
Brief description of the drawings
Fig. 1 is a schematic diagram of a violation short message detection system provided by an embodiment of the present invention.
Fig. 2 is a first schematic diagram of the violation keyword filtering module provided by an embodiment of the present invention.
Fig. 3 is a second schematic diagram of the violation keyword filtering module provided by an embodiment of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
The invention discloses a violation short message detection method and a violation short message detection system. The link in a short message is extracted and the webpage the link points to is accessed; violation keyword filtering is performed on the text content of the webpage to judge whether the webpage contains violation content, and thereby whether the link is a violation link. If the short message contains a violation link, the short message is judged to be a violation short message and handled accordingly, for example by interception. Unlike the prior art, this technical solution can inspect the linked content of a short message, which secures the accuracy of violation short message interception and prevents senders from evading interception and seeking improper profit by adding links.
Specific embodiments of the violation short message detection method:
Embodiment 1: A violation short message detection method includes the following steps:
S101: Obtain the link in the short message content and obtain the webpage the link points to.
This step obtains the full content of the short message and extracts the link in the short message content by regular expression matching. A regular expression is a computer science concept: a single pattern string that describes and matches a set of strings conforming to a given syntax rule. In this step, the spaces and meaningless punctuation in the short message content are deleted to obtain the text content of the short message, and the link is then identified with a preset regular expression. Compared with matching links directly, this effectively prevents senders from hiding a link by inserting spaces and invalid characters when editing the message, and thereby improves the link recognition rate.
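The normalisation and matching described in S101 can be sketched as follows. The patent does not list which characters count as meaningless, nor does it give the regular expression, so `FILLER_CHARS` and `URL_PATTERN` are assumptions chosen for illustration, and `extract_links` is a hypothetical name.

```python
import re

# Assumption: characters treated as meaningless padding (spaces, full-width
# space, and a few symbols a sender might insert to break up a link).
FILLER_CHARS = " \t\r\n\u3000*|^~`"

# Assumption: a deliberately simple URL pattern limited to ASCII URL characters.
URL_PATTERN = re.compile(
    r"(?:https?://|www\.)[A-Za-z0-9\-._~:/?#@!$&'()+,;=%]+",
    re.IGNORECASE,
)


def extract_links(sms_text: str) -> list[str]:
    """Delete filler characters, then return every URL-like substring."""
    cleaned = sms_text.translate({ord(c): None for c in FILLER_CHARS})
    return URL_PATTERN.findall(cleaned)


# The link is split by spaces and an asterisk but is still recovered.
print(extract_links("点击 h t t p : / / ex*ample.com/a 领取奖品"))
# ['http://example.com/a']
```

One limitation of this sketch: when the surrounding text is itself ASCII, deleting spaces can fuse the words after the link into the match, so a real deployment would need a more careful boundary rule.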
S102: Judge whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage.
Preferably, this step further includes: parsing the webpage elements and extracting the text content while marking the webpage element source of each part of the text content; performing word segmentation on the text content to obtain segmented phrases, matching the segmented phrases against the violation keywords in a preset violation keyword library, and identifying the violation phrases among the segmented phrases; assigning each violation phrase a preset weight coefficient according to its webpage element source, and computing the weighted term frequency of the violation phrases in the text content of the webpage; when the weighted term frequency of the violation phrases exceeds a preset threshold, judging the webpage to be a violation webpage; and, if the webpage the link points to is a violation webpage, judging the link to be a violation link.
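A minimal sketch of the weighted term frequency test in S102 follows. The patent gives no numeric weights or threshold and does not say whether the frequency is taken per phrase or over the whole page; the sketch assumes a single page-level sum, and the values in `SOURCE_WEIGHTS` and `THRESHOLD` are placeholders.

```python
from collections import Counter

# Assumed values for illustration; the patent leaves the numbers open.
SOURCE_WEIGHTS = {"plain_text": 1.0, "linked_text": 2.0}
THRESHOLD = 3.0


def weighted_term_frequency(hits, weights=SOURCE_WEIGHTS) -> float:
    """hits: iterable of (violation_phrase, source) pairs found on one page.
    Each occurrence contributes the weight of the element it came from."""
    counts = Counter(source for _, source in hits)
    return sum(weights[source] * n for source, n in counts.items())


def is_violation_page(hits, threshold=THRESHOLD) -> bool:
    """The page is a violation webpage when the weighted sum exceeds the threshold."""
    return weighted_term_frequency(hits) > threshold


hits = [("赌场", "plain_text"), ("赌场", "linked_text"), ("彩票", "linked_text")]
print(weighted_term_frequency(hits))  # 5.0
print(is_violation_page(hits))        # True
```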
S103: If the short message contains a violation link, judge the short message to be a violation short message. Because of the constraints on short message content, a link that a sender adds to a message usually points to the main content being promoted. It is therefore sufficient to judge that the link is a violation link in order to judge that the short message is a violation short message.
The invention discloses a violation short message detection method. The method extracts the link in a short message and accesses the webpage the link points to, performs violation keyword filtering on the text content of the webpage to judge whether the webpage contains violation content, and thereby judges whether the link is a violation link. If the short message contains a violation link, the short message is judged to be a violation short message and handled accordingly, for example by interception. The objects of the violation keyword filtering in this method include both the plain text content of the webpage and the characters in its images, and the term frequency of violation phrases is computed with different weight coefficients depending on whether the content carries a link, so that the legitimacy of the linked webpage is judged more reasonably in light of user habits.
Embodiment 2: A violation short message detection method includes the following steps:
S201: Obtain the full content of the short message and extract the link in the short message content by regular expression matching.
S202: Parse the webpage elements and extract the text content while marking the webpage element source of each part of the text content; the webpage elements include text without hyperlinks and text with hyperlinks.
S203: Perform word segmentation on the text content to obtain segmented phrases, match the segmented phrases against the violation keywords in the preset violation keyword library, and identify the violation phrases among the segmented phrases.
S204: Assign each violation phrase a preset weight coefficient according to its webpage element source, and compute the weighted term frequency of the violation phrases in the text content of the webpage. Preferably, the weight coefficient of a violation phrase whose source is text without hyperlinks is less than the weight coefficient of a violation phrase whose source is text with hyperlinks.
S205: When the weighted term frequency of the violation phrases exceeds the preset threshold, judge the webpage to be a violation webpage.
S206: If the webpage the link points to is a violation webpage, judge the link to be a violation link.
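Step S202 of this embodiment, which separates text with hyperlinks from text without them, can be sketched with Python's standard `html.parser` module. The class name `TextSourceParser`, the helper `label_text_sources`, and the source labels `plain_text` and `linked_text` are hypothetical; treating an `<a>` element as a hyperlink only when it has an `href` attribute is also an assumption.

```python
from html.parser import HTMLParser


class TextSourceParser(HTMLParser):
    """Collects (text, source) pairs, where source records whether the
    text sits inside an <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.anchor_stack = []   # one bool per open <a>: does it have href?
        self.skip_depth = 0      # depth inside <script>/<style>
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.anchor_stack.append(any(name == "href" for name, _ in attrs))
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_stack:
            self.anchor_stack.pop()
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            source = "linked_text" if any(self.anchor_stack) else "plain_text"
            self.fragments.append((text, source))


def label_text_sources(html: str) -> list[tuple[str, str]]:
    parser = TextSourceParser()
    parser.feed(html)
    parser.close()
    return parser.fragments


page = '<p>Welcome</p><a href="/join">Join now</a><script>var x=1;</script>'
print(label_text_sources(page))
# [('Welcome', 'plain_text'), ('Join now', 'linked_text')]
```

The labelled fragments can then be segmented and matched, with each violation phrase inheriting the source label of the fragment it came from.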
Embodiment 3: A violation short message detection method includes the following steps:
S301: Obtain the full content of the short message and extract the link in the short message content by regular expression matching.
S302: Parse the webpage elements and extract the text content while marking the webpage element source of each part of the text content.
S303: Obtain the images in the webpage and distinguish images without hyperlinks from images with hyperlinks.
S304: Use optical character recognition to identify and extract the text content in images without hyperlinks, and mark the webpage element source of this part of the text content as image without hyperlink; use optical character recognition to identify and extract the text content in images with hyperlinks, and mark the webpage element source of this part of the text content as image with hyperlink.
S305: Perform word segmentation on the text content to obtain segmented phrases, match the segmented phrases against the violation keywords in the preset violation keyword library, and identify the violation phrases among the segmented phrases.
S306: Assign each violation phrase a preset weight coefficient according to its webpage element source, and compute the weighted term frequency of the violation phrases in the text content of the webpage. Preferably, the weight coefficient of a violation phrase whose source is text without hyperlinks is less than the weight coefficient of a violation phrase whose source is text with hyperlinks.
S307: When the weighted term frequency of the violation phrases exceeds the preset threshold, judge the webpage to be a violation webpage.
S308: If the webpage the link points to is a violation webpage, judge the link to be a violation link.
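Steps S303 and S304 can be sketched in the same style. The patent does not name an OCR engine, so the sketch takes the recogniser as a caller-supplied function `ocr` that maps an image address to its text; that function, the class `ImageSourceParser`, and the labels `plain_image` and `linked_image` are all hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable


class ImageSourceParser(HTMLParser):
    """Collects (src, source) pairs for <img> elements, recording whether
    each image sits inside an <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.anchor_stack = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.anchor_stack.append("href" in attrs)
        elif tag == "img" and attrs.get("src"):
            source = "linked_image" if any(self.anchor_stack) else "plain_image"
            self.images.append((attrs["src"], source))

    def handle_endtag(self, tag):
        if tag == "a" and self.anchor_stack:
            self.anchor_stack.pop()


def extract_image_text(html: str, ocr: Callable[[str], str]) -> list[tuple[str, str]]:
    """Run the supplied OCR function on every image and keep its source label."""
    parser = ImageSourceParser()
    parser.feed(html)
    parser.close()
    return [(ocr(src), source) for src, source in parser.images]


# Stand-in for a real OCR engine: a lookup table of pre-recognised text.
fake_ocr = {"banner.png": "春季促销", "button.png": "立即下注"}
page = '<img src="banner.png"><a href="/bet"><img src="button.png"/></a>'
print(extract_image_text(page, lambda src: fake_ocr[src]))
# [('春季促销', 'plain_image'), ('立即下注', 'linked_image')]
```

Passing the recogniser in as a parameter keeps the sketch runnable without any OCR dependency and makes the labelling logic easy to test on its own.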
Embodiment 4: Referring to Fig. 1, which is a schematic diagram of a violation short message detection system of the present invention, the violation short message detection system includes: a link acquisition module, a violation keyword filtering module, and a determination module.
The link acquisition module is used for obtaining the link in the short message content and obtaining the webpage the link points to.
The violation keyword filtering module is used for judging whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage obtained by the link acquisition module.
Preferably, the violation keyword filtering module includes: a text parsing unit, for parsing the webpage elements and extracting the text content; a source marking unit, for marking the webpage element source of each part of the text content extracted by the text parsing unit; a word segmentation unit, for performing word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases; a violation phrase recognition unit, for matching the segmented phrases obtained by the word segmentation unit against the violation keywords in the preset violation keyword library and identifying the violation phrases among the segmented phrases; a computing unit, for assigning each violation phrase a preset weight coefficient according to its webpage element source and computing the weighted term frequency of the violation phrases in the text content of the webpage; and a link judging unit, for judging the webpage to be a violation webpage when the weighted term frequency of the violation phrases exceeds the preset threshold, and for judging the link to be a violation link if the webpage the link points to is a violation webpage.
The determination module is used for judging the short message to be a violation short message if, according to the judgment result of the violation keyword filtering module, the short message contains a violation link.
The invention also discloses a violation short message detection system, in which the link acquisition module obtains the link in the short message content and the webpage the link points to, and the violation keyword filtering module performs violation keyword filtering on the webpage content to judge whether the webpage is a violation webpage, so that short messages containing violation links are detected and intercepted. Unlike the prior art, this technical solution can inspect the linked content of a short message, which secures the accuracy of violation short message interception and prevents senders from evading interception and seeking improper profit by adding links.
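The way the three modules fit together can be sketched as a small composition. This is only one possible arrangement under stated assumptions: the class `ViolationSmsDetector` and its field names are hypothetical, and each module is represented by an injected function so that the sketch runs without network access.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ViolationSmsDetector:
    # Link acquisition module: find the links, then fetch each linked page.
    extract_links: Callable[[str], list[str]]
    fetch_page: Callable[[str], str]
    # Violation keyword filtering module: decide whether a page is in violation.
    is_violation_page: Callable[[str], bool]

    def is_violation_sms(self, sms_text: str) -> bool:
        """Determination module: one violation link is enough to flag the message."""
        return any(
            self.is_violation_page(self.fetch_page(link))
            for link in self.extract_links(sms_text)
        )


# Stub modules for illustration: links are whitespace-separated tokens,
# "fetching" returns the link itself, and a page is in violation if it
# contains the word "bad".
detector = ViolationSmsDetector(
    extract_links=lambda sms: sms.split(),
    fetch_page=lambda url: url,
    is_violation_page=lambda page: "bad" in page,
)
print(detector.is_violation_sms("ok bad-page"))  # True
print(detector.is_violation_sms("ok fine"))      # False
```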
Embodiment 5: As shown in Fig. 1, a violation short message detection system includes: a link acquisition module, a violation keyword filtering module, and a determination module.
The link acquisition module is used for obtaining the link in the short message content and obtaining the webpage the link points to.
As shown in Fig. 2, the violation keyword filtering module includes:
a text parsing unit, for parsing the webpage elements and extracting the text content;
a source marking unit, for marking the webpage element source of each part of the text content extracted by the text parsing unit;
a word segmentation unit, for performing word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases;
a violation phrase recognition unit, for matching the segmented phrases obtained by the word segmentation unit against the violation keywords in the preset violation keyword library and identifying the violation phrases among the segmented phrases;
a computing unit, for assigning each violation phrase a preset weight coefficient according to its webpage element source and computing the weighted term frequency of the violation phrases in the text content of the webpage;
a link judging unit, for judging the webpage to be a violation webpage when the weighted term frequency of the violation phrases exceeds the preset threshold, and for judging the link to be a violation link if the webpage the link points to is a violation webpage.
Preferably, the webpage elements include text without hyperlinks and text with hyperlinks, and the weight coefficient of a violation phrase whose source is text without hyperlinks is less than the weight coefficient of a violation phrase whose source is text with hyperlinks.
The determination module is used for judging the short message to be a violation short message if, according to the judgment result of the violation keyword filtering module, the short message contains a violation link.
This scheme further refines the violation keyword filtering module of the violation short message detection system. The source marking unit marks whether the webpage element source of each piece of extracted webpage text carries a link, and the computing unit assigns different weight coefficients according to the webpage element source of each violation phrase. The weighted term frequency of the violation keywords is then used as the parameter for judging whether the webpage content is in violation. Because text with a link jumps to another page when clicked, violation content in such text carries a higher weight. Computing the weighted term frequency of violation phrases in this way greatly improves the accuracy of violation detection for the text content in the webpage.
Embodiment 6: As shown in Fig. 1, a violation short message detection system includes: a link acquisition module, a violation keyword filtering module, and a determination module.
The link acquisition module is used for obtaining the link in the short message content and obtaining the webpage the link points to.
As shown in Fig. 3, the violation keyword filtering module further includes:
A text parsing unit, for parsing the webpage elements and extracting the text content; the text parsing unit includes an optical character recognition subunit, for identifying and extracting the text content in the images without hyperlinks and the images with hyperlinks in the webpage.
A source marking unit, for marking the webpage element source of each part of the text content extracted by the text parsing unit.
A word segmentation unit, for performing word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases.
A violation phrase recognition unit, for matching the segmented phrases obtained by the word segmentation unit against the violation keywords in the preset violation keyword library and identifying the violation phrases among the segmented phrases.
A computing unit, for assigning each violation phrase a preset weight coefficient according to its webpage element source and computing the weighted term frequency of the violation phrases in the text content of the webpage.
Preferably, the webpage elements include images without hyperlinks and images with hyperlinks, and the weight coefficient of a violation phrase whose source is an image without a hyperlink is less than the weight coefficient of a violation phrase whose source is an image with a hyperlink.
A link judging unit, for judging the webpage to be a violation webpage when the weighted term frequency of the violation phrases exceeds the preset threshold, and for judging the link to be a violation link if the webpage the link points to is a violation webpage.
The determination module is used for judging the short message to be a violation short message if, according to the judgment result of the violation keyword filtering module, the short message contains a violation link.
This scheme further refines the violation keyword filtering module of the violation short message detection system. The webpage element sources marked by the source marking unit are extended to the images in the webpage: the optical character recognition subunit identifies and extracts the characters in the images, and whether each source image carries a link is distinguished at the same time. The computing unit assigns different weight coefficients according to the webpage element source of each violation phrase, and the weighted term frequency of the violation keywords is used as the parameter for judging whether the webpage content is in violation. Because images are displayed more intuitively and attractively in a webpage, violation content is more likely to appear in them and has a greater impact, so they are assigned a higher weight. Because an image with a link also jumps to another page when clicked, text in such images carries the highest weight for violation content. Computing the weighted term frequency of violation phrases in this way greatly improves the accuracy of violation detection for the text content in the webpage.
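The weight ordering described across the embodiments can be written down as a small table with a consistency check. The numbers below are assumptions chosen only to satisfy the stated relations (unlinked below linked within each kind, and linked images highest); the patent gives no values, and the relative order of linked text and unlinked images is not fixed by the text.

```python
# Assumed weights; only the ordering constraints come from the description.
SOURCE_WEIGHTS = {
    "plain_text": 1.0,
    "linked_text": 2.0,
    "plain_image": 3.0,
    "linked_image": 4.0,
}


def check_weight_ordering(w: dict[str, float]) -> bool:
    """Verify the relations stated in the description."""
    return (
        w["plain_text"] < w["linked_text"]
        and w["plain_image"] < w["linked_image"]
        and w["linked_image"] == max(w.values())
    )


def page_score(hits_by_source: dict[str, int], weights=SOURCE_WEIGHTS) -> float:
    """Weighted term frequency from per-source counts of violation phrases."""
    return sum(weights[source] * n for source, n in hits_by_source.items())


print(check_weight_ordering(SOURCE_WEIGHTS))               # True
print(page_score({"plain_text": 2, "linked_image": 1}))    # 6.0
```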

Claims (9)

1. A violation short message detection method, characterized in that it comprises the following steps:
Obtaining the link in the short message content, and obtaining the webpage the link points to;
Judging whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage;
If the short message contains a violation link, judging the short message to be a violation short message.
2. The violation short message detection method according to claim 1, characterized in that the step of obtaining the link in the short message content further comprises:
Obtaining the full content of the short message, and extracting the link in the short message content by regular expression matching.
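The link extraction step of claim 2 might look like the following minimal sketch. The claim only specifies regular expression matching; the pattern `URL_PATTERN` and the function name `extract_links` are assumptions made for illustration, not the pattern used by the applicant.

```python
import re

# Hypothetical pattern: http(s) URLs and bare "www." hosts, ASCII characters only,
# so a link embedded in Chinese text ends where the Chinese characters resume.
URL_PATTERN = re.compile(
    r"(?:https?://|www\.)[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+",
    re.IGNORECASE,
)


def extract_links(message: str) -> list[str]:
    """Return all link-like substrings found in a short message, in order."""
    links = []
    for match in URL_PATTERN.finditer(message):
        # Trailing punctuation is usually sentence punctuation, not part of the URL.
        links.append(match.group(0).rstrip(".,;:!?)'"))
    return links
```

A production pattern would also have to decide how to treat short links and links written without a scheme or "www." prefix, which this sketch ignores.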
3. The violation short message detection method according to claim 1, characterized in that the step of judging whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage further comprises:
Parsing the webpage elements and extracting the text content, while marking the webpage element source of each part of the text content;
Performing word segmentation on the text content to obtain segmented phrases, matching the segmented phrases against the violation keywords in a preset violation keyword library, and identifying the violation phrases among the segmented phrases;
Assigning preset weight coefficients to the violation phrases according to their different webpage element sources, and computing the weighted term frequency of the violation phrases in the text content of the webpage;
When the weighted term frequency of the violation phrases exceeds a preset threshold, judging the webpage to be a violation webpage;
If the webpage the link points to is a violation webpage, judging the link to be a violation link.
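The segmentation, matching, weighting and threshold steps of claim 3 can be sketched as follows. Everything concrete here is an assumption: the source labels, the weight values in `SOURCE_WEIGHTS`, the value of `THRESHOLD`, and the regex tokenizer, which merely stands in for a real Chinese word segmenter. The weights respect the orderings stated in the text (linked above unlinked, pictures above text); the relative order of linked text and unlinked pictures is not stated and is chosen arbitrarily.

```python
import re
from collections import Counter

# Hypothetical preset weights per webpage element source, and a hypothetical threshold.
SOURCE_WEIGHTS = {
    "plain_text": 1.0,
    "linked_text": 1.5,
    "plain_image": 2.0,
    "linked_image": 3.0,
}
THRESHOLD = 3.0


def tokenize(text: str) -> list[str]:
    """Stand-in segmenter: lowercase word tokens (Chinese needs a real segmenter)."""
    return re.findall(r"\w+", text.lower())


def weighted_term_frequency(segments, keywords) -> float:
    """Sum weight * count of violation keywords over (source, text) segments."""
    total = 0.0
    for source, text in segments:
        counts = Counter(tokenize(text))
        hits = sum(counts[keyword] for keyword in keywords)
        total += SOURCE_WEIGHTS[source] * hits
    return total


def is_violation_page(segments, keywords, threshold: float = THRESHOLD) -> bool:
    """A page is a violation page when the weighted frequency exceeds the threshold."""
    return weighted_term_frequency(segments, keywords) > threshold
```

With these assumed values, one keyword hit in plain text plus one in a linked picture scores 4.0 and exceeds the threshold, while two hits in plain text alone score 2.0 and do not.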
4. The violation short message detection method according to claim 3, characterized in that the webpage elements include text without a hyperlink and text with a hyperlink, and the weight coefficient of a violation phrase whose source is text without a hyperlink is less than the weight coefficient of a violation phrase whose source is text with a hyperlink.
5. The violation short message detection method according to claim 3 or 4, characterized in that the webpage elements include pictures without a hyperlink and pictures with a hyperlink, and the weight coefficient of a violation phrase whose source is a picture without a hyperlink is less than the weight coefficient of a violation phrase whose source is a picture with a hyperlink;
The step of parsing the webpage elements and extracting the text content, while marking the webpage element source of each part of the text content, further comprises:
Obtaining the pictures in the webpage, and distinguishing pictures without a hyperlink from pictures with a hyperlink;
Using OCR to recognize and extract the text content in pictures without a hyperlink, and marking the webpage element source of this part of the text content as a picture without a hyperlink;
Using OCR to recognize and extract the text content in pictures with a hyperlink, and marking the webpage element source of this part of the text content as a picture with a hyperlink.
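The source marking described in claims 4 and 5 could be sketched with Python's standard `html.parser`, as below. This is an illustration under assumptions: the class `SourceTagger`, the function `extract_segments`, and the four source labels are hypothetical names, and OCR is passed in as a caller-supplied function `ocr` (image URL to recognized text) because no OCR engine is named in the text.

```python
from html.parser import HTMLParser
from typing import Callable


class SourceTagger(HTMLParser):
    """Tag text and pictures by whether they sit inside an <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0  # > 0 while inside a hyperlink
        self.skip_depth = 0  # > 0 while inside <script> or <style>
        self.texts = []      # (source, text) for visible text
        self.images = []     # (source, image URL) for pictures

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.link_depth += 1
        elif tag == "img" and attrs.get("src"):
            source = "linked_image" if self.link_depth else "plain_image"
            self.images.append((source, attrs["src"]))
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # Simplification: assumes anchors are not nested and every <a> has an href.
        if tag == "a" and self.link_depth:
            self.link_depth -= 1
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            source = "linked_text" if self.link_depth else "plain_text"
            self.texts.append((source, text))


def extract_segments(html: str, ocr: Callable[[str], str]):
    """Return (source, text) pairs for page text, followed by OCR text of pictures."""
    parser = SourceTagger()
    parser.feed(html)
    parser.close()
    segments = list(parser.texts)
    for source, src in parser.images:
        segments.append((source, ocr(src)))
    return segments
```

The resulting (source, text) pairs are the kind of input a weighted term frequency computation would consume, with each source label mapped to its preset weight coefficient.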
6. A violation short message detection system, characterized in that it comprises:
A link acquisition module, configured to obtain the link in the short message content and obtain the webpage the link points to;
A violation keyword filtering module, configured to judge whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage obtained by the link acquisition module;
A determination module, configured to judge, according to the judgment result of the violation keyword filtering module, that the short message is a violation short message if it contains a violation link.
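The three modules of claim 6 can be wired together roughly as follows. This is a structural sketch only: the class name `ViolationMessageDetector` and its fields are hypothetical, and the link extraction, page retrieval and page judgment are injected as plain functions so that the sketch makes no assumption about how webpages are actually fetched.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ViolationMessageDetector:
    # Link acquisition module: find links in the message and retrieve their pages.
    extract_links: Callable[[str], list[str]]
    fetch_page: Callable[[str], str]
    # Violation keyword filtering module: judge a retrieved page.
    is_violation_page: Callable[[str], bool]

    def is_violation_link(self, link: str) -> bool:
        """A link is a violation link if the page it points to is a violation page."""
        return self.is_violation_page(self.fetch_page(link))

    def is_violation_message(self, message: str) -> bool:
        """Determination module: any violation link makes the message a violation."""
        return any(self.is_violation_link(link) for link in self.extract_links(message))
```

Injecting the three functions keeps the decision logic testable with stub pages and lets the page judgment be replaced independently of link extraction.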
7. The violation short message detection system according to claim 6, characterized in that the violation keyword filtering module comprises:
A text parsing unit, configured to parse the webpage elements and extract the text content;
A source marking unit, configured to mark the webpage element source of each part of the text content extracted by the text parsing unit;
A word segmentation unit, configured to perform word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases;
A violation phrase recognition unit, configured to match the segmented phrases obtained by the word segmentation unit against the violation keywords in a preset violation keyword library, and to identify the violation phrases among the segmented phrases;
A computing unit, configured to assign preset weight coefficients to the violation phrases according to their different webpage element sources, and to compute the weighted term frequency of the violation phrases in the text content of the webpage;
A link determination unit, configured to judge the webpage to be a violation webpage when the weighted term frequency of the violation phrases exceeds a preset threshold; if the webpage the link points to is a violation webpage, the link is judged to be a violation link.
8. The violation short message detection system according to claim 7, characterized in that the webpage elements include text without a hyperlink and text with a hyperlink, and the weight coefficient of a violation phrase whose source is text without a hyperlink is less than the weight coefficient of a violation phrase whose source is text with a hyperlink.
9. The violation short message detection system according to claim 7 or 8, characterized in that the webpage elements include pictures without a hyperlink and pictures with a hyperlink, and the weight coefficient of a violation phrase whose source is a picture without a hyperlink is less than the weight coefficient of a violation phrase whose source is a picture with a hyperlink; the text parsing unit includes an optical character recognition subunit, configured to recognize and extract the text content in the pictures without a hyperlink and the pictures with a hyperlink in the webpage.
CN201610799866.2A 2016-08-31 2016-08-31 Illegal short message detection method and system Active CN106383862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610799866.2A CN106383862B (en) 2016-08-31 2016-08-31 Illegal short message detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610799866.2A CN106383862B (en) 2016-08-31 2016-08-31 Illegal short message detection method and system

Publications (2)

Publication Number Publication Date
CN106383862A true CN106383862A (en) 2017-02-08
CN106383862B CN106383862B (en) 2019-12-31

Family

ID=57938012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610799866.2A Active CN106383862B (en) 2016-08-31 2016-08-31 Illegal short message detection method and system

Country Status (1)

Country Link
CN (1) CN106383862B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103902889A (en) * 2012-12-26 2014-07-02 腾讯科技(深圳)有限公司 Malicious message cloud detection method and server
US20150309981A1 (en) * 2014-04-28 2015-10-29 Elwha Llc Methods, systems, and devices for outcome prediction of text submission to network based on corpora analysis
WO2016003084A1 (en) * 2014-07-01 2016-01-07 Samsung Electronics Co., Ltd. Method and apparatus of notifying of smishing
CN105205090A (en) * 2015-05-29 2015-12-30 湖南大学 Web page text classification algorithm research based on web page link analysis and support vector machine
CN105335354A (en) * 2015-12-09 2016-02-17 中国联合网络通信集团有限公司 Cheat information recognition method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960952A (en) * 2017-05-24 2018-12-07 阿里巴巴集团控股有限公司 A kind of detection method and device of violated information
CN107992578A (en) * 2017-12-06 2018-05-04 任明和 The database automatic testing method in objectionable video source
CN107992578B (en) * 2017-12-06 2019-11-22 山西睿信智达传媒科技股份有限公司 The database automatic testing method in objectionable video source
CN110110577A (en) * 2019-01-22 2019-08-09 口碑(上海)信息技术有限公司 Identify method and device, the storage medium, electronic device of name of the dish
CN111597805A (en) * 2020-05-21 2020-08-28 上海创蓝文化传播有限公司 Method and device for auditing short message text links based on deep learning
CN111597805B (en) * 2020-05-21 2021-01-05 上海创蓝文化传播有限公司 Method and device for auditing short message text links based on deep learning
CN115408420A (en) * 2022-09-02 2022-11-29 自然资源部地图技术审查中心 Method and apparatus for automatically filtering map markers and points of interest using a computer

Also Published As

Publication number Publication date
CN106383862B (en) 2019-12-31

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant