CN106383862A - Violation short message detection method and system - Google Patents
- Publication number
- CN106383862A, CN201610799866.2A
- Authority
- CN
- China
- Prior art keywords
- violation
- webpage
- link
- phrase
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The invention discloses a violation short message detection method. The method comprises the steps of obtaining a link in short message contents, and obtaining a webpage which the link points to; judging whether the link is a violation link or not according to a violation keyword filtering result of text contents in the webpage; and if a short message contains the violation link, judging that the short message is a violation short message. Meanwhile, the invention provides a violation short message detection system. The system comprises a link obtaining module used for obtaining the link in the short message contents and obtaining the webpage which the link points to, a violation keyword filtering module used for judging whether the link is the violation link or not according to the violation keyword filtering result of the text contents in the webpage obtained by the link obtaining module, and a judgment module used for judging that the short message is the violation short message if it is judged that the short message contains the violation link according to a judgment result of the violation keyword filtering module. Through the technical scheme disclosed by the invention, the short message can be subjected to link content detection, so that the success rate of violation short message interception is effectively increased.
Description
Technical field
The present invention relates to the field of communication technology, and more particularly to a violation short message detection method and a system implementing the method.
Background art
Short message service (SMS) is an important part of mobile communication services. Although the share of person-to-person short messages has declined under the impact of mobile social networking applications, promotion based on bulk short messages still has distinct advantages and remains in use. A bulk short message used as a promotion medium carries the information the sender wants to convey, such as a product name, or contains a link through which the sender hopes recipients will view its products, bringing the sender a benefit.
As the service provider, the short message sending platform is responsible for reviewing the content of bulk short messages to ensure that they contain no content that violates laws and regulations, such as gambling or pornography. Existing approaches to detecting and monitoring violation short messages fall roughly into two classes. The first is detection at the sending operator, which screens out violation short messages and blocks their transmission either by manual review or by violation keyword filtering of the message content. Operator-side detection can block violation short messages at the source, but a sender can evade it by adding a link that points directly to the promoted webpage while keeping violation words out of the message text. The second is detection on the handset, where application software and a violation dictionary are used to filter the keywords of received short messages and shield those containing violation content. Because handsets and servers differ greatly in performance and in message volume, this approach is difficult to apply to violation short message detection on a sending platform.
Summary of the invention
To overcome the deficiency of the prior art, in which a short message sending platform has difficulty detecting violation content behind the links in short messages and therefore cannot fully block violation short messages, the present invention provides a violation short message detection method and system that inspect the content linked from outgoing short messages and effectively improve the interception success rate.
To achieve the above object, the present invention adopts the following technical solutions:
A violation short message detection method of the present invention comprises the following steps: obtaining a link in the short message content and obtaining the webpage to which the link points; judging whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage; and, if the short message contains a violation link, judging the short message to be a violation short message.
Preferably, the step of obtaining the link in the short message content further comprises: obtaining the entire content of the short message and extracting the link in the short message content by regular expression matching.
Preferably, the step of judging whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage further comprises: parsing the webpage elements and extracting the text content, while marking the webpage element source of each part of the text content; performing word segmentation on the text content to obtain segmented phrases, matching the segmented phrases against the violation keywords in a preset violation keyword library, and identifying the violation phrases among the segmented phrases; assigning preset weight coefficients to the violation phrases according to their webpage element sources, and computing the weighted word frequency of the violation phrases in the text content of the webpage; when the weighted word frequency of the violation phrases exceeds a preset threshold, judging the webpage to be a violation webpage; and, if the webpage to which the link points is a violation webpage, judging the link to be a violation link.
Preferably, the webpage elements include text without a hyperlink and text with a hyperlink, and the weight coefficient of a violation phrase whose source is text without a hyperlink is smaller than that of a violation phrase whose source is text with a hyperlink.
Preferably, the webpage elements include pictures without a hyperlink and pictures with a hyperlink, and the weight coefficient of a violation phrase whose source is a picture without a hyperlink is smaller than that of a violation phrase whose source is a picture with a hyperlink. The step of parsing the webpage elements and extracting the text content, while marking the webpage element source of each part of the text content, further comprises: obtaining the pictures in the webpage and distinguishing pictures without a hyperlink from pictures with a hyperlink; using optical character recognition (OCR) to recognize and extract the text content in the pictures without a hyperlink, and marking the webpage element source of this text content as picture without a hyperlink; and using OCR to recognize and extract the text content in the pictures with a hyperlink, and marking the webpage element source of this text content as picture with a hyperlink.
The present invention also provides a violation short message detection system, the system comprising:
a link acquisition module, configured to obtain a link in the short message content and obtain the webpage to which the link points;
a violation keyword filtering module, configured to judge whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage obtained by the link acquisition module; and
a determination module, configured to judge, according to the judgment result of the violation keyword filtering module, that the short message is a violation short message if it contains a violation link.
Preferably, the violation keyword filtering module specifically comprises:
a text parsing unit, configured to parse the webpage elements and extract the text content;
a source marking unit, configured to mark the webpage element source of each part of the text content extracted by the text parsing unit;
a word segmentation unit, configured to perform word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases;
a violation phrase recognition unit, configured to match the segmented phrases obtained by the word segmentation unit against the violation keywords in a preset violation keyword library and identify the violation phrases among the segmented phrases;
a calculation unit, configured to assign preset weight coefficients to the violation phrases according to their webpage element sources and compute the weighted word frequency of the violation phrases in the text content of the webpage; and
a link determination unit, configured to judge the webpage to be a violation webpage when the weighted word frequency of the violation phrases exceeds a preset threshold, and to judge the link to be a violation link if the webpage to which it points is a violation webpage.
Preferably, the webpage elements include text without a hyperlink and text with a hyperlink, and the weight coefficient of a violation phrase whose source is text without a hyperlink is smaller than that of a violation phrase whose source is text with a hyperlink.
Preferably, the webpage elements include pictures without a hyperlink and pictures with a hyperlink, and the weight coefficient of a violation phrase whose source is a picture without a hyperlink is smaller than that of a violation phrase whose source is a picture with a hyperlink; the text parsing unit includes an optical character recognition subunit, configured to recognize and extract the text content in the pictures without a hyperlink and the pictures with a hyperlink in the webpage.
The invention discloses a violation short message detection method that extracts the link in a short message, accesses the webpage to which the link points, and performs violation keyword filtering on the text content of the webpage to judge whether the webpage contains violation content and hence whether the link is a violation link; if the short message contains a violation link, the short message is judged to be a violation short message and a corresponding operation such as interception is performed. The objects of violation keyword filtering in this method include both the plain text content of the webpage and the characters in its pictures, and different weight coefficients are assigned when computing the word frequency of violation phrases according to whether the content carries a link, so that the legitimacy of the linked webpage is judged more reasonably in light of user habits. The invention also discloses a violation short message detection system, in which the link acquisition module obtains the link in the short message content and the webpage to which it points, and the violation keyword filtering module performs violation keyword filtering on the webpage content to judge whether the webpage is a violation webpage, so that short messages containing violation links are detected and intercepted. Unlike the prior art, this technical solution can inspect the content linked from a short message, which ensures interception accuracy: senders can no longer evade interception and seek improper gain by adding links.
Brief description of the drawings
Fig. 1 is a schematic diagram of a violation short message detection system provided by an embodiment of the present invention.
Fig. 2 is a first schematic diagram of the violation keyword filtering module provided by an embodiment of the present invention.
Fig. 3 is a second schematic diagram of the violation keyword filtering module provided by an embodiment of the present invention.
Detailed description
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
The invention discloses a violation short message detection method and a violation short message detection system that extract the link in a short message and access the webpage to which the link points. Violation keyword filtering is performed on the text content of the webpage to judge whether the webpage contains violation content and hence whether the link is a violation link; if the short message contains a violation link, the short message is judged to be a violation short message and a corresponding operation such as interception is performed. Unlike the prior art, this technical solution can inspect the content linked from a short message, which ensures interception accuracy: senders can no longer evade interception and seek improper gain by adding links.
Specific embodiments of the violation short message detection method:
Embodiment 1: A violation short message detection method specifically comprises the following steps:
S101: Obtain the link in the short message content and obtain the webpage to which the link points.
This step specifically comprises obtaining the entire content of the short message and extracting the link in the short message content by regular expression matching. A regular expression is a concept from computer science: it uses a single string to describe and match a set of strings that conform to a syntactic rule. This step first deletes the spaces and meaningless punctuation in the short message content to obtain its text content, and then identifies the link in that text by a preset regular expression. Compared with matching links directly, this prevents senders from hiding a link by inserting spaces and meaningless characters when editing the short message, and thus effectively improves the link recognition rate.
S102: Judge whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage.
Preferably, this step further comprises: parsing the webpage elements and extracting the text content, while marking the webpage element source of each part of the text content; performing word segmentation on the text content to obtain segmented phrases, matching the segmented phrases against the violation keywords in a preset violation keyword library, and identifying the violation phrases among the segmented phrases; assigning preset weight coefficients to the violation phrases according to their webpage element sources, and computing the weighted word frequency of the violation phrases in the text content of the webpage; when the weighted word frequency of the violation phrases exceeds a preset threshold, judging the webpage to be a violation webpage; and, if the webpage to which the link points is a violation webpage, judging the link to be a violation link.
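A minimal sketch of the weighted word frequency check in S102 follows. It assumes the frequency is a weighted count of violation phrase occurrences (the source does not say whether it is normalized), uses whitespace splitting as a stand-in for real word segmentation, and invents the keyword set, weights, threshold, and source labels purely for illustration.

```python
from typing import Iterable, Tuple

# Hypothetical presets: the patent only says these values are "preset".
VIOLATION_KEYWORDS = {"casino", "jackpot"}
WEIGHTS = {"plain_text": 1.0, "linked_text": 2.0}
THRESHOLD = 3.0


def weighted_frequency(segments: Iterable[Tuple[str, str]]) -> float:
    """Sum the source weight of every violation phrase occurrence.

    Each segment is (source_label, text). Whitespace splitting stands in
    for the word segmentation step.
    """
    total = 0.0
    for source, text in segments:
        for token in text.lower().split():
            if token in VIOLATION_KEYWORDS:
                total += WEIGHTS[source]
    return total


def is_violation_page(segments: Iterable[Tuple[str, str]]) -> bool:
    """A page is in violation when the weighted frequency exceeds the threshold."""
    return weighted_frequency(segments) > THRESHOLD
```

With these values, one plain-text hit and one linked-text hit score exactly 3.0 and do not exceed the threshold; a second linked-text hit raises the score to 5.0 and the page is flagged.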
S103: If the short message contains a violation link, judge the short message to be a violation short message. Because the length of a short message is limited, a link that a sender adds to a short message is usually the main content being promoted; therefore, once the link is judged to be a violation link, the short message can be judged to be a violation short message.
The invention discloses a violation short message detection method that extracts the link in a short message, accesses the webpage to which the link points, and performs violation keyword filtering on the text content of the webpage to judge whether the webpage contains violation content and hence whether the link is a violation link; if the short message contains a violation link, the short message is judged to be a violation short message and a corresponding operation such as interception is performed. The objects of violation keyword filtering in this method include both the plain text content of the webpage and the characters in its pictures, and different weight coefficients are assigned when computing the word frequency of violation phrases according to whether the content carries a link, so that the legitimacy of the linked webpage is judged more reasonably in light of user habits.
Embodiment 2: A violation short message detection method specifically comprises the following steps:
S201: Obtain the entire content of the short message and extract the link in the short message content by regular expression matching.
S202: Parse the webpage elements and extract the text content, while marking the webpage element source of each part of the text content; the webpage elements include text without a hyperlink and text with a hyperlink.
S203: Perform word segmentation on the text content to obtain segmented phrases, match the segmented phrases against the violation keywords in a preset violation keyword library, and identify the violation phrases among the segmented phrases.
S204: Assign preset weight coefficients to the violation phrases according to their webpage element sources, and compute the weighted word frequency of the violation phrases in the text content of the webpage; preferably, the weight coefficient of a violation phrase whose source is text without a hyperlink is smaller than that of a violation phrase whose source is text with a hyperlink.
S205: When the weighted word frequency of the violation phrases exceeds a preset threshold, judge the webpage to be a violation webpage.
S206: If the webpage to which the link points is a violation webpage, judge the link to be a violation link.
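The source marking in S202 could be sketched with Python's standard `html.parser` module. The class name, the function name, and the labels `plain_text` and `linked_text` are hypothetical; the patent does not prescribe a parser.

```python
from html.parser import HTMLParser


class SourceTagger(HTMLParser):
    """Collect (source, text) pairs; the source labels are hypothetical names."""

    def __init__(self):
        super().__init__()
        self._anchors = []   # one bool per open <a>: does it carry an href?
        self._skipping = 0   # depth inside <script>/<style>
        self.segments = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._anchors.append(bool(dict(attrs).get("href")))
        elif tag in ("script", "style"):
            self._skipping += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._anchors:
            self._anchors.pop()
        elif tag in ("script", "style") and self._skipping:
            self._skipping -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skipping:
            source = "linked_text" if any(self._anchors) else "plain_text"
            self.segments.append((source, text))


def tag_sources(html: str):
    """Return the page text as a list of (source, text) segments."""
    parser = SourceTagger()
    parser.feed(html)
    parser.close()
    return parser.segments
```

The resulting segments can then be passed to word segmentation and weighting, with each violation phrase inheriting the label of the segment it came from.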
Embodiment 3: A violation short message detection method specifically comprises the following steps:
S301: Obtain the entire content of the short message and extract the link in the short message content by regular expression matching.
S302: Parse the webpage elements and extract the text content, while marking the webpage element source of each part of the text content.
S303: Obtain the pictures in the webpage, and distinguish pictures without a hyperlink from pictures with a hyperlink.
S304: Use optical character recognition (OCR) to recognize and extract the text content in the pictures without a hyperlink, and mark the webpage element source of this text content as picture without a hyperlink; use OCR to recognize and extract the text content in the pictures with a hyperlink, and mark the webpage element source of this text content as picture with a hyperlink.
S305: Perform word segmentation on the text content to obtain segmented phrases, match the segmented phrases against the violation keywords in a preset violation keyword library, and identify the violation phrases among the segmented phrases.
S306: Assign preset weight coefficients to the violation phrases according to their webpage element sources, and compute the weighted word frequency of the violation phrases in the text content of the webpage; preferably, the weight coefficient of a violation phrase whose source is text without a hyperlink is smaller than that of a violation phrase whose source is text with a hyperlink.
S307: When the weighted word frequency of the violation phrases exceeds a preset threshold, judge the webpage to be a violation webpage.
S308: If the webpage to which the link points is a violation webpage, judge the link to be a violation link.
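Steps S303 and S304 might look like the sketch below. The OCR engine is deliberately left as a caller-supplied function, since the patent names no library; `ImageCollector`, `image_text_segments`, and the labels `plain_image` and `linked_image` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Callable, List, Tuple


class ImageCollector(HTMLParser):
    """Record each <img> with a label saying whether it sits inside a link."""

    def __init__(self):
        super().__init__()
        self._anchors = []   # one bool per open <a>: does it carry an href?
        self.images = []     # (source_label, src)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._anchors.append(bool(attrs.get("href")))
        elif tag == "img" and attrs.get("src"):
            label = "linked_image" if any(self._anchors) else "plain_image"
            self.images.append((label, attrs["src"]))

    def handle_endtag(self, tag):
        if tag == "a" and self._anchors:
            self._anchors.pop()


def image_text_segments(html: str, ocr: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Run the supplied OCR function on every picture and keep its source label."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return [(label, ocr(src)) for label, src in collector.images]
```

In practice `ocr` would download the picture and run a recognition engine on it; passing it in keeps the sketch independent of any particular engine and easy to exercise with a stub.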
Embodiment 4: Referring to Fig. 1, which is a schematic diagram of a violation short message detection system of the present invention, a violation short message detection system specifically comprises: a link acquisition module, a violation keyword filtering module and a determination module.
The link acquisition module is configured to obtain the link in the short message content and obtain the webpage to which the link points.
The violation keyword filtering module is configured to judge whether the link is a violation link according to the violation keyword filtering result of the text content in the webpage obtained by the link acquisition module.
Preferably, the violation keyword filtering module comprises: a text parsing unit, configured to parse the webpage elements and extract the text content; a source marking unit, configured to mark the webpage element source of each part of the text content extracted by the text parsing unit; a word segmentation unit, configured to perform word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases; a violation phrase recognition unit, configured to match the segmented phrases obtained by the word segmentation unit against the violation keywords in a preset violation keyword library and identify the violation phrases among the segmented phrases; a calculation unit, configured to assign preset weight coefficients to the violation phrases according to their webpage element sources and compute the weighted word frequency of the violation phrases in the text content of the webpage; and a link determination unit, configured to judge the webpage to be a violation webpage when the weighted word frequency of the violation phrases exceeds a preset threshold, and to judge the link to be a violation link if the webpage to which it points is a violation webpage.
The determination module is configured to judge, according to the judgment result of the violation keyword filtering module, that the short message is a violation short message if it contains a violation link.
The invention also discloses a violation short message detection system, in which the link acquisition module obtains the link in the short message content and the webpage to which it points, and the violation keyword filtering module performs violation keyword filtering on the webpage content to judge whether the webpage is a violation webpage, so that short messages containing violation links are detected and intercepted. Unlike the prior art, this technical solution can inspect the content linked from a short message, which ensures interception accuracy: senders can no longer evade interception and seek improper gain by adding links.
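The three-module structure of the system can be sketched as a small composition of callables. The class and field names are hypothetical, and each module is injected as a function so the sketch stays independent of any particular fetching or filtering implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ViolationSmsDetector:
    # Link acquisition module: find links, then fetch the pages they point to.
    extract_links: Callable[[str], List[str]]
    fetch_page: Callable[[str], str]
    # Violation keyword filtering module: decide whether a page is in violation.
    page_violates: Callable[[str], bool]

    def is_violation(self, message: str) -> bool:
        """Determination module: one violation link makes the message a violation."""
        return any(
            self.page_violates(self.fetch_page(link))
            for link in self.extract_links(message)
        )


# Usage with stand-in modules (all values are illustrative only).
pages = {"http://bad.example": "best casino odds", "http://ok.example": "weather"}
detector = ViolationSmsDetector(
    extract_links=lambda m: [w for w in m.split() if w.startswith("http")],
    fetch_page=lambda url: pages.get(url, ""),
    page_violates=lambda html: "casino" in html,
)
print(detector.is_violation("join http://bad.example now"))  # True
```

A message with no link, or whose links all point to clean pages, is not flagged by this sketch.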
Embodiment 5: As shown in Fig. 1, a violation short message detection system specifically comprises: a link acquisition module, a violation keyword filtering module and a determination module.
The link acquisition module is configured to obtain the link in the short message content and obtain the webpage to which the link points.
As shown in Fig. 2, the violation keyword filtering module comprises:
a text parsing unit, configured to parse the webpage elements and extract the text content;
a source marking unit, configured to mark the webpage element source of each part of the text content extracted by the text parsing unit;
a word segmentation unit, configured to perform word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases;
a violation phrase recognition unit, configured to match the segmented phrases obtained by the word segmentation unit against the violation keywords in a preset violation keyword library and identify the violation phrases among the segmented phrases;
a calculation unit, configured to assign preset weight coefficients to the violation phrases according to their webpage element sources and compute the weighted word frequency of the violation phrases in the text content of the webpage; and
a link determination unit, configured to judge the webpage to be a violation webpage when the weighted word frequency of the violation phrases exceeds a preset threshold, and to judge the link to be a violation link if the webpage to which it points is a violation webpage.
Preferably, the webpage elements include text without a hyperlink and text with a hyperlink, and the weight coefficient of a violation phrase whose source is text without a hyperlink is smaller than that of a violation phrase whose source is text with a hyperlink.
The determination module is configured to judge, according to the judgment result of the violation keyword filtering module, that the short message is a violation short message if it contains a violation link.
This solution further refines the violation keyword filtering module of the violation short message detection system. The source marking unit marks whether the webpage element source of each piece of extracted webpage text carries a link, and the calculation unit assigns different weight coefficients according to the webpage element source of each violation phrase; the weighted word frequency of the violation keywords is then used as the parameter for judging whether the webpage content is in violation. Because text with a link jumps to another page when clicked, this text is given a higher weight for carrying violation content. Weighting the word frequency of violation phrases in this way greatly improves the accuracy of violation detection on webpage text content.
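One way to write the weighted word frequency used here, as a sketch under assumptions not stated in the source: let S be the set of webpage element sources, w_s the preset weight coefficient of source s, n_s the number of violation phrase occurrences found in source s, and T the preset threshold. All symbols are introduced here for illustration, and the source does not say whether the frequency is normalized by the total word count.

```latex
F = \sum_{s \in S} w_s \, n_s ,
\qquad
\text{the webpage is a violation webpage if } F > T ,
\qquad
w_{\text{text without hyperlink}} < w_{\text{text with hyperlink}}
```

For example, with assumed weights of 1 and 2, three violation phrases in unlinked text and one in linked text give F = 1 x 3 + 2 x 1 = 5.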
Embodiment 6: As shown in Fig. 1, a violation short message detection system specifically comprises: a link acquisition module, a violation keyword filtering module and a determination module.
The link acquisition module is configured to obtain the link in the short message content and obtain the webpage to which the link points.
As shown in Fig. 3, the violation keyword filtering module further comprises:
a text parsing unit, configured to parse the webpage elements and extract the text content, the text parsing unit including an optical character recognition subunit configured to recognize and extract the text content in the pictures without a hyperlink and the pictures with a hyperlink in the webpage;
a source marking unit, configured to mark the webpage element source of each part of the text content extracted by the text parsing unit;
a word segmentation unit, configured to perform word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases;
a violation phrase recognition unit, configured to match the segmented phrases obtained by the word segmentation unit against the violation keywords in a preset violation keyword library and identify the violation phrases among the segmented phrases;
a calculation unit, configured to assign preset weight coefficients to the violation phrases according to their webpage element sources and compute the weighted word frequency of the violation phrases in the text content of the webpage; and
a link determination unit, configured to judge the webpage to be a violation webpage when the weighted word frequency of the violation phrases exceeds a preset threshold, and to judge the link to be a violation link if the webpage to which it points is a violation webpage.
Preferably, the webpage elements include pictures without a hyperlink and pictures with a hyperlink, and the weight coefficient of a violation phrase whose source is a picture without a hyperlink is smaller than that of a violation phrase whose source is a picture with a hyperlink.
The determination module is configured to judge, according to the judgment result of the violation keyword filtering module, that the short message is a violation short message if it contains a violation link.
This solution further refines the violation keyword filtering module of the violation short message detection system. The webpage element sources marked by the source marking unit are extended to the pictures in the webpage: the optical character recognition subunit recognizes and extracts the characters in the pictures, and pictures serving as webpage element sources are distinguished by whether they carry a link. The calculation unit assigns different weight coefficients according to the webpage element source of each violation phrase, and the weighted word frequency of the violation keywords is used as the parameter for judging whether the webpage content is in violation. Because pictures are more intuitive and eye-catching when displayed in a webpage, violation content is more likely to appear in them and has greater impact, so they are assigned a higher weight; a picture with a link additionally jumps to another page when clicked, so the text in such pictures carries the highest weight for violation content. Weighting the word frequency of violation phrases in this way greatly improves the accuracy of violation detection on webpage text content.
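The weight relationships described across the embodiments can be captured in a small configuration check. The numeric values and label names below are invented for illustration, and reading "higher weight" for pictures as "higher than plain text" is an interpretation; the source does not rank linked text against unlinked pictures.

```python
from typing import Dict

# Hypothetical preset weights for the four webpage element sources.
WEIGHTS = {
    "plain_text": 1.0,
    "linked_text": 2.0,
    "plain_image": 2.0,
    "linked_image": 3.0,
}


def ordering_holds(w: Dict[str, float]) -> bool:
    """Check the orderings described in the text (as interpreted here)."""
    return (
        w["plain_text"] < w["linked_text"]          # unlinked text < linked text
        and w["plain_image"] < w["linked_image"]    # unlinked picture < linked picture
        and w["plain_text"] < w["plain_image"]      # pictures outrank plain text
        and w["linked_image"] == max(w.values())    # linked pictures rank highest
    )


def weighted_frequency(counts: Dict[str, int], w: Dict[str, float]) -> float:
    """Weighted count of violation phrases, given occurrences per source."""
    return sum(w[source] * n for source, n in counts.items())
```

Validating the configuration once at start-up would catch a preset table that contradicts the intended ordering before any page is scored.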
Claims (9)
1. a kind of violation note detection method, is characterized in that, comprise the following steps:
Obtain the link in short message content, obtain the webpage that link is pointed to;
According to the violation key word filter result of word content in webpage, judge whether link is link in violation of rules and regulations;
If note comprises to link in violation of rules and regulations, judge note as violation note.
2. a kind of violation note detection method according to claim 1, is characterized in that, the described chain obtaining in short message content
The step connecing, further includes:
Obtain the full content of note, using matching regular expressions method, extract the link in short message content.
3. The violation short message detection method according to claim 1, characterized in that the step of judging whether the link is a violation link according to the result of filtering the text content of the webpage for violation keywords further comprises:
parsing the webpage elements and extracting the text content, while marking the webpage element source of each part of the text content;
performing word segmentation on the text content to obtain segmented phrases, matching the segmented phrases against the violation keywords in a preset violation keyword library, and identifying the violation phrases among the segmented phrases;
assigning preset weight coefficients to the violation phrases according to their different webpage element sources, and computing the weighted term frequency of the violation phrases in the text content of the webpage;
when the weighted term frequency of the violation phrases exceeds a preset threshold, judging the webpage to be a violation webpage;
if the webpage to which the link points is a violation webpage, judging the link to be a violation link.
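The weighting and threshold steps of claim 3 can be sketched as follows. This is only an illustration under stated assumptions: the weight values, the source labels, and the use of a raw weighted count (rather than a count normalized by page length) are all hypothetical. The claims fix only that unlinked sources weigh less than linked ones, and the description gives linked pictures the highest weight. Word segmentation is assumed to have happened already, so the input is a list of tokens per source.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical weight coefficients per webpage element source (assumed values).
SOURCE_WEIGHTS: Dict[str, float] = {
    "plain_text": 1.0,
    "linked_text": 2.0,
    "plain_image": 1.5,
    "linked_image": 3.0,
}


def weighted_term_frequency(
    segments: Iterable[Tuple[str, List[str]]],
    keywords: Set[str],
    weights: Dict[str, float] = SOURCE_WEIGHTS,
) -> float:
    """Sum, over all sources, weight(source) * number of violation phrases found."""
    total = 0.0
    for source, tokens in segments:
        hits = sum(1 for token in tokens if token in keywords)
        total += weights[source] * hits
    return total


def is_violation_page(
    segments: Iterable[Tuple[str, List[str]]],
    keywords: Set[str],
    threshold: float,
) -> bool:
    """A page is a violation page when the weighted frequency exceeds the threshold."""
    return weighted_term_frequency(segments, keywords) > threshold
```

For example, one keyword hit in plain text and two hits in a linked picture give 1.0 × 1 + 3.0 × 2 = 7.0 under these assumed weights.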
4. The violation short message detection method according to claim 3, characterized in that the webpage elements include displayed text without a hyperlink and displayed text with a hyperlink, and the weight coefficient of violation phrases whose source is displayed text without a hyperlink is less than the weight coefficient of violation phrases whose source is displayed text with a hyperlink.
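One way to mark whether displayed text comes from inside a hyperlink is to track open `<a href=...>` tags while parsing the page. The sketch below uses Python's standard `html.parser`; the class name `TextSourceParser`, the helper `label_text_sources`, and the labels `plain_text` and `linked_text` are assumptions for illustration, and the patent does not prescribe a parser.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TextSourceParser(HTMLParser):
    """Collect visible text and label it by whether it sits inside <a href=...>."""

    def __init__(self) -> None:
        super().__init__()
        self._a_stack: List[bool] = []  # one entry per open <a>: True if it has an href
        self._skip_depth = 0            # depth inside <script>/<style>, whose text is not displayed
        self.segments: List[Tuple[str, str]] = []  # (source label, text)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._a_stack.append(bool(dict(attrs).get("href")))
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._a_stack:
            self._a_stack.pop()
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        source = "linked_text" if any(self._a_stack) else "plain_text"
        self.segments.append((source, text))


def label_text_sources(html: str) -> List[Tuple[str, str]]:
    """Return (source label, text) pairs for the displayed text of a page."""
    parser = TextSourceParser()
    parser.feed(html)
    parser.close()
    return parser.segments
```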
5. The violation short message detection method according to claim 3 or 4, characterized in that the webpage elements include pictures without a hyperlink and pictures with a hyperlink, and the weight coefficient of violation phrases whose source is a picture without a hyperlink is less than the weight coefficient of violation phrases whose source is a picture with a hyperlink;
the step of parsing the webpage elements and extracting the text content, while marking the webpage element source of each part of the text content, further comprises:
obtaining the pictures in the webpage, and distinguishing pictures without a hyperlink from pictures with a hyperlink;
recognizing and extracting, by optical character recognition (OCR), the text content in pictures without a hyperlink, and marking the webpage element source of this part of the text content as a picture without a hyperlink;
recognizing and extracting, by OCR, the text content in pictures with a hyperlink, and marking the webpage element source of this part of the text content as a picture with a hyperlink.
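The picture-handling steps of claim 5 can be sketched by separating `<img>` elements according to whether they sit inside an `<a href=...>`, then running OCR on each. The OCR engine is not named in the claims, so it is passed in here as a hypothetical callable `ocr(src) -> str`; the class, function, and label names are likewise assumptions.

```python
from html.parser import HTMLParser
from typing import Callable, List, Tuple


class ImageSourceParser(HTMLParser):
    """Collect <img> sources, noting whether each sits inside an <a href=...>."""

    def __init__(self) -> None:
        super().__init__()
        self._a_stack: List[bool] = []  # one entry per open <a>: True if it has an href
        self.images: List[Tuple[str, str]] = []  # (source label, image src)

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            self._a_stack.append(bool(attr_map.get("href")))
        elif tag == "img" and attr_map.get("src"):
            label = "linked_image" if any(self._a_stack) else "plain_image"
            self.images.append((label, attr_map["src"]))

    def handle_endtag(self, tag):
        if tag == "a" and self._a_stack:
            self._a_stack.pop()


def extract_image_text(html: str, ocr: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Return (source label, recognized text) for every picture in the page.

    `ocr` is a caller-supplied function (an assumption of this sketch) that
    maps an image location to the text recognized in that image.
    """
    parser = ImageSourceParser()
    parser.feed(html)
    parser.close()
    return [(label, ocr(src)) for label, src in parser.images]
```

Injecting the OCR function keeps the sketch independent of any particular recognition library and makes the labelling logic testable with a stub.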
6. A violation short message detection system, characterized by comprising:
a link acquisition module, configured to obtain a link in the content of a short message and obtain the webpage to which the link points;
a violation keyword filtering module, configured to judge whether the link is a violation link according to the result of filtering, for violation keywords, the text content of the webpage obtained by the link acquisition module;
a determination module, configured to judge, according to the judgment result of the violation keyword filtering module, that the short message is a violation short message if the short message contains a violation link.
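The three modules of claim 6 can be wired together as in the following minimal sketch. Each module is represented by an injected callable; the class name `ViolationSmsDetector` and its field names are hypothetical, and fetching and keyword filtering are left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ViolationSmsDetector:
    # Link acquisition module: find links in the message and fetch each page.
    extract_links: Callable[[str], List[str]]
    fetch_page: Callable[[str], str]
    # Violation keyword filtering module: decide whether a page is a violation page.
    page_is_violation: Callable[[str], bool]

    def is_violation_sms(self, message: str) -> bool:
        """Determination module: the message is a violation if any link is."""
        for link in self.extract_links(message):
            if self.page_is_violation(self.fetch_page(link)):
                return True
        return False
```

With stub callables, a message linking to a page that the filter flags is reported as a violation, while a message with no links or only clean links is not.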
7. The violation short message detection system according to claim 6, characterized in that the violation keyword filtering module comprises:
a text parsing unit, configured to parse the webpage elements and extract the text content;
a source marking unit, configured to mark the webpage element source of each part of the text content extracted by the text parsing unit;
a word segmentation unit, configured to perform word segmentation on the text content extracted by the text parsing unit to obtain segmented phrases;
a violation phrase recognition unit, configured to match the segmented phrases obtained by the word segmentation unit against the violation keywords in a preset violation keyword library, and to identify the violation phrases among the segmented phrases;
a computing unit, configured to assign preset weight coefficients to the violation phrases according to their different webpage element sources, and to compute the weighted term frequency of the violation phrases in the text content of the webpage;
a link determination unit, configured to judge the webpage to be a violation webpage when the weighted term frequency of the violation phrases exceeds a preset threshold, and to judge the link to be a violation link if the webpage to which the link points is a violation webpage.
8. The violation short message detection system according to claim 7, characterized in that the webpage elements include displayed text without a hyperlink and displayed text with a hyperlink, and the weight coefficient of violation phrases whose source is displayed text without a hyperlink is less than the weight coefficient of violation phrases whose source is displayed text with a hyperlink.
9. The violation short message detection system according to claim 7 or 8, characterized in that the webpage elements include pictures without a hyperlink and pictures with a hyperlink, and the weight coefficient of violation phrases whose source is a picture without a hyperlink is less than the weight coefficient of violation phrases whose source is a picture with a hyperlink; the text parsing unit includes an optical character recognition subunit, configured to recognize and extract the text content in pictures without a hyperlink and in pictures with a hyperlink in the webpage.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610799866.2A CN106383862B (en) | 2016-08-31 | 2016-08-31 | Illegal short message detection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610799866.2A CN106383862B (en) | 2016-08-31 | 2016-08-31 | Illegal short message detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106383862A true CN106383862A (en) | 2017-02-08 |
CN106383862B CN106383862B (en) | 2019-12-31 |
Family
ID=57938012
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610799866.2A Active CN106383862B (en) | 2016-08-31 | 2016-08-31 | Illegal short message detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106383862B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107992578A (en) * | 2017-12-06 | 2018-05-04 | 任明和 | The database automatic testing method in objectionable video source |
CN108960952A (en) * | 2017-05-24 | 2018-12-07 | 阿里巴巴集团控股有限公司 | A kind of detection method and device of violated information |
CN110110577A (en) * | 2019-01-22 | 2019-08-09 | 口碑(上海)信息技术有限公司 | Identify method and device, the storage medium, electronic device of name of the dish |
CN111597805A (en) * | 2020-05-21 | 2020-08-28 | 上海创蓝文化传播有限公司 | Method and device for auditing short message text links based on deep learning |
CN115408420A (en) * | 2022-09-02 | 2022-11-29 | 自然资源部地图技术审查中心 | Method and apparatus for automatically filtering map markers and points of interest using a computer |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902889A (en) * | 2012-12-26 | 2014-07-02 | 腾讯科技(深圳)有限公司 | Malicious message cloud detection method and server |
US20150309981A1 (en) * | 2014-04-28 | 2015-10-29 | Elwha Llc | Methods, systems, and devices for outcome prediction of text submission to network based on corpora analysis |
CN105205090A (en) * | 2015-05-29 | 2015-12-30 | 湖南大学 | Web page text classification algorithm research based on web page link analysis and support vector machine |
WO2016003084A1 (en) * | 2014-07-01 | 2016-01-07 | Samsung Electronics Co., Ltd. | Method and apparatus of notifying of smishing |
CN105335354A (en) * | 2015-12-09 | 2016-02-17 | 中国联合网络通信集团有限公司 | Cheat information recognition method and device |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103902889A (en) * | 2012-12-26 | 2014-07-02 | 腾讯科技(深圳)有限公司 | Malicious message cloud detection method and server |
US20150309981A1 (en) * | 2014-04-28 | 2015-10-29 | Elwha Llc | Methods, systems, and devices for outcome prediction of text submission to network based on corpora analysis |
WO2016003084A1 (en) * | 2014-07-01 | 2016-01-07 | Samsung Electronics Co., Ltd. | Method and apparatus of notifying of smishing |
CN105205090A (en) * | 2015-05-29 | 2015-12-30 | 湖南大学 | Web page text classification algorithm research based on web page link analysis and support vector machine |
CN105335354A (en) * | 2015-12-09 | 2016-02-17 | 中国联合网络通信集团有限公司 | Cheat information recognition method and device |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108960952A (en) * | 2017-05-24 | 2018-12-07 | 阿里巴巴集团控股有限公司 | A kind of detection method and device of violated information |
CN107992578A (en) * | 2017-12-06 | 2018-05-04 | 任明和 | The database automatic testing method in objectionable video source |
CN107992578B (en) * | 2017-12-06 | 2019-11-22 | 山西睿信智达传媒科技股份有限公司 | The database automatic testing method in objectionable video source |
CN110110577A (en) * | 2019-01-22 | 2019-08-09 | 口碑(上海)信息技术有限公司 | Identify method and device, the storage medium, electronic device of name of the dish |
CN111597805A (en) * | 2020-05-21 | 2020-08-28 | 上海创蓝文化传播有限公司 | Method and device for auditing short message text links based on deep learning |
CN111597805B (en) * | 2020-05-21 | 2021-01-05 | 上海创蓝文化传播有限公司 | Method and device for auditing short message text links based on deep learning |
CN115408420A (en) * | 2022-09-02 | 2022-11-29 | 自然资源部地图技术审查中心 | Method and apparatus for automatically filtering map markers and points of interest using a computer |
Also Published As
Publication number | Publication date |
---|---|
CN106383862B (en) | 2019-12-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106383862A (en) | Violation short message detection method and system | |
CN103544255B (en) | Text semantic relativity based network public opinion information analysis method | |
CN103544436B (en) | System and method for distinguishing phishing websites | |
CN103810425B (en) | The detection method of malice network address and device | |
US9519718B2 (en) | Webpage information detection method and system | |
CN105956180B (en) | A kind of filtering sensitive words method | |
CN103336766A (en) | Short text garbage identification and modeling method and device | |
CN102737183B (en) | Method and device for webpage safety access | |
EP3933636A1 (en) | Webpage tampering detection method and related apparatus | |
CN104156490A (en) | Method and device for detecting suspicious fishing webpage based on character recognition | |
CN106713579B (en) | Telephone number identification method and device | |
CN106874253A (en) | Recognize the method and device of sensitive information | |
CN112541476B (en) | Malicious webpage identification method based on semantic feature extraction | |
CN107341399A (en) | Assess the method and device of code file security | |
CN110727766A (en) | Method for detecting sensitive words | |
CN105939359A (en) | Method and device for detecting privacy leakage of mobile terminal | |
CN103605691A (en) | Device and method used for processing issued contents in social network | |
CN112328936A (en) | Website identification method, device and equipment and computer readable storage medium | |
CN102646124A (en) | Method for automatically identifying address information | |
CN110020161B (en) | Data processing method, log processing method and terminal | |
CN105589916B (en) | Method for extracting explicit and implicit interest knowledge | |
Wang et al. | Validating multimedia content moderation software via semantic fusion | |
CN105653941A (en) | Heuristic detection method and system for phishing website | |
CN111383660B (en) | Website bad information monitoring system and monitoring method thereof | |
CN110175288B (en) | Method and system for filtering character and image data for teenager group |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||