CN104331475A - Information detection method and device - Google Patents

Publication number
CN104331475A
Authority
CN
China
Prior art keywords
word
text message
keyword
information
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410611713.1A
Other languages
Chinese (zh)
Other versions
CN104331475B (en)
Inventor
张扬蕾
张丽辉
冯晓娜
刘建辉
文帅营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Original Assignee
ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHENGZHOU XIZHI INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410611713.1A
Publication of CN104331475A
Application granted
Publication of CN104331475B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides an information detection method and an information detection device. The method comprises: obtaining the text message of the information to be detected; comparing the text message with the first attribute words in a multi-attribute dictionary, wherein the first attribute words comprise keywords and alternative words of the keywords (words that sound the same as a keyword or share a morpheme with it); when the text message contains a first attribute word, comparing the five characters before the first attribute word and the five characters after it in the text message with the second attribute words in the multi-attribute dictionary to obtain a comparison result, wherein the second attribute words are determiners of the keywords; and determining, according to the comparison result, whether the text message is invalid information. Compared with the prior art, the method identifies invalid information by comparing the text against several kinds of words, so the text message is detected more comprehensively, the probability of misjudgment caused by a single keyword is reduced, and the accuracy of information detection is improved.

Description

Information detection method and device
Technical field
The application relates to the field of information detection technology, and in particular to an information detection method and device.
Background
Websites are favored by more and more people as a new communication tool. To prevent invalid information (that is, information the state forbids publishing, such as information involving "pornography, gambling and drug abuse and trafficking", violence, or terror) from being published on a website, the legitimacy of information must be checked before it is published. Legitimacy here means that the information complies with national security requirements.
The current information detection method is as follows: word segmentation is performed on the information to be detected to obtain a number of independent words, and each independent word is then compared with the keywords in a keyword library. When a word is identical to a keyword in the keyword library, the information to be detected is judged to be invalid information, that is, information that is not allowed to be published. The keywords in the keyword library are words indicating that the information involves "pornography, gambling and drug abuse and trafficking", violence, terror, and the like.
As can be seen from the above process, the existing information detection method judges whether the information to be detected is invalid solely by whether the group of words obtained after segmentation contains a keyword. This approach usually cannot assess the information comprehensively, so the accuracy with which the prior art identifies invalid information needs to be improved.
Summary of the invention
In view of this, the application provides an information detection method for improving the accuracy of information detection.
The application also provides an information detection device to ensure that the above method can be implemented and applied in practice.
The technical solutions of the information detection method and device provided by the application are as follows:
In one aspect, an embodiment of the application provides an information detection method, the method comprising:
Obtaining the text message of the information to be detected;
Comparing the text message with the first attribute words in a pre-established multi-attribute dictionary, wherein the first attribute words comprise keywords and alternative words of the keywords, an alternative word being a word that has the same pronunciation as a keyword or contains the same morpheme;
When the text message contains a first attribute word, comparing the five characters before the first attribute word and the five characters after the first attribute word in the text message with the second attribute words in the multi-attribute dictionary to obtain a comparison result, wherein a second attribute word is a determiner of a keyword and a determiner is used to qualify the keyword;
Determining, according to the comparison result, whether the text message is invalid information.
Preferably, the determiner comprises a positive-selection word, a positive-selection word and a keyword together forming an illegal phrase;
Determining, according to the comparison result, whether the text message is invalid information comprises: when the comparison result shows that the text message contains a positive-selection word, determining that the text message is invalid information;
When the comparison result shows that the text message does not contain a positive-selection word, determining that the text message is legal information.
Preferably, the determiner comprises a negative-selection word, a negative-selection word and a keyword together forming a legal phrase;
Determining, according to the comparison result, whether the text message is invalid information comprises: when the comparison result shows that the text message does not contain a negative-selection word, determining that the text message is invalid information;
When the comparison result shows that the text message contains a negative-selection word, determining that the text message is legal information.
Preferably, obtaining the text message of the information to be detected comprises:
Determining the positions of the symbols in the information to be detected;
Deleting the symbols at the determined positions to obtain the text message.
Preferably, the process of establishing the multi-attribute dictionary in advance comprises:
Obtaining a keyword of any object to be detected;
Performing attribute analysis on the keyword to obtain the alternative words and second attribute words of the keyword;
Determining, according to the obtained keyword, the positions of the obtained alternative words and second attribute words in the multi-attribute dictionary;
Writing the obtained alternative words and second attribute words at the determined positions.
In another aspect, the application provides an information detection device, the device comprising:
An acquisition module, for obtaining the text message of the information to be detected;
A first comparing module, for comparing the text message with the first attribute words in a pre-established multi-attribute dictionary, wherein the first attribute words comprise keywords and alternative words of the keywords, an alternative word being a word that has the same pronunciation as a keyword or contains the same morpheme;
A second comparing module, for comparing, when the text message contains a first attribute word, the five characters before the first attribute word and the five characters after the first attribute word in the text message with the second attribute words in the multi-attribute dictionary to obtain a comparison result, wherein a second attribute word is a determiner of a keyword and a determiner is used to qualify the keyword;
A determination module, for determining, according to the comparison result, whether the text message is invalid information.
Preferably, the determiner comprises a positive-selection word, a positive-selection word and a keyword together forming an illegal phrase;
The determination module is used to determine that the text message is invalid information when the comparison result shows that the text message contains a positive-selection word, and to determine that the text message is legal information when the comparison result shows that the text message does not contain a positive-selection word.
Preferably, the determiner comprises a negative-selection word, a negative-selection word and a keyword together forming a legal phrase;
The determination module is used to determine that the text message is invalid information when the comparison result shows that the text message does not contain a negative-selection word, and to determine that the text message is legal information when the comparison result shows that the text message contains a negative-selection word.
Preferably, the acquisition module comprises:
A determining unit, for determining the positions of the symbols in the information to be detected;
A deletion unit, for deleting the symbols at the determined positions to obtain the text message.
Preferably, the information detection device further comprises:
A keyword acquisition module, for obtaining a keyword of any object to be detected;
An analysis module, for performing attribute analysis on the keyword to obtain the alternative words and second attribute words of the keyword;
A position acquisition module, for determining, according to the obtained keyword, the positions of the obtained alternative words and second attribute words in the multi-attribute dictionary;
A writing module, for writing the obtained alternative words and second attribute words at the determined positions.
Compared with the prior art, the application has the following advantages:
In the application, the text message of the information to be detected is first obtained and compared with the first attribute words in the pre-established multi-attribute dictionary. When the text message contains a first attribute word, the five characters before and the five characters after the first attribute word in the text message are compared with the second attribute words to obtain a comparison result, and whether the text message is invalid information is then judged according to the comparison result. Compared with the prior art, the application does not judge whether the information is invalid merely by whether its text message contains a keyword. It further judges whether the text message contains an alternative word of the keyword, and whether the five characters before and the five characters after the first attribute word contain a determiner that qualifies the keyword, before finally judging whether the text message is invalid information. Relative to judging invalid information by a single keyword, this approach of comparing against different kinds of words detects the text message more comprehensively, reduces the probability of misjudgment caused by a single keyword, and thus improves the accuracy of information detection.
Brief description of the drawings
To explain the technical solutions in the embodiments of the application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an information detection method provided by an embodiment of the application;
Fig. 2 is a second flowchart of the information detection method provided by an embodiment of the application, for the case where the determiner is a positive-selection word;
Fig. 3 is a third flowchart of the information detection method provided by an embodiment of the application, for the case where the determiner is a negative-selection word;
Fig. 4 is a flowchart of the process of establishing the multi-attribute dictionary in advance in the information detection method provided by an embodiment of the application;
Fig. 5 is a schematic diagram of the staff input interface of the information detection method provided by an embodiment of the application;
Fig. 6 is a schematic diagram of an information detection device provided by an embodiment of the application;
Fig. 7 is a schematic diagram of the acquisition module of the information detection device provided by an embodiment of the application;
Fig. 8 is a schematic diagram of the modules used to establish the multi-attribute dictionary in the information detection device provided by an embodiment of the application.
Detailed description
To help those skilled in the art better understand the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the application without creative effort fall within the scope of protection of the application.
Referring to Fig. 1, which shows a flowchart of an information detection method provided by an embodiment of the application, the method may comprise the following steps:
101: Obtain the text message of the information to be detected.
The text message is the information formed by the textual part of the information to be detected; it contains no non-text content such as punctuation marks. One feasible way of obtaining the text message in the embodiment of the application is to delete all symbols from the information to be detected; what remains is the text message of the information to be detected.
For example, suppose the information to be detected is: "On October 12 at 6, the counternarcotics team of Cangyuan County, after careful investigation, set up a checkpoint two kilometers along the road from Xiaoheijiang toward Shuangjiang to intercept vehicles trafficking drugs. At 40 minutes past 6, a minivan ignored the warning of the counternarcotics police and forced its way through the checkpoint." The text message obtained after processing is: "On October 12 at 6 the counternarcotics team of Cangyuan County after careful investigation set up a checkpoint two kilometers along the road from Xiaoheijiang toward Shuangjiang to intercept vehicles trafficking drugs At 40 minutes past 6 a minivan ignored the warning of the counternarcotics police and forced its way through the checkpoint". As this example shows, the text message contains only words.
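Step 101 can be sketched as a simple character filter. The function name `strip_symbols` is hypothetical, and treating every character that is not a letter or digit (including whitespace) as a "symbol" is an assumption of this sketch; the source only says that symbols such as punctuation marks are deleted.

```python
def strip_symbols(raw: str) -> str:
    """Delete every character that is not a letter or a digit.

    Assumption: whitespace counts as a symbol, which matches Chinese
    text (no spaces between words) but is a choice made for this sketch.
    """
    return "".join(ch for ch in raw if ch.isalnum())


if __name__ == "__main__":
    print(strip_symbols("Sell meth test kits."))
```

`str.isalnum` is true for CJK characters as well as Latin letters and digits, so the same filter applies to Chinese input.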
102: Compare the text message with the first attribute words in the pre-established multi-attribute dictionary.
In the embodiment of the application, the first attribute words comprise keywords and alternative words of the keywords. A keyword is the primary word that determines whether the text message is invalid information, for example a word indicating information that violates relevant national regulations, such as information involving "pornography, gambling and drug abuse and trafficking", violence, or terror.
An alternative word is a word that has the same pronunciation as a keyword or contains the same morpheme, and it is as harmful as the keyword. Alternative words cover the case in which the keyword in invalid information has been deliberately miswritten. For example, when the keyword is "invoice", its alternative words may be "unstable", "send out drift", and so on; when the keyword is "rifle", its alternative word may be "wooden storehouse", and so on.
When the text message is compared with the first attribute words in the multi-attribute dictionary, it is compared with the keywords and the alternative words in turn to determine whether it contains a first attribute word. If the text message does not contain a first attribute word, the text message is legal information and the operation ends. If the text message contains a first attribute word, the text message may be invalid information, and it must then be compared with other words to finally determine whether it is invalid information.
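A minimal sketch of step 102, assuming plain substring search is an acceptable stand-in for the comparison (the source does not specify the matching algorithm). The function name and the sample words "meth" and "m3th" are hypothetical.

```python
def find_first_attribute_words(text: str, first_attribute_words: list[str]) -> list[tuple[int, str]]:
    """Return (start_index, word) for every occurrence of a keyword or
    alternative word in the symbol-free text, ordered by position."""
    hits = []
    for word in first_attribute_words:
        start = text.find(word)
        while start != -1:
            hits.append((start, word))
            start = text.find(word, start + 1)  # keep looking for later occurrences
    return sorted(hits)


if __name__ == "__main__":
    # An empty result means the text is treated as legal and detection ends.
    print(find_first_attribute_words("Sellmethtestkits", ["meth", "m3th"]))
```

Returning the positions, not just a yes/no answer, is a design choice of the sketch: the next step needs to know where each hit sits in order to read the characters around it.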
103: When the text message contains a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text message with the second attribute words in the multi-attribute dictionary to obtain a comparison result.
A second attribute word is a determiner of a keyword and is used to qualify the keyword. The qualification may concern the scope of use, manner of use, purpose of use, and so on, of the keyword. In phrase order, a determiner may come before the keyword, as with "taking" in "taking methamphetamine", where the determiner precedes the keyword and qualifies how methamphetamine is used. A determiner may also come after the keyword, as with "detection" in "methamphetamine detection", where the determiner follows the keyword and qualifies its purpose of use.
In the embodiment of the application, the first attribute words comprise keywords and alternative words. When the text message contains a keyword, the five characters before the keyword and the five characters after the keyword are compared with the second attribute words. When the text message contains an alternative word, the five characters before the alternative word and the five characters after the alternative word are compared with the second attribute words. When the text message contains both a keyword and an alternative word, the five characters before and after the keyword and the five characters before and after the alternative word are all compared with the second attribute words.
Because a determiner serving as a second attribute word lies near the keyword in the text message, the five characters before and the five characters after the first attribute word, ten characters in total, are compared with the determiners to determine whether those ten characters contain a second attribute word. This improves the accuracy of detecting whether the text message contains a second attribute word. If a second attribute word in the text message is separated from the first attribute word by five or more characters, it cannot qualify the first attribute word, and in that case there is no need to judge whether the text message is invalid according to that second attribute word.
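The five-character window of step 103 can be sketched as two string slices. The names `context_window` and `matched_determiners` are hypothetical, and requiring a determiner to lie wholly inside one of the two windows is an assumption of this sketch.

```python
WINDOW = 5  # five characters on each side, as stated in the description


def context_window(text: str, start: int, word: str, size: int = WINDOW) -> tuple[str, str]:
    """Return the (before, after) slices around the hit at text[start]."""
    end = start + len(word)
    before = text[max(0, start - size):start]  # may be shorter near the start of the text
    after = text[end:end + size]               # may be shorter near the end of the text
    return before, after


def matched_determiners(text: str, start: int, word: str, determiners: list[str]) -> list[str]:
    """List the second attribute words that fall inside either window."""
    before, after = context_window(text, start, word)
    return [d for d in determiners if d in before or d in after]


if __name__ == "__main__":
    print(context_window("Sellmethtestkits", 4, "meth"))
    print(matched_determiners("Sellmethtestkits", 4, "meth", ["test", "smoke"]))
```

A determiner longer than five characters can never match under this reading, which is consistent with the source's remark that distant words cannot qualify the keyword.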
104: Determine, according to the comparison result, whether the text message is invalid information.
In the embodiment of the application, once the comparison result is obtained, whether the text message is invalid information can be judged semantically according to the comparison result.
With the above technical solution, the text message of the information to be detected is first obtained and compared with the first attribute words in the pre-established multi-attribute dictionary. When the text message contains a first attribute word, the five characters before and the five characters after the first attribute word in the text message are compared with the second attribute words to obtain a comparison result, and whether the text message is invalid information is then judged according to the comparison result. Compared with the prior art, the application does not judge whether the information is invalid merely by whether its text message contains a keyword. It further judges whether the text message contains an alternative word of the keyword, and whether the five characters before and the five characters after the first attribute word contain a determiner that qualifies the keyword, before finally judging whether the text message is invalid information. Relative to judging invalid information by a single keyword, this approach of comparing against different kinds of words detects the text message more comprehensively, reduces the probability of misjudgment caused by a single keyword, and thus improves the accuracy of information detection.
The following example illustrates how comparing against different kinds of words improves the accuracy of information detection relative to judging invalid information by a single keyword:
Suppose the text message is "Selling a product that can detect whether food contains methamphetamine ingredients", the keyword is "methamphetamine", and its determiner is "detection". With the existing single-keyword judgment, the text message contains the keyword "methamphetamine", so it is judged to be invalid information. Semantic analysis shows, however, that the text message is actually legal information, so the single-keyword judgment is wrong. With the information detection method provided by the embodiment of the application, the keyword first indicates that the text message may be invalid information; the text message is then compared with the determiner "detection", and the comparison result shows that the text message contains this determiner; according to the comparison result, the text message is then judged semantically to be legal information, which is correct. This example shows that the information detection method provided by the embodiment of the application can improve the accuracy of information detection.
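The contrast in this example can be sketched end to end. Everything named here (`Entry`, `single_keyword_check`, `two_stage_check`, and the toy English words "meth", "m3th" and "test") is hypothetical and chosen only so that a five-character window is meaningful in English; the sketch handles only determiners that make a phrase legal, as in the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    keyword: str
    alternatives: list = field(default_factory=list)
    legal_determiners: list = field(default_factory=list)  # e.g. "detection" in the example


def single_keyword_check(text: str, entry: Entry) -> bool:
    """Prior-art style check: flag the text as soon as the keyword appears."""
    return entry.keyword in text


def two_stage_check(text: str, entry: Entry, window: int = 5) -> bool:
    """Flag the text only if some hit has no legal determiner within the window."""
    text = "".join(ch for ch in text if ch.isalnum())  # step 101: drop symbols
    for word in [entry.keyword, *entry.alternatives]:  # step 102: find hits
        start = text.find(word)
        while start != -1:
            end = start + len(word)
            context = (text[max(0, start - window):start], text[end:end + window])
            # step 103/104: a nearby legal determiner clears this hit
            if not any(d in part for part in context for d in entry.legal_determiners):
                return True
            start = text.find(word, start + 1)
    return False


if __name__ == "__main__":
    entry = Entry("meth", ["m3th"], ["test"])
    print(single_keyword_check("Sell meth test kits.", entry))  # flagged by the keyword alone
    print(two_stage_check("Sell meth test kits.", entry))       # cleared by the nearby determiner
```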
How the comparison result determines whether the text message is invalid information is described below for the cases where the determiner comprises a positive-selection word or a negative-selection word. A positive-selection word forms an illegal phrase with the keyword; for example, the positive-selection words of "invoice" include "issuing on behalf of others", "sale", and so on. When the text message contains both a positive-selection word and the keyword, the text message is invalid information. Correspondingly, a negative-selection word forms a legal phrase with the keyword; for example, the negative-selection words of "methamphetamine" include "test paper", "detection", and so on. When the text message contains both a negative-selection word and the keyword, the text message is legal information. Positive-selection words and negative-selection words thus lead to different ways of judging the text message, as shown in Fig. 2 and Fig. 3 respectively.
Fig. 2 is a second flowchart of the information detection method provided by the embodiment of the application, for the case where the determiner is a positive-selection word. The method may comprise the following steps:
101: Obtain the text message of the information to be detected. All symbols in the information to be detected are deleted, and what remains is the text message of the information to be detected.
102: Compare the text message with the first attribute words in the pre-established multi-attribute dictionary, wherein the first attribute words comprise keywords and alternative words of the keywords, an alternative word being a word that has the same pronunciation as a keyword or contains the same morpheme.
103: When the text message contains a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text message with the second attribute words in the multi-attribute dictionary to obtain a comparison result. A second attribute word is a determiner of a keyword and is used to qualify the keyword.
105: When the comparison result shows that the text message contains a positive-selection word, determine that the text message is invalid information.
106: When the comparison result shows that the text message does not contain a positive-selection word, determine that the text message is legal information.
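Steps 105 and 106 reduce to a single membership test on the two windows from step 103. The function name and the sample strings below are hypothetical; "sale" is one of the positive-selection words the description gives for "invoice".

```python
def judge_by_positive_words(before: str, after: str, positive_words: list[str]) -> str:
    """Fig. 2 rule: a positive-selection word near the hit makes the text invalid."""
    context = (before, after)
    found = any(word in part for part in context for word in positive_words)
    return "invalid" if found else "legal"


if __name__ == "__main__":
    # Windows around the hit "invoice" in the toy text "invoicesaletoday".
    print(judge_by_positive_words("", "salet", ["sale"]))
```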
Fig. 3 is a third flowchart of the information detection method provided by the embodiment of the application, for the case where the determiner is a negative-selection word. The method may comprise the following steps:
101: Obtain the text message of the information to be detected.
All symbols in the information to be detected are deleted, and what remains is the text message of the information to be detected.
102: Compare the text message with the first attribute words in the pre-established multi-attribute dictionary, wherein the first attribute words comprise keywords and alternative words of the keywords, an alternative word being a word that has the same pronunciation as a keyword or contains the same morpheme.
103: When the text message contains a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text message with the second attribute words in the multi-attribute dictionary to obtain a comparison result. A second attribute word is a determiner of a keyword and is used to qualify the keyword.
107: When the comparison result shows that the text message does not contain a negative-selection word, determine that the text message is invalid information;
108: When the comparison result shows that the text message contains a negative-selection word, determine that the text message is legal information.
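Steps 107 and 108 invert the rule: the absence of a negative-selection word is what makes the text invalid. The function name and the toy windows are hypothetical.

```python
def judge_by_negative_words(before: str, after: str, negative_words: list[str]) -> str:
    """Fig. 3 rule: the text is legal only if a negative-selection word is near the hit."""
    context = (before, after)
    found = any(word in part for part in context for word in negative_words)
    return "legal" if found else "invalid"


if __name__ == "__main__":
    # Windows around the hit "meth" in the toy text "Sellmethtestkits".
    print(judge_by_negative_words("Sell", "testk", ["test"]))
```

Note the default differs from the positive-selection rule: with an empty list of negative-selection words, every hit is judged invalid.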
It should be noted that the information detection method provided by the embodiment of the application may also check the text message for positive-selection words and negative-selection words at the same time. When either the positive-selection check or the negative-selection check judges the text message to be invalid information, the text message is determined to be invalid information.
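One literal reading of the combined check is sketched below: the text is invalid if a positive-selection word is present, or if no negative-selection word is present. The source does not spell out how conflicts are resolved, so this combination rule, the function name, and the sample words are assumptions.

```python
def judge_combined(before: str, after: str,
                   positive_words: list[str], negative_words: list[str]) -> str:
    """Apply both rules to the windows and report invalid if either rule says so."""
    context = (before, after)

    def contains(words: list[str]) -> bool:
        return any(word in part for part in context for word in words)

    invalid_by_positive = contains(positive_words)      # Fig. 2 rule
    invalid_by_negative = not contains(negative_words)  # Fig. 3 rule
    return "invalid" if invalid_by_positive or invalid_by_negative else "legal"


if __name__ == "__main__":
    print(judge_combined("buy", "testk", ["sell"], ["test"]))
```

Under this reading a text is legal only when a negative-selection word is nearby and no positive-selection word is; a deployment might prefer a different precedence.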
All of the above embodiments also include the process of establishing the multi-attribute dictionary in advance. Referring to Fig. 4, which shows the process of establishing the multi-attribute dictionary in the embodiment of the application, the process may comprise the following steps:
401: Obtain a keyword of any object to be detected.
An object to be detected is a thing whose presence in the text message causes the text message to be invalid information. As mentioned above, methamphetamine is an object to be detected, and the keyword obtained in that case is "methamphetamine".
402: Perform attribute analysis on the keyword to obtain the alternative words and second attribute words of the keyword.
The attribute analysis of the keyword may be carried out by staff, who analyze the attributes of the keyword and then enter the alternative words and second attribute words they identify. For example, the interface shown in Fig. 5 may be provided to staff, who write the alternative words and second attribute words they identify into the corresponding positions of the interface, thereby obtaining the alternative words and second attribute words of the keyword.
403: Determine, according to the obtained keyword, the positions of the obtained alternative words and second attribute words in the multi-attribute dictionary.
After the keyword, alternative words and second attribute words are obtained, it is first necessary to determine the position of the keyword in the multi-attribute dictionary and whether each second attribute word (that is, determiner) of the keyword is a positive-selection word or a negative-selection word. Then, according to the position of the keyword, the positions in the same row as the keyword are determined as the positions of the alternative words and second attribute words in the multi-attribute dictionary.
404: Write the obtained alternative words and second attribute words at the determined positions.
Take Table 1 as an example. Table 1 is one form of the multi-attribute dictionary in the embodiment of the application; it shows how keywords, alternative words and second attribute words are stored in the multi-attribute dictionary, where "×" indicates that the word does not exist.
Table 1: One form of the multi-attribute dictionary
After the multi-attribute dictionary has been established, if keywords, alternative words or second attribute words need to be added, steps 403 to 404 are repeated whenever a new keyword, alternative word or second attribute word is found, so as to improve the multi-attribute dictionary.
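The row-per-keyword layout described above can be sketched as a mapping from keyword to its columns. The function name `add_entry`, the column names, and the alternative word "inv0ice" are hypothetical; "sale" is a positive-selection word the description gives for "invoice". An empty list plays the role of "×" (no such word).

```python
def add_entry(dictionary: dict, keyword: str,
              alternatives=(), positive_words=(), negative_words=()) -> dict:
    """Locate (or create) the keyword's row, then write new words into its columns."""
    row = dictionary.setdefault(
        keyword, {"alternatives": [], "positive": [], "negative": []}
    )
    columns = (
        ("alternatives", alternatives),
        ("positive", positive_words),
        ("negative", negative_words),
    )
    for column, words in columns:
        for word in words:
            if word not in row[column]:  # repeated additions do not duplicate words
                row[column].append(word)
    return row


if __name__ == "__main__":
    lexicon = {}
    add_entry(lexicon, "invoice", positive_words=["sale"])
    add_entry(lexicon, "invoice", alternatives=["inv0ice"], positive_words=["sale"])
    print(lexicon)
```

Calling `add_entry` again for an existing keyword corresponds to repeating the locate-and-write steps when new words are discovered.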
For simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the application.
Corresponding to the above method embodiments, the embodiment of the present application also provides an information detection device. As shown in the structural diagram of Fig. 6, the device comprises an acquisition module 11, a first comparison module 12, a second comparison module 13 and a determination module 14, wherein:
The acquisition module 11 is configured to obtain the text message of the information to be detected.
The text message is the information formed by the word portion of the information to be detected; it does not include non-text information such as punctuation marks. One feasible implementation of the acquisition module 11 in the embodiment of the present application is to delete all symbols from the information to be detected, the remaining part being the text message of the information to be detected.
For example, the information to be detected is: "At 6 am on October 12, the narcotics control brigade of Cangyuan County, after careful investigation, set up a checkpoint two kilometers along the road from Xiaoheijiang toward Shuangjiang to intercept a drug trafficking vehicle. At 40 minutes past 6, a minivan ignored the warnings of the narcotics police and forcibly ran the checkpoint." After processing by the acquisition module 11, the resulting text message is: "At 6 am on October 12 the narcotics control brigade of Cangyuan County after careful investigation set up a checkpoint two kilometers along the road from Xiaoheijiang toward Shuangjiang to intercept a drug trafficking vehicle At 40 minutes past 6 a minivan ignored the warnings of the narcotics police and forcibly ran the checkpoint". As this example shows, the text message contains only words.
Specifically, the acquisition module 11 may have the structure shown in Fig. 7 and may comprise a determining unit 111 and a deleting unit 112, wherein:
The determining unit 111 is configured to determine the positions of symbols in the information to be detected;
The deleting unit 112 is configured to delete the symbols from the determined positions to obtain the text message.
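The two units above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the present application: a "symbol" is assumed to be any whitespace character or any character whose Unicode general category is punctuation (P*) or symbol (S*), and the function names are hypothetical. Digits are kept, as in the example above.

```python
import unicodedata


def is_symbol(ch):
    """Assumed definition of a symbol: whitespace, punctuation or symbol."""
    return ch.isspace() or unicodedata.category(ch)[0] in "PS"


def symbol_positions(info):
    """Determining unit: indices of the symbols in the information."""
    return [i for i, ch in enumerate(info) if is_symbol(ch)]


def delete_symbols(info, positions):
    """Deleting unit: remove the characters at the determined positions."""
    drop = set(positions)
    return "".join(ch for i, ch in enumerate(info) if i not in drop)


def get_text(info):
    """Acquisition module: determine the symbol positions, then delete them."""
    return delete_symbols(info, symbol_positions(info))


print(get_text("At 6:40, a van ran it."))
```

Removing whitespace is a choice made for this sketch so that the later character-window comparison operates on contiguous text.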
The first comparison module 12 is configured to compare the text message with the first attribute words in the pre-established multi-attribute dictionary.
In the embodiment of the present application, the first attribute words comprise keywords and the alternative words of the keywords. A keyword is the primary word for determining that a text message is illegal information, for example a word relating to pornography, gambling, drug abuse and trafficking, violence, terrorism or other information that violates relevant national regulations.
An alternative word is a word that has the same pronunciation as the keyword or contains the same morpheme, and it is as harmful as the keyword. It covers the case in which the information to be detected is illegal information but the keyword has been deliberately miswritten. For example, when the keyword is "invoice", its alternative words may be similar-sounding miswritings of it (rendered literally as "unstable" and "send out drift"); when the keyword is "gun", its alternative word may be a form that splits the written word into its parts (rendered literally as "wooden storehouse").
When comparing the text message with the first attribute words in the multi-attribute dictionary, the first comparison module 12 compares the text message with the keywords and the alternative words in turn to determine whether the text message contains a first attribute word. If the text message does not contain a first attribute word, the text message is legal information and the operation ends. If the text message contains a first attribute word, the text message may be illegal information, and the next operation is required, namely triggering the second comparison module 13, to finally determine whether it is illegal information.
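The first comparison can be sketched in Python as a plain substring search. This is a minimal sketch, not the implementation of the present application; the function name, the mapping from each keyword to its set of alternative words, and the English stand-in strings (including the invented miswriting "invoyce") are all assumptions made for illustration.

```python
def find_first_attribute_words(text, dictionary):
    """Return a list of (keyword, matched_word, start_index) for every hit.

    `dictionary` maps each keyword to the set of its alternative words.
    The keyword is tried first, then its alternative words in sorted order.
    """
    hits = []
    for keyword, alternatives in dictionary.items():
        for word in [keyword, *sorted(alternatives)]:
            start = text.find(word)
            while start != -1:
                hits.append((keyword, word, start))
                start = text.find(word, start + 1)
    return hits


# Hypothetical dictionary content for the demonstration.
demo_dictionary = {"invoice": {"invoyce"}}
hits = find_first_attribute_words("sellinvoyce", demo_dictionary)
needs_second_comparison = bool(hits)  # an empty list means legal information
```

Recording the start index of each hit is a design choice of the sketch: the later comparison of the surrounding characters needs to know where the first attribute word sits in the text message.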
The second comparison module 13 is configured to, when the text message contains a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text message with the second attribute words in the multi-attribute dictionary to obtain a comparison result.
The second attribute word is a determiner of the keyword and is used to limit the keyword. The limitation may concern the scope of use, the manner of use, the purpose of use and so on of the keyword. In phrase order, the determiner may precede the keyword, as with "smoking" in "smoking methamphetamine", where the determiner precedes the keyword and limits the manner in which methamphetamine is used. The determiner may also follow the keyword, as with "detection" in "methamphetamine detection", where the determiner follows the keyword and limits the purpose of use.
When the text message contains a first attribute word, the five characters before and the five characters after the first attribute word in the text message, ten characters in total, are compared with the second attribute words to determine whether these ten characters contain a second attribute word, which improves the accuracy of detecting whether the text message contains a second attribute word. If the second attribute word and the first attribute word in the text message are separated by five or more characters, the second attribute word cannot limit the first attribute word, and in that case there is no need to judge from the second attribute word whether the text message is illegal.
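The comparison of the surrounding characters can be sketched in Python as follows. This is a minimal sketch, not the implementation of the present application; the function names are hypothetical, and the English strings only stand in for the original words (a five-character window suits Chinese text far better than English). The sketch assumes that a determiner must lie entirely within the five characters before or the five characters after the first attribute word.

```python
WINDOW = 5  # number of characters inspected on each side


def context_window(text, start, length, size=WINDOW):
    """Return the `size` characters before and after text[start:start+length]."""
    before = text[max(0, start - size):start]
    after = text[start + length:start + length + size]
    return before, after


def second_attribute_hits(text, start, length, determiners):
    """Return the determiners found within the window around the match."""
    before, after = context_window(text, start, length)
    return {d for d in determiners if d in before or d in after}


text = "theysmokeiceathome"
start = text.find("ice")  # position of the first attribute word
found = second_attribute_hits(text, start, len("ice"), {"smoke", "test"})
```

A determiner that lies further away, or that straddles the edge of the window, is not reported by this sketch, in line with the idea that a distant determiner no longer limits the keyword.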
The determination module 14 is configured to determine, according to the comparison result, whether the text message is illegal information. In the embodiment of the present application, after the comparison result is obtained, the determination module 14 can judge semantically from the comparison result whether the text message is illegal information.
The determination module 14 in the embodiment of the present application is described below for the cases in which the determiner comprises a positive-selection word or a negative-selection word. A positive-selection word forms an illegal phrase with the keyword; for example, the positive-selection words of "invoice" include "issuing on behalf of others", "selling", etc. When a text message contains both a positive-selection word and the keyword, the text message is illegal information. Correspondingly, a negative-selection word forms a legal phrase with the keyword; for example, the negative-selection words of "ice" include "test paper", "detection", etc. When a text message contains both a negative-selection word and the keyword, the text message is legal information.
When the determiner comprises a positive-selection word, the determination module 14 is configured to: determine that the text message is illegal information when the comparison result shows that the text message contains the positive-selection word; and determine that the text message is legal information when the comparison result shows that the text message does not contain the positive-selection word;
When the determiner comprises a negative-selection word, the determination module 14 is configured to: determine that the text message is illegal information when the comparison result shows that the text message does not contain the negative-selection word; and determine that the text message is legal information when the comparison result shows that the text message contains the negative-selection word.
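The two decision rules above can be written as a small Python function. This is a minimal sketch, not the implementation of the present application; the names Verdict and decide and the string labels for the kind of determiner are hypothetical. Here "forms_illegal" denotes determiners that form an illegal phrase with the keyword and "forms_legal" denotes determiners that form a legal phrase with it.

```python
from enum import Enum


class Verdict(Enum):
    LEGAL = "legal"
    ILLEGAL = "illegal"


def decide(determiner_found, determiner_kind):
    """Map the comparison result to a verdict.

    determiner_found: True if a determiner of the given kind was found
                      near the first attribute word.
    determiner_kind:  "forms_illegal" or "forms_legal".
    """
    if determiner_kind == "forms_illegal":
        # Found -> illegal phrase present; not found -> legal.
        return Verdict.ILLEGAL if determiner_found else Verdict.LEGAL
    if determiner_kind == "forms_legal":
        # Found -> legal phrase present; not found -> illegal.
        return Verdict.LEGAL if determiner_found else Verdict.ILLEGAL
    raise ValueError(f"unknown determiner kind: {determiner_kind!r}")
```

The text describes the two rules separately and does not state how a keyword that has both kinds of determiner should be handled, so the sketch deliberately leaves that combination to the caller.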
The devices of all the above embodiments store the multi-attribute dictionary. Fig. 8 shows the modules for setting up the multi-attribute dictionary that an information detection device of the embodiment of the present application may comprise: a keyword acquisition module 15, an analysis module 16, a position acquisition module 17 and a write module 18, wherein:
The keyword acquisition module 15 is configured to obtain the keyword of any object to be detected.
The object to be detected is a thing whose presence in a text message causes the text message to be illegal information. For example, the aforementioned methamphetamine is an object to be detected, and the keyword obtained for it is "ice".
The analysis module 16 is configured to perform attribute analysis on the keyword to obtain the alternative words and the second attribute words of the keyword.
The attribute analysis of the keyword may be completed by staff, who analyze its attributes and then input the alternative words and second attribute words they have identified. For example, the interface shown in Fig. 5 may be provided to the staff, who write the alternative words and second attribute words they have identified into the corresponding positions of the interface, so that the analysis module 16 obtains the alternative words and second attribute words of the keyword.
The position acquisition module 17 is configured to determine, according to the obtained keyword, the positions of the obtained alternative words and second attribute words in the multi-attribute dictionary.
After the keyword, the alternative words and the second attribute words have been obtained, the position acquisition module 17 first determines the position of the keyword in the multi-attribute dictionary and whether each second attribute word (i.e. determiner) of the keyword is a positive-selection word or a negative-selection word. The position acquisition module 17 then determines, according to the position of the keyword, the positions on the same row as the keyword as the positions of the alternative words and the second attribute words in the multi-attribute dictionary.
The write module 18 is configured to write the obtained alternative words and second attribute words into the determined positions.
Take Table 1 as an example. Table 1 shows one form of the multi-attribute dictionary in the embodiment of the present application and illustrates how keywords, alternative words and second attribute words are stored in the multi-attribute dictionary, where "×" indicates that the word does not exist.
Table 1: One form of the multi-attribute dictionary
For convenience of description, the above device is described with its functions divided into separate units. When the present application is implemented, the functions of the units may of course be realized in one or more pieces of software and/or hardware.
The information detection method and device provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help in understanding the method of the present application and its core idea. At the same time, those of ordinary skill in the art may, following the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. An information detection method, characterized in that the method comprises:
Obtaining the text message of information to be detected;
Comparing the text message with first attribute words in a pre-established multi-attribute dictionary, wherein the first attribute words comprise a keyword and alternative words of the keyword, and an alternative word is a word that has the same pronunciation as the keyword or contains the same morpheme;
When the text message contains a first attribute word, comparing the five characters before the first attribute word and the five characters after the first attribute word in the text message with second attribute words in the multi-attribute dictionary to obtain a comparison result, wherein a second attribute word is a determiner of the keyword, and the determiner is used to limit the keyword;
Determining, according to the comparison result, whether the text message is illegal information.
2. The method according to claim 1, characterized in that the determiner comprises a positive-selection word, and the positive-selection word and the keyword form an illegal phrase;
The determining, according to the comparison result, whether the text message is illegal information comprises: when the comparison result shows that the text message contains the positive-selection word, determining that the text message is illegal information;
When the comparison result shows that the text message does not contain the positive-selection word, determining that the text message is legal information.
3. The method according to claim 1, characterized in that the determiner comprises a negative-selection word, and the negative-selection word and the keyword form a legal phrase;
The determining, according to the comparison result, whether the text message is illegal information comprises: when the comparison result shows that the text message does not contain the negative-selection word, determining that the text message is illegal information;
When the comparison result shows that the text message contains the negative-selection word, determining that the text message is legal information.
4. The method according to claim 1, characterized in that the obtaining the text message of information to be detected comprises:
Determining the positions of symbols in the information to be detected;
Deleting the symbols from the determined positions to obtain the text message.
5. The method according to any one of claims 1 to 4, characterized in that the process of establishing the multi-attribute dictionary in advance comprises:
Obtaining the keyword of any object to be detected;
Performing attribute analysis on the keyword to obtain the alternative words and the second attribute words of the keyword;
Determining, according to the obtained keyword, the positions of the obtained alternative words and second attribute words in the multi-attribute dictionary;
Writing the obtained alternative words and second attribute words into the determined positions.
6. An information detection device, characterized in that the device comprises:
An acquisition module, configured to obtain the text message of information to be detected;
A first comparison module, configured to compare the text message with first attribute words in a pre-established multi-attribute dictionary, wherein the first attribute words comprise a keyword and alternative words of the keyword, and an alternative word is a word that has the same pronunciation as the keyword or contains the same morpheme;
A second comparison module, configured to, when the text message contains a first attribute word, compare the five characters before the first attribute word and the five characters after the first attribute word in the text message with second attribute words in the multi-attribute dictionary to obtain a comparison result, wherein a second attribute word is a determiner of the keyword, and the determiner is used to limit the keyword;
A determination module, configured to determine, according to the comparison result, whether the text message is illegal information.
7. The device according to claim 6, characterized in that the determiner comprises a positive-selection word, and the positive-selection word and the keyword form an illegal phrase;
The determination module is configured to determine that the text message is illegal information when the comparison result shows that the text message contains the positive-selection word, and to determine that the text message is legal information when the comparison result shows that the text message does not contain the positive-selection word.
8. The device according to claim 6, characterized in that the determiner comprises a negative-selection word, and the negative-selection word and the keyword form a legal phrase;
The determination module is configured to determine that the text message is illegal information when the comparison result shows that the text message does not contain the negative-selection word, and to determine that the text message is legal information when the comparison result shows that the text message contains the negative-selection word.
9. The device according to claim 6, characterized in that the acquisition module comprises:
A determining unit, configured to determine the positions of symbols in the information to be detected;
A deleting unit, configured to delete the symbols from the determined positions to obtain the text message.
10. The device according to any one of claims 6 to 9, characterized in that the device further comprises:
A keyword acquisition module, configured to obtain the keyword of any object to be detected;
An analysis module, configured to perform attribute analysis on the keyword to obtain the alternative words and the second attribute words of the keyword;
A position acquisition module, configured to determine, according to the obtained keyword, the positions of the obtained alternative words and second attribute words in the multi-attribute dictionary;
A write module, configured to write the obtained alternative words and second attribute words into the determined positions.
CN201410611713.1A 2014-11-04 2014-11-04 A kind of information detecting method and device Active CN104331475B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410611713.1A CN104331475B (en) 2014-11-04 2014-11-04 A kind of information detecting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410611713.1A CN104331475B (en) 2014-11-04 2014-11-04 A kind of information detecting method and device

Publications (2)

Publication Number Publication Date
CN104331475A true CN104331475A (en) 2015-02-04
CN104331475B CN104331475B (en) 2018-03-23

Family

ID=52406202

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410611713.1A Active CN104331475B (en) 2014-11-04 2014-11-04 A kind of information detecting method and device

Country Status (1)

Country Link
CN (1) CN104331475B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018095281A1 (en) * 2016-11-25 2018-05-31 阿里巴巴集团控股有限公司 Name matching method and apparatus
CN108536859A (en) * 2018-04-18 2018-09-14 北京小度信息科技有限公司 Content authentication method, apparatus, electronic equipment and computer readable storage medium
CN109886683A (en) * 2019-02-25 2019-06-14 北京神荼科技有限公司 Monitor the method, apparatus and storage medium of block chain data
CN109933775A (en) * 2017-12-15 2019-06-25 腾讯科技(深圳)有限公司 UGC content processing method and device
CN111488738A (en) * 2019-01-25 2020-08-04 阿里巴巴集团控股有限公司 Illegal information identification method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2415062A (en) * 2004-06-08 2005-12-14 Malcolm Ripley Junk mail filter for emails based on subject field text
CN101247279A (en) * 2007-10-23 2008-08-20 北京邮电大学 Internet content safety detecting system
CN102053993A (en) * 2009-11-10 2011-05-11 阿里巴巴集团控股有限公司 Text filtering method and text filtering system
CN102779176A (en) * 2012-06-27 2012-11-14 北京奇虎科技有限公司 System and method for key word filtering

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018095281A1 (en) * 2016-11-25 2018-05-31 阿里巴巴集团控股有限公司 Name matching method and apparatus
US10726028B2 (en) 2016-11-25 2020-07-28 Alibaba Group Holding Limited Method and apparatus for matching names
CN109933775A (en) * 2017-12-15 2019-06-25 腾讯科技(深圳)有限公司 UGC content processing method and device
CN108536859A (en) * 2018-04-18 2018-09-14 北京小度信息科技有限公司 Content authentication method, apparatus, electronic equipment and computer readable storage medium
CN111488738A (en) * 2019-01-25 2020-08-04 阿里巴巴集团控股有限公司 Illegal information identification method and device
CN111488738B (en) * 2019-01-25 2023-04-28 阿里巴巴集团控股有限公司 Illegal information identification method and device
CN109886683A (en) * 2019-02-25 2019-06-14 北京神荼科技有限公司 Monitor the method, apparatus and storage medium of block chain data

Also Published As

Publication number Publication date
CN104331475B (en) 2018-03-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 450000 Zhengzhou science and technology zone, Henan high tech Road, building 169, building 1, No. 1

Applicant after: ZHENGZHOU XIZHI INFORMATION TECHNOLOGY CO., LTD.

Address before: 450000 Zhengzhou science and technology zone, Henan high tech Road, building 169, building 1, No. 1

Applicant before: ZHENGZHOU XIZHI INFORMATION TECHNOLOGY CO., LTD.

COR Change of bibliographic data
GR01 Patent grant