CN101976231A - Network supervision method for multi-language short messages - Google Patents

Publication number
CN101976231A
Authority
CN
China
Prior art keywords
short message
languages
language
character
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010102666235A
Other languages
Chinese (zh)
Inventor
孙强国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN2010102666235A
Publication of CN101976231A
Legal status: Pending

Abstract

The invention discloses a network supervision method for multilingual short messages, a method that combines computer programs with manual assistance. The method comprises the following steps: the words of the language corresponding to the transmission code are matched against the words in a corpus of that language to judge the similarity and degree of match between the transmitted language and the displayed language, and illegal short messages are identified on that basis; the correspondence between the transmission code and the display code, and between the transmission code and the displayed characters, is then found, and the illegal short message is finally decoded. This achieves effective supervision of short messages whose transmission code and glyph code are inconsistent. The invention is suitable for telecommunications, police, security and similar departments to effectively supervise multilingual short messages on their short message platforms, and can be used for counter-terrorism, combating violent crime, detecting and blocking harmful information, and the like. The technique can also be used to supervise communication content on the Internet and in similar settings.

Description

A network supervision method for multilingual short messages
Technical field:
The present invention relates to a network monitoring and management method for telecom operators and for departments such as public security and security, and in particular to a network supervision method for multilingual short messages.
Background art:
Mobile phone short messages are transmitted by telecom operators on their SMS platforms: the characters to be transmitted are encoded according to an agreed protocol and a unified coding rule. China's telecommunication departments, for example, encode and send short messages according to the CMPT protocol and a unified coding standard; this encoding is referred to here as the transmission code. When a user writes or reads a short message on a mobile terminal, each character is handled through its corresponding code (internal code), which in turn corresponds to a specific glyph code; this is referred to here as the display code. So that information is displayed uniformly, internal codes follow international or national standards. For the same internal code, however, a handset manufacturer or a handset development company can produce a different display font, so that the internal code of character a is displayed with the appearance of character b. The content corresponding to the transmission code is then inconsistent with the content shown by the display code. For example, the transmission code carries the Unicode code "0414" of the Russian letter "Д", while the glyph actually written or read on the handset is the English letter "a". In the same way, the message sent from the writer's handset and shown on the recipient's handset reads "backfire", whereas the transmission code of this character string appears to the telecommunications supervision department as "exertion".
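The mismatch described above can be illustrated with a minimal sketch. The glyph table `TAMPERED_FONT` and both function names are hypothetical and introduced only for illustration; the single entry reproduces the example in the text, where the code U+0414 is drawn as "a".

```python
# Hypothetical altered glyph table: transmitted code point -> glyph drawn on the handset.
# The single entry mirrors the example in the text (U+0414, Cyrillic "Д", shown as "a").
TAMPERED_FONT = {0x0414: "a"}


def as_seen_by_supervisor(transmission_codes):
    """Decode the transmission codes with the standard character set."""
    return "".join(chr(cp) for cp in transmission_codes)


def as_seen_on_handset(transmission_codes, font=TAMPERED_FONT):
    """Render each code with the handset's (possibly altered) glyph table."""
    return "".join(font.get(cp, chr(cp)) for cp in transmission_codes)


codes = [0x0414]
print(as_seen_by_supervisor(codes))  # the supervising department reads "Д"
print(as_seen_on_handset(codes))     # the handset user reads "a"
```

A keyword search run on the supervisor's view would therefore never see the text that the two handsets display.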
When terrorists or other offenders make the display code of certain handsets correspond to characters different from those of the telecommunication department's transmission code, short messages on handsets fitted with such a display code become a tool for coded communication. The handset becomes an instrument with which terrorists and other lawbreakers issue orders for sabotage, keep in contact, spread rumours to mislead the public, distribute obscene material, and carry out other criminal activities. Only the sender and the recipient know the content that is actually communicated, and under present conditions the telecommunications, public security and security departments cannot supervise it effectively.
Existing methods for supervising short messages mainly search for sensitive words to screen harmful messages. This works for messages whose transmission code is consistent with the glyph code. For messages in which the internal code of character a is displayed with the appearance of character b, as described above, there is at present no solution.
Summary of the invention:
The purpose of the present invention is to address the above defect by proposing a method that combines computer programs with manual assistance. The words of the language corresponding to the transmission code are matched against the words in a corpus of that language, the similarity and degree of match between the transmitted language and the displayed language are judged, and illegal short messages are identified on that basis. The correspondence between the transmission code and the display code and displayed characters is then found, and the illegal short message is finally cracked. This achieves effective supervision of short messages whose transmission code and glyph code are inconsistent.
The key points of the method adopted by the present invention to solve its technical problem are:
One, using computer programs and manual assistance, short messages are processed as follows;
A, the language of the short message's characters is judged from the language to which each character belongs in the coded character set corresponding to the transmission code. When the number of languages involved exceeds a certain value, the short message can be regarded as suspicious, and is then identified manually and blocked or deleted;
B, according to the language of the short message, it is judged whether the transmission code contains the codes of spaces, line feeds, or punctuation marks of that language such as commas, full stops and question marks (referred to as symbol codes). If there are no symbol codes and the length of the short message exceeds a certain number of characters, the short message can be regarded as suspicious, and is then identified manually and blocked or deleted;
C, the character string encoding corresponding to the transmission code is divided into groups at the symbol codes, and the grouped character strings are extracted,
C1: the extracted character strings are compared with the words in a preset corpus of sensitive vocabulary for that language (sensitive words here are words relating to violent crime, killing, arson, robbery, incitement, obscenity and the like). When the degree of match or similarity between the grouped character strings and the sensitive words in the corpus is greater than a certain value, the short message can be regarded as suspicious, and is then identified manually and blocked or deleted;
C2: the extracted character strings are compared with the words in a preset corpus of common high-frequency words for that language. When the degree of match or similarity between the grouped character strings and the words in the corpus is less than a certain value, the short message can be regarded as suspicious, and is then identified manually and blocked or deleted;
D, for a suspicious short message, each transmission code is mapped in turn to each character of the languages that the suspicious short message may involve, and the permutations and combinations are enumerated. All the resulting character strings are compared with the words in the sensitive vocabulary corpus and the high-frequency word corpus of those languages. When the degree of match or similarity is greater than a certain value, the correspondence between the transmission code and the display code and the characters actually displayed can be found, and the illegal short message can then be cracked.
Two, a short message that has no symbol codes between its characters, and whose length does not exceed the certain number of characters, is processed with steps C and D above;
Three, for agglutinative languages such as Arabic, Uighur, Turkish, Urdu and Persian, and inflectional languages such as Russian and German, the words in the corpus may be stems or roots, and the extracted character string may be the first several characters of each grouped character string;
Four, the multiple languages include Chinese, English, German, Russian, French, Portuguese, Spanish, Arabic, Uighur, Turkish, Urdu, Persian, Pashto, Japanese, Korean and other languages;
Five, after a transmitted short message has been judged illegal, and while the supervision department blocks or deletes it, the positioning function is used to lock onto the area from which the illegal short message was sent, so that the offenders can be brought to justice.
Six, when the sender and the recipient (particularly for bulk short messages) are in country A, and the short messages sent and received are in the language of country B, or in the languages of country B and country C, the short message can be regarded as suspicious, and is then identified manually and blocked or deleted;
Seven, when Uighur is written in the Latin alphabet, the high-frequency words and sensitive vocabulary of Uighur are spelled in the Latin alphabet and then processed according to item One above; other languages of this type can be handled in the same way;
Eight, when the character strings obtained by grouping the character string encoding corresponding to the transmission code at the symbol codes are Chinese or Korean, existing word segmentation techniques are used to break sentences into phrases and single characters. Commonly used single characters are treated as phrases and processed according to item One above. When the wording of the short message is judged not to conform to modern Chinese or Korean usage, the short message can be regarded as suspicious.
Nine, for languages written with the Arabic alphabet, such as Uighur, Kazakh and Kyrgyz, when the Unicode characters of other languages are used as the transmission code, the number of characters needed for normal display exceeds 120 because the letters have variant forms, and these 120 or more characters involve the encodings of several languages. When the number of languages involved exceeds a certain value, the short message can be regarded as suspicious;
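Steps A, B, C1 and C2 of item One can be sketched roughly as follows. This is a sketch under stated assumptions, not the patented implementation: the thresholds, the separator set, the use of Unicode script names as a stand-in for "languages", the choice of `difflib` similarity, and the way C1 (any token) and C2 (average over tokens) aggregate scores are all illustrative choices that the source leaves open ("a certain value").

```python
import difflib
import re
import unicodedata

# All thresholds below are illustrative assumptions; the source only says "a certain value".
MAX_SCRIPTS = 2
MAX_LEN_WITHOUT_SEPARATORS = 10
SENSITIVE_THRESHOLD = 0.8
COMMON_THRESHOLD = 0.6
# Spaces, line feeds and a few Latin and CJK punctuation marks ("symbol codes").
SEPARATORS = re.compile(r"[\s,.?!;:，。？！、]+")


def script_of(ch):
    """Rough script label: the first word of the Unicode character name."""
    try:
        return unicodedata.name(ch).split()[0]  # e.g. LATIN, CYRILLIC, ARABIC, CJK
    except ValueError:
        return "UNKNOWN"


def scripts_in(text):
    """Set of scripts used by the letters of the message (proxy for languages)."""
    return {script_of(ch) for ch in text if ch.isalpha()}


def best_similarity(token, lexicon):
    """Highest difflib similarity between the token and any lexicon word."""
    return max(
        (difflib.SequenceMatcher(None, token, w).ratio() for w in lexicon),
        default=0.0,
    )


def screen(text, sensitive_words, common_words):
    """Return the reasons a message is suspicious (empty list: not flagged)."""
    reasons = []
    # Step A: too many scripts in one message.
    if len(scripts_in(text)) > MAX_SCRIPTS:
        reasons.append("a: too many scripts")
    # Step B: long message without any space, line feed or punctuation.
    if not SEPARATORS.search(text) and len(text) > MAX_LEN_WITHOUT_SEPARATORS:
        reasons.append("b: no separators")
    # Step C: split at the symbol codes and compare the groups with the corpora.
    tokens = [t.lower() for t in SEPARATORS.split(text) if t]
    if tokens:
        # C1: some group closely matches a sensitive word.
        if any(best_similarity(t, sensitive_words) > SENSITIVE_THRESHOLD for t in tokens):
            reasons.append("c1: sensitive word")
        # C2: on average the groups do not look like common words of the language.
        average = sum(best_similarity(t, common_words) for t in tokens) / len(tokens)
        if average < COMMON_THRESHOLD:
            reasons.append("c2: low match with common words")
    return reasons
```

Under these assumptions, a flagged message would then go to manual review, as every step of item One requires; the sketch only produces the list of reasons.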
The beneficial effects of the invention are as follows. Compared with existing short message supervision methods, it overcomes their inability to effectively supervise short messages whose transmission code and glyph code are inconsistent.
The present invention is suitable for telecommunications, public security, security and similar departments to effectively supervise multilingual short messages on their SMS platforms, and can be used for counter-terrorism, combating violent crime, and detecting and blocking harmful information. The technique can also be used to supervise communication content on the Internet and in similar settings.
Embodiment:
Embodiment: the key points of the invention set out above already describe the specific embodiment.
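As a worked illustration of step D, the sketch below searches for a one-to-one correspondence between transmitted characters and displayed characters under which every group of the message becomes a word of a known lexicon. It is an assumption-laden simplification: the source describes enumerating permutations and comparing by similarity against a threshold, whereas this sketch uses backtracking with exact matches to prune the enumeration. The names `crack` and `decode`, the sample lexicon and the sample message are hypothetical.

```python
def crack(tokens, lexicon):
    """Find a one-to-one mapping (transmitted char -> displayed char) that turns
    every token into a lexicon word, or return None if no such mapping exists."""

    def extend(mapping, token, word):
        # Try to make `token` read as `word` without contradicting `mapping`.
        if len(token) != len(word):
            return None
        new = dict(mapping)
        used = set(new.values())
        for sent, shown in zip(token, word):
            if sent in new:
                if new[sent] != shown:
                    return None  # the same code would show two different glyphs
            elif shown in used:
                return None      # two codes would show the same glyph
            else:
                new[sent] = shown
                used.add(shown)
        return new

    def search(index, mapping):
        if index == len(tokens):
            return mapping
        for word in lexicon:
            candidate = extend(mapping, tokens[index], word)
            if candidate is not None:
                result = search(index + 1, candidate)
                if result is not None:
                    return result
        return None

    return search(0, {})


def decode(tokens, mapping):
    """Apply the recovered correspondence to the transmitted tokens."""
    return " ".join("".join(mapping[ch] for ch in token) for token in tokens)


# Hypothetical example: Cyrillic codes are transmitted, Latin glyphs are displayed.
tokens = ["ДЖЖИ", "БИ", "ЛФФЛ"]
lexicon = ["meet", "at", "noon"]
mapping = crack(tokens, lexicon)
print(decode(tokens, mapping))
```

With real corpora the lexicon would be the sensitive and high-frequency word lists of each candidate language, and several consistent mappings may exist, so the result would still need manual confirmation.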

Claims (9)

1. A network supervision method for multilingual short messages, characterized in that computer programs and manual assistance are used and short messages are processed as follows,
A: the language of the short message's characters is judged from the language to which each character belongs in the coded character set corresponding to the transmission code; when the number of languages involved exceeds a certain value, the short message can be regarded as suspicious;
B: according to the language of the short message, it is judged whether the transmission code contains the codes of spaces, line feeds, or punctuation marks of that language such as commas, full stops and question marks (referred to as symbol codes); if there are no symbol codes and the length of the short message exceeds a certain number of characters, the short message can be regarded as suspicious;
C: the character string encoding corresponding to the transmission code is divided into groups at the symbol codes, and the grouped character strings are extracted,
C1: the extracted character strings are compared with the words in a preset corpus of sensitive vocabulary for that language (sensitive words here are words relating to violent crime, killing, arson, robbery, incitement, obscenity and the like); when the degree of match or similarity between the grouped character strings and the sensitive words in the corpus is greater than a certain value, the short message can be regarded as suspicious;
C2: the extracted character strings are compared with the words in a preset corpus of common high-frequency words for that language; when the degree of match or similarity between the grouped character strings and the words in the corpus is less than a certain value, the short message can be regarded as suspicious;
D: suspicious short messages are then identified manually, and blocked or deleted in accordance with the law.
2. The method according to claim 1, characterized in that, for a suspicious short message, a computer program maps each transmission code in turn to each character of the languages that the suspicious short message may involve and enumerates the permutations and combinations; all the resulting character strings are compared with the words in the sensitive vocabulary corpus and the high-frequency word corpus of those languages; when the degree of match or similarity is greater than a certain value, the correspondence between the transmission code and the display code and the characters actually displayed can be found, and the illegal short message can then be cracked.
3. The method according to claim 1, characterized in that a short message that has no symbol codes between its characters, and whose length does not exceed the certain number of characters, is processed with steps C and D above.
4. The method according to claim 1, characterized in that,
A: for agglutinative languages such as Arabic, Uighur, Turkish, Urdu and Persian, and inflectional languages such as Russian and German, the words in the corpus may be stems or roots, and the extracted character string may be the first several characters of each grouped character string;
B: the multiple languages include Chinese, English, German, Russian, French, Portuguese, Spanish, Arabic, Uighur, Turkish, Urdu, Persian, Pashto, Japanese, Korean and other languages.
5. The method according to claim 1, characterized in that, after a transmitted short message has been judged illegal, and while the supervision department blocks or deletes it, the positioning function is used to lock onto the area from which the illegal short message was sent, so that the offenders can be brought to justice.
6. The method according to claim 1, characterized in that, when the sender and the recipient (particularly for bulk short messages) are in country A, and the short messages sent and received are in the language of country B, or in the languages of country B and country C, the short message can be regarded as suspicious, and is then identified manually and blocked or deleted.
7. The method according to claim 1, characterized in that, when Uighur is written in the Latin alphabet, the high-frequency words and sensitive vocabulary of Uighur are spelled in the Latin alphabet and then processed according to the above method; other languages of this type can be handled in the same way.
8. The method according to claim 1, characterized in that, when the character strings obtained by grouping the character string encoding corresponding to the transmission code at the symbol codes are Chinese or Korean, existing word segmentation techniques are used to break sentences into phrases and single characters, commonly used single characters are treated as phrases, and processing follows the above method; when the wording of the short message is judged not to conform to modern Chinese or Korean usage, the short message can be regarded as suspicious.
9. The method according to claim 1, characterized in that, for languages written with the Arabic alphabet, such as Uighur, Kazakh and Kyrgyz, when the Unicode characters of other languages are used as the transmission code, the number of characters needed for normal display exceeds 120 because the letters have variant forms, and these 120 or more characters involve the encodings of several languages; when the number of languages involved exceeds a certain value, the short message can be regarded as suspicious.
CN2010102666235A 2010-08-25 2010-08-25 Network supervision method for multi-language short messages Pending CN101976231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102666235A CN101976231A (en) 2010-08-25 2010-08-25 Network supervision method for multi-language short messages

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102666235A CN101976231A (en) 2010-08-25 2010-08-25 Network supervision method for multi-language short messages

Publications (1)

Publication Number Publication Date
CN101976231A true CN101976231A (en) 2011-02-16

Family

ID=43576117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102666235A Pending CN101976231A (en) 2010-08-25 2010-08-25 Network supervision method for multi-language short messages

Country Status (1)

Country Link
CN (1) CN101976231A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102694673A (en) * 2011-03-25 2012-09-26 腾讯科技(深圳)有限公司 Network speech monitoring method, equipment and system thereof
CN104317847A (en) * 2014-10-13 2015-01-28 孙伟力 Method and system for identifying languages in network text information
CN106156017A (en) * 2015-03-23 2016-11-23 北大方正集团有限公司 Information identifying method and information identification system
CN106211165A (en) * 2016-06-14 2016-12-07 北京奇虎科技有限公司 The detection foreign language harassing and wrecking method of note, device and corresponding client
CN106528536A (en) * 2016-11-14 2017-03-22 北京赛思信安技术股份有限公司 Multilingual word segmentation method based on dictionaries and grammar analysis
CN107613474A (en) * 2017-09-22 2018-01-19 刘三满 A kind of method of SMS network supervision
CN109740369A (en) * 2018-12-07 2019-05-10 中国联合网络通信集团有限公司 A kind of detection method and device of information steganography
CN114663246A (en) * 2022-05-24 2022-06-24 中国电子科技集团公司第三十研究所 Representation modeling method of information product in propagation simulation and multi-agent simulation method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101150756A (en) * 2007-11-08 2008-03-26 电子科技大学 A spam filtering method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101150756A (en) * 2007-11-08 2008-03-26 电子科技大学 A spam filtering method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
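Feng Chong (冯冲) et al.: "Multilingual identification based on a character-level Markov model" (基于字符层马尔科夫模型的多语种识别), Computer Science (《计算机科学》), vol. 33, no. 1, 31 January 2006 (2006-01-31) *
Huang Wenliang (黄文良): "Research on key technologies for spam short message filtering" (垃圾短信过滤关键技术研究), China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库 信息科技辑》), 15 July 2009 (2009-07-15) *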

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102694673A (en) * 2011-03-25 2012-09-26 腾讯科技(深圳)有限公司 Network speech monitoring method, equipment and system thereof
CN104317847A (en) * 2014-10-13 2015-01-28 孙伟力 Method and system for identifying languages in network text information
CN106156017A (en) * 2015-03-23 2016-11-23 北大方正集团有限公司 Information identifying method and information identification system
CN106211165A (en) * 2016-06-14 2016-12-07 北京奇虎科技有限公司 The detection foreign language harassing and wrecking method of note, device and corresponding client
CN106211165B (en) * 2016-06-14 2020-04-21 北京奇虎科技有限公司 Method and device for detecting foreign language harassment short message and corresponding client
CN106528536A (en) * 2016-11-14 2017-03-22 北京赛思信安技术股份有限公司 Multilingual word segmentation method based on dictionaries and grammar analysis
CN107613474A (en) * 2017-09-22 2018-01-19 刘三满 A kind of method of SMS network supervision
CN109740369A (en) * 2018-12-07 2019-05-10 中国联合网络通信集团有限公司 A kind of detection method and device of information steganography
CN114663246A (en) * 2022-05-24 2022-06-24 中国电子科技集团公司第三十研究所 Representation modeling method of information product in propagation simulation and multi-agent simulation method

Similar Documents

Publication Publication Date Title
CN101976231A (en) Network supervision method for multi-language short messages
CN100478953C (en) Static feature based web page malicious scenarios detection method
Sawa et al. Detection of social engineering attacks through natural language processing of conversations
CN102833269B (en) The detection method of cross-site attack, device and there is the fire compartment wall of this device
CN104361097A (en) Real-time detection method for electric power sensitive mail based on multimode matching
CN104640116B (en) A kind of fraud text message means of defence and communication terminal
CN101141322A (en) Multi-computer switch system capable of detecting keyword input and method thereof
US20110047265A1 (en) Computer Implemented Method for Identifying Risk Levels for Minors
Wang et al. A brief report and analysis on the July 19, 2019, explosion in the Yima gasification plant in Sanmenxia, China
CN107508834A (en) A kind of Information Authentication method and electronic equipment
CN105183181A (en) Input interaction control method
CN104850789A (en) Remote code injection vulnerability detection method based on Web browser helper object
Sparks et al. Sentiment monitoring of social media from Oceania
CN113709145A (en) Vulnerability verification system based on POC (point-of-sale) verification engine
CN107613474A (en) A kind of method of SMS network supervision
CN203745509U (en) Anti-electricity-stealing device for electric energy meter
Мартин et al. EMULATOR OF ANALYSIS OF BOMBSHELTERS
CN204659654U (en) A kind of window breaker
Tan Bhowmik A Multi-Modal Wildfire Prediction and Personalized Early-Warning System Based on a Novel Machine Learning Framework
Crawford-Brown et al. Comparing National Regulatory Processes for Safe Drinking Water
Johansen Taking stock of regularity theories of causation
CN103051590A (en) Method for safe use of important module in software system
Carpenter et al. " Pounding A Dendritic Peg into a Square Hole"-National Weather Service Impacts Based Decision Support Services role in Federal Agency Led Incident Response
Mostafazadeh Davani et al. Reporting the Unreported: Event Extraction for Analyzing the Local Representation of Hate Crimes
Fletcher et al. Fermi GBM Sub-Threshold Detection of GRB 211207A

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20110216