CN101976231A - Network supervision method for multi-language short messages - Google Patents
- Publication number: CN101976231A
- Application number: CN2010102666235A
- Authority: CN (China)
- Prior art keywords: short message, languages, language, character, code
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a network supervision method for multilingual short messages that combines computer programs with manual assistance. The words of the language corresponding to the transmission code are matched against the words in a corpus of that language, so that the similarity and goodness of fit between the transmitted language and the displayed language can be judged and illegal short messages identified. The relationships between the transmission code and the display code, and between the transmission code and the displayed characters, are then found and the illegal short message is decoded, which achieves effective supervision of short messages whose transmission code and glyph code are inconsistent. The invention is suitable for telecommunications, police, security, and similar departments to supervise multilingual short messages on their short message platforms, and can be used for counter-terrorism, combating violent crime, and detecting and blocking harmful information. The technique can also be used to supervise communication content on the Internet and the like.
Description
Technical field:
The present invention relates to a network monitoring and management method for telecom operators and for departments such as public security and security, and in particular to a network supervision method for multilingual short messages.
Background technology:
Mobile phone short messages are transmitted on the telecom operator's SMS platform, where the characters are encoded according to an agreed protocol and a unified coding rule. In China, for example, the telecommunications sector encodes and sends short messages according to the CMPT protocol and a unified coding standard; this encoding is referred to here as the transmission code. When a user writes or reads a short message on a handset, each character is handled through its corresponding encoding (internal code), which in turn corresponds to a specific glyph code, referred to here as the display code. So that information is displayed uniformly, internal codes follow international or national standards. For the same internal code, however, a handset manufacturer or handset development company can produce a different display font, so that the internal code of character a is displayed with the appearance of character b. The content corresponding to the transmission code is then inconsistent with the content shown by the display code. For example, the transmission code may carry the Unicode code point "0414" of the Russian letter "Д", while the glyph actually written or viewed on the handset is the English letter "a". In the same way, a message may be written on the sender's handset and shown on the recipient's handset as "backfire", while the transmission code of the same character string appears to the telecommunications supervision department as "exertion".
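The mismatch between transmission code and display code can be illustrated with a small sketch. It assumes a hypothetical tampered font table, `TAMPERED_FONT`, that draws three Cyrillic code points with Latin glyphs; the table, the function names, and the sample word are illustrative only and do not come from the patent.

```python
# Hypothetical glyph table of a tampered handset font:
# transmitted code point -> glyph actually drawn on the screen.
TAMPERED_FONT = {"\u0414": "a", "\u0416": "b", "\u0418": "c"}  # Д, Ж, И


def displayed_text(transmitted, font):
    """Text the handset user sees when each code point is drawn with the font."""
    return "".join(font.get(ch, ch) for ch in transmitted)


def transmitted_text(intended, font):
    """Code points the handset sends for the text the user believes they typed."""
    reverse = {glyph: code for code, glyph in font.items()}
    return "".join(reverse.get(ch, ch) for ch in intended)


wire = transmitted_text("cab", TAMPERED_FONT)
# Code points visible to the operator, which are Cyrillic letters.
print([f"U+{ord(ch):04X}" for ch in wire])
# Text visible to sender and recipient, which is Latin.
print(displayed_text(wire, TAMPERED_FONT))
```

A keyword search run on `wire` would look for Russian words and find none, which is the gap the background section describes.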
When terrorists or other offenders make the display code of a particular handset correspond to characters different from those of the telecommunications sector's transmission code, short messages on handsets fitted with that display code become a tool for coded communication. The handset becomes an instrument with which terrorists and other lawbreakers issue orders for sabotage, keep in contact, spread rumors to mislead the public, distribute obscene material, and carry out other criminal activities. The content of the communication is known only to the sender and the recipient, and under current conditions the telecommunications, public security, and security departments cannot supervise it effectively.
Existing short message supervision methods mainly search for sensitive words in order to supervise harmful messages. This works for messages whose transmission code is consistent with the glyph code, but there is currently no solution for messages in which the internal code of character a is displayed with the appearance of character b, as described above.
Summary of the invention:
In view of the above defect, the purpose of the present invention is to propose a method that combines computer programs with manual assistance. The words of the language corresponding to the transmission code are matched against the words in a corpus of that language, the similarity and goodness of fit between the transmitted language and the displayed language are judged, and illegal short messages are identified on that basis. The correspondence between the transmission code and the display code and displayed characters is then found, and the illegal message is finally cracked. This achieves effective supervision of short messages whose transmission code and glyph code are inconsistent.
The main points of the method adopted by the present invention to solve this technical problem are:
One, using computer programs and manual assistance, short messages are processed by the following steps:
a. The language of each character in the short message is judged from the language to which the character belongs in the coded character set corresponding to the transmission code. When the number of languages involved exceeds a certain value, the short message can be regarded as suspicious, then identified manually and blocked or deleted.
b. According to the language of the short message, it is judged whether the transmission code contains codes for spaces, line feeds, or that language's punctuation marks such as the comma, full stop, and question mark (referred to for short as symbol codes). If there are no symbol codes and the length of the short message exceeds a certain number of characters, the short message can be regarded as suspicious, then identified manually and blocked or deleted.
c. The character string corresponding to the transmission code is divided into groups at the symbol codes, and the grouped character strings are extracted.
c1. The extracted character strings are compared with the words in a preset corpus of sensitive words for that language (sensitive words here include words relating to violent crime, killing, arson, robbery, incitement, obscenity, and the like). When the goodness of fit or similarity between the grouped character strings and the sensitive words in the corpus is greater than a certain value, the short message can be regarded as suspicious, then identified manually and blocked or deleted.
c2. The extracted character strings are compared with the words in a preset corpus of common high-frequency words for that language. When the goodness of fit or similarity between the grouped character strings and the words in the corpus is less than a certain value, the short message can be regarded as suspicious, then identified manually and blocked or deleted.
d. For a suspicious short message, each transmission code is mapped in turn to each character of the languages that the suspicious message may involve, and the mappings are permuted and combined. All the resulting character strings are compared with the words in the sensitive-word corpus and the common high-frequency word corpus of those languages. When the goodness of fit or similarity is greater than a certain value, the correspondence between the transmission code, the display code, and the characters actually displayed can be found, and the illegal message is cracked.
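Steps a and b can be sketched roughly as follows. The sketch approximates "language" by the Unicode script read from the first word of each character's Unicode name, which is an assumption (several languages share one script). The thresholds `max_scripts` and `max_unbroken_len`, the separator set, and the function names are hypothetical stand-ins for the "certain value" and "certain number of characters" that the text leaves unspecified.

```python
import unicodedata

# Assumed set of "symbol codes": spaces, line breaks, common punctuation.
SEPARATORS = set(" \n\r\t,.?!;:，。？！、؟،")


def scripts_used(message):
    """Step a: collect the script of every letter in the message."""
    scripts = set()
    for ch in message:
        if ch.isalpha():
            # Unicode names start with the script,
            # e.g. "CYRILLIC CAPITAL LETTER DE" -> "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_suspicious(message, max_scripts=2, max_unbroken_len=20):
    """Flag a message for manual review using steps a and b."""
    # Step a: too many scripts mixed in one message.
    if len(scripts_used(message)) > max_scripts:
        return True
    # Step b: a long message with no spaces, line breaks, or punctuation.
    has_separator = any(ch in SEPARATORS for ch in message)
    return not has_separator and len(message) > max_unbroken_len


print(sorted(scripts_used("aДب中")))
print(is_suspicious("hello, how are you?"))
```

A flagged message is only a candidate: in the method described above it still goes to manual identification before being blocked or deleted.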
Two, a short message that has no symbol codes between its characters, but whose length does not exceed a certain number of characters, is processed by steps c and d above;
Three, for agglutinative languages such as Arabic, Uighur, Turkish, Urdu, and Iranian, and for inflectional languages such as Russian and German, the words in the corpus may be stems or roots, and the extracted character string may be the first several characters of the grouped character string;
Four, the multiple languages include Chinese, English, German, Russian, French, Portuguese, Spanish, Arabic, Uighur, Turkish, Urdu, Iranian, Pushtu, Japanese, Korean, and others;
Five, after a transmitted message has been judged to be illegal, when the supervision department blocks or deletes it, the positioning function is used to lock onto the area from which the illegal message was sent, so that the offenders can be brought to justice;
Six, when the sender and the recipient are in country A (especially in the case of bulk messages), and the messages sent and received are in the language of country B, or in the languages of country B and country C, the short message can be regarded as suspicious, then identified manually and blocked or deleted;
Seven, when Uighur is spelled with the Latin alphabet, the high-frequency words and sensitive words of Uighur are likewise spelled with the Latin alphabet and then processed according to point One above; other languages of this type can be handled in the same way;
Eight, when the character strings obtained by grouping the transmission-code string at the symbol codes are Chinese or Korean, an existing word segmentation technique is used to decompose each sentence into phrases and single characters, with commonly used single characters treated as phrases, and processing follows point One above. When the wording of the short message is judged not to conform to modern Chinese or Korean usage, the short message can be regarded as suspicious;
Nine, for languages written with the Arabic alphabet, such as Uighur, Kazakh, and Kyrgyz, when the Unicode characters of other languages are used as the transmission code, the number of characters needed for normal display exceeds 120 because the letters have variant forms, and these characters necessarily draw on the codes of several languages. When the number of languages involved exceeds a certain value, the short message can be regarded as suspicious.
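Point Three above, in which the corpus holds stems or roots and only the leading part of each grouped string is compared, can be sketched as a prefix lookup. The toy Turkish stem set `STEMS`, the `min_prefix` parameter, and the function name are assumptions made for illustration; the patent does not specify how many leading characters are taken.

```python
def matches_stem(token, stems, min_prefix=3):
    """Return True if a leading part of the token (at least `min_prefix`
    characters long) is a known stem, trying the longest prefix first."""
    for end in range(len(token), min_prefix - 1, -1):
        if token[:end] in stems:
            return True
    return False


# Toy stem corpus (hypothetical): "book", "come", "school" in Turkish.
STEMS = {"kitap", "gel", "okul"}

print(matches_stem("kitaplarımız", STEMS))  # stem "kitap" plus suffixes
print(matches_stem("xyzabc", STEMS))
```

Matching on stems keeps the corpus small for languages that attach many suffixes to one root, at the cost of more accidental matches on short prefixes.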
The beneficial effects of the invention are as follows. Compared with existing short message supervision methods, it overcomes their inability to effectively supervise short messages whose transmission code and glyph code are inconsistent.
The present invention is suitable for telecommunications, public security, security, and similar departments to supervise multilingual short messages effectively on their SMS platforms, and can be used for counter-terrorism, combating violent crime, and detecting and blocking harmful information. The technique can also be used to supervise communication content on the Internet and the like.
Embodiment:
The main points of the invention set out above already describe the specific embodiment.
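As one possible illustration of steps c and d, the sketch below groups a message at spaces and punctuation, scores the groups against toy corpora, and then searches substitution tables by brute force. Everything in it is an assumption made for illustration: the corpora, the thresholds `low`, `high`, and `threshold`, the use of `difflib.SequenceMatcher` as the similarity measure, and the exhaustive permutation search, which is only feasible for a handful of distinct letters.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations

# Toy corpora; real ones would hold high-frequency and sensitive words.
RU_COMMON = {"привет", "как", "дела", "сегодня"}
RU_SENSITIVE = {"взрыв"}
EN_WORDS = {"meet", "at", "the", "gate", "attack"}


def tokens(message):
    """Step c: group the message at spaces and punctuation."""
    return [t for t in re.split(r"[\s,.?!;:]+", message) if t]


def best_similarity(token, corpus):
    """Similarity in [0, 1] between a token and its closest corpus word."""
    return max(SequenceMatcher(None, token, w).ratio() for w in corpus)


def fit(message, corpus):
    """Average best similarity over all tokens (0.0 for an empty message)."""
    toks = tokens(message)
    if not toks:
        return 0.0
    return sum(best_similarity(t, corpus) for t in toks) / len(toks)


def suspicious_by_corpus(message, common, sensitive, low=0.5, high=0.8):
    """Steps c1 and c2 with hypothetical thresholds `high` and `low`."""
    hit = any(best_similarity(t, sensitive) > high for t in tokens(message))
    return hit or fit(message, common) < low


def crack(message, candidate_letters, corpus, threshold=0.9):
    """Step d: try every assignment of candidate display letters to the
    transmitted letters and keep the decoding that fits the corpus best."""
    symbols = sorted({ch for ch in message if ch.isalpha()})
    best_score, best_table, best_text = 0.0, None, None
    for perm in permutations(candidate_letters, len(symbols)):
        table = dict(zip(symbols, perm))
        decoded = "".join(table.get(ch, ch) for ch in message)
        score = fit(decoded, corpus)
        if score > best_score:
            best_score, best_table, best_text = score, table, decoded
    if best_score >= threshold:
        return best_table, best_text
    return None


wire = "джжш фш"  # transmitted as Cyrillic, displayed via a tampered font
print(suspicious_by_corpus(wire, RU_COMMON, RU_SENSITIVE))
print(crack(wire, "aemt", EN_WORDS))
```

On this toy input the message fits the Russian corpus poorly, so it is flagged, and the search should recover the display text `meet at` together with the letter table. A practical system would need to prune the search, for example with letter frequencies, since the number of permutations grows factorially with the alphabet size.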
Claims (9)
1. A network supervision method for multilingual short messages, characterized in that computer programs and manual assistance are used and short messages are processed by the following steps:
a: the language of each character in the short message is judged from the language to which the character belongs in the coded character set corresponding to the transmission code; when the number of languages involved exceeds a certain value, the short message can be regarded as suspicious;
b: according to the language of the short message, it is judged whether the transmission code contains codes for spaces, line feeds, or that language's punctuation marks such as the comma, full stop, and question mark (referred to for short as symbol codes); if there are no symbol codes and the length of the short message exceeds a certain number of characters, the short message can be regarded as suspicious;
c: the character string corresponding to the transmission code is divided into groups at the symbol codes, and the grouped character strings are extracted;
c1: the extracted character strings are compared with the words in a preset corpus of sensitive words for that language (sensitive words here include words relating to violent crime, killing, arson, robbery, incitement, obscenity, and the like); when the goodness of fit or similarity between the grouped character strings and the sensitive words in the corpus is greater than a certain value, the short message can be regarded as suspicious;
c2: the extracted character strings are compared with the words in a preset corpus of common high-frequency words for that language; when the goodness of fit or similarity between the grouped character strings and the words in the corpus is less than a certain value, the short message can be regarded as suspicious;
d: a suspicious short message is then identified manually and blocked or deleted according to law.
2. The method according to claim 1, characterized in that, for a suspicious short message, a computer program maps each transmission code in turn to each character of the languages that the suspicious message may involve and permutes and combines the mappings; all the resulting character strings are compared with the words in the sensitive-word corpus and the common high-frequency word corpus of those languages; when the goodness of fit or similarity is greater than a certain value, the correspondence between the transmission code, the display code, and the characters actually displayed can be found, and the illegal message is cracked.
3. The method according to claim 1, characterized in that a short message that has no symbol codes between its characters, but whose length does not exceed a certain number of characters, is processed by steps c and d above.
4. The method according to claim 1, characterized in that:
a: for agglutinative languages such as Arabic, Uighur, Turkish, Urdu, and Iranian, and for inflectional languages such as Russian and German, the words in the corpus may be stems or roots, and the extracted character string may be the first several characters of the grouped character string;
b: the multiple languages include Chinese, English, German, Russian, French, Portuguese, Spanish, Arabic, Uighur, Turkish, Urdu, Iranian, Pushtu, Japanese, Korean, and others.
5. The method according to claim 1, characterized in that, after a transmitted message has been judged to be illegal, when the supervision department blocks or deletes it, the positioning function is used to lock onto the area from which the illegal message was sent, so that the offenders can be brought to justice.
6. The method according to claim 1, characterized in that, when the sender and the recipient are in country A (especially in the case of bulk messages), and the messages sent and received are in the language of country B, or in the languages of country B and country C, the short message can be regarded as suspicious, then identified manually and blocked or deleted.
7. The method according to claim 1, characterized in that, when Uighur is spelled with the Latin alphabet, the high-frequency words and sensitive words of Uighur are likewise spelled with the Latin alphabet and then processed as described above; other languages of this type can be handled in the same way.
8. The method according to claim 1, characterized in that, when the character strings obtained by grouping the transmission-code string at the symbol codes are Chinese or Korean, an existing word segmentation technique is used to decompose each sentence into phrases and single characters, with commonly used single characters treated as phrases, and processing follows the method described above; when the wording of the short message is judged not to conform to modern Chinese or Korean usage, the short message can be regarded as suspicious.
9. The method according to claim 1, characterized in that, for languages written with the Arabic alphabet, such as Uighur, Kazakh, and Kyrgyz, when the Unicode characters of other languages are used as the transmission code, the number of characters needed for normal display exceeds 120 because the letters have variant forms, and these characters necessarily draw on the codes of several languages; when the number of languages involved exceeds a certain value, the short message can be regarded as suspicious.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102666235A CN101976231A (en) | 2010-08-25 | 2010-08-25 | Network supervision method for multi-language short messages |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010102666235A CN101976231A (en) | 2010-08-25 | 2010-08-25 | Network supervision method for multi-language short messages |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101976231A (en) | 2011-02-16 |
Family ID: 43576117
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN2010102666235A | Pending | CN101976231A (en) | 2010-08-25 | 2010-08-25 | Network supervision method for multi-language short messages |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101976231A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101150756A (en) * | 2007-11-08 | 2008-03-26 | 电子科技大学 | A spam filtering method |
Non-Patent Citations (2)
Title |
---|
冯冲 等: "基于字符层马尔科夫模型的多语种识别", 《计算机科学》, vol. 33, no. 1, 31 January 2006 (2006-01-31) * |
黄文良: "垃圾短信过滤关键技术研究", 《中国博士学位论文全文数据库 信息科技辑》, 15 July 2009 (2009-07-15) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102694673A (en) * | 2011-03-25 | 2012-09-26 | 腾讯科技(深圳)有限公司 | Network speech monitoring method, equipment and system thereof |
CN104317847A (en) * | 2014-10-13 | 2015-01-28 | 孙伟力 | Method and system for identifying languages in network text information |
CN106156017A (en) * | 2015-03-23 | 2016-11-23 | 北大方正集团有限公司 | Information identifying method and information identification system |
CN106211165A (en) * | 2016-06-14 | 2016-12-07 | 北京奇虎科技有限公司 | The detection foreign language harassing and wrecking method of note, device and corresponding client |
CN106211165B (en) * | 2016-06-14 | 2020-04-21 | 北京奇虎科技有限公司 | Method and device for detecting foreign language harassment short message and corresponding client |
CN106528536A (en) * | 2016-11-14 | 2017-03-22 | 北京赛思信安技术股份有限公司 | Multilingual word segmentation method based on dictionaries and grammar analysis |
CN107613474A (en) * | 2017-09-22 | 2018-01-19 | 刘三满 | A kind of method of SMS network supervision |
CN109740369A (en) * | 2018-12-07 | 2019-05-10 | 中国联合网络通信集团有限公司 | A kind of detection method and device of information steganography |
CN114663246A (en) * | 2022-05-24 | 2022-06-24 | 中国电子科技集团公司第三十研究所 | Representation modeling method of information product in propagation simulation and multi-agent simulation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20110216 |