CN107239504A - A deep learning algorithm for identifying fraudulent text messages - Google Patents

Info

Publication number
CN107239504A
Authority
CN
China
Prior art keywords
deep learning
short message
module
url
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710327007.8A
Other languages
Chinese (zh)
Inventor
邹福泰
张成伟
王祺文
俞汤达
张哲迪
李林森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-05-10
Filing date
2017-05-10
Publication date
2017-10-10
Application filed by Shanghai Jiaotong University
Priority to CN201710327007.8A
Publication of CN107239504A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12 Detection or prevention of fraud
    • H04W12/128 Anti-malware arrangements, e.g. protection against SMS fraud or mobile malware
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12 Detection or prevention of fraud
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/12 Messaging; Mailboxes; Announcements
    • H04W4/14 Short messaging services, e.g. short message services [SMS] or unstructured supplementary service data [USSD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a deep learning algorithm for identifying fraudulent text messages and relates to the field of information security. The algorithm comprises a deep learning module, an interaction module, a preprocessing module, and a comparison module. The interaction module sends each acquired text message to the preprocessing module, which extracts the feature vector of the message. The deep learning module builds a deep learning model from a sample set. The comparison module compares the feature vector of the message against the deep learning model and sends the comparison result to the interaction module, which feeds the result back to the user. A text message comprises message text and/or a URL. The invention is highly extensible, because the algorithm is modularized and each neural network component is handled as a separate module. The algorithm is also reliable: by using deep learning it learns sentence features automatically and can uncover more latent features than shallow algorithms.

Description

A deep learning algorithm for identifying fraudulent text messages
Technical field
The present invention relates to the field of information security, and more particularly to a deep learning algorithm for identifying fraudulent text messages.
Background
Mobile phones have become an indispensable tool in people's daily lives. Against this background, incidents of fraud carried out by text message are too numerous to count, and the trend shows signs of expanding further.
At present, well-known domestic mobile security vendors handle fraudulent text messages mostly by simple database lookups or by simple machine learning algorithms. No existing method uses deep learning to identify fraudulent text messages, and current mobile anti-fraud algorithms are too simple to protect users effectively.
Summary of the invention
In view of the above defects of the prior art, the technical problem to be solved by the invention is to provide a deep learning algorithm for identifying fraudulent text messages that is highly extensible, modular, and able to uncover more latent features.
The invention provides a deep learning algorithm for identifying fraudulent text messages, comprising a deep learning module, an interaction module, a preprocessing module, and a comparison module. The interaction module sends each acquired text message to the preprocessing module; the preprocessing module extracts the feature vector of the message; the deep learning module builds a deep learning model from a sample set; the comparison module compares the feature vector of the message against the deep learning model and sends the comparison result to the interaction module; and the interaction module feeds the comparison result back to the user. The text message comprises message text and/or a URL.
Further, the deep learning model comprises a message-text deep learning model and/or a URL deep learning model.
Further, the message-text deep learning model is formed as follows: the preprocessing module converts the text of a message into a feature vector, and the deep learning module feeds the text feature vector into a DBN (deep belief network) to form the message-text deep learning model.
Further, the URL deep learning model is formed as follows: the preprocessing module converts the URL of a message into a feature vector, and the deep learning module feeds the URL feature vector into a DBN to form the URL deep learning model.
Further, the feature vector that the preprocessing module extracts from a message comprises a text feature vector and/or a URL feature vector.
Further, the text feature vector is generated as follows: the preprocessing module separates out the message text and feeds it into Word2vec, whose output is the text feature vector.
Further, the URL feature vector is generated as follows: the preprocessing module separates out the URL and applies extraction rules to it to obtain the URL feature vector.
Further, the comparison of the feature vector of the message against the deep learning model comprises comparing the text feature vector against the message-text deep learning model and/or comparing the URL feature vector against the URL deep learning model.
Further, when the text feature vector is compared against the message-text deep learning model, the text feature vector is fed into the message-text deep learning classifier of the deep learning module, the classification result is compared with the threshold set by the deep learning module, and the result is fed back to the interaction module.
Further, when the URL feature vector is compared against the URL deep learning model, the URL feature vector is fed into the URL deep learning classifier of the deep learning module, the classification result is compared with the threshold set by the deep learning module, and the result is fed back to the interaction module.
Compared with the prior art, the beneficial effects of the invention are as follows. The invention is highly extensible, because the algorithm is modularized and each neural network component is handled as a separate module. The algorithm is reliable: by using deep learning it learns sentence features automatically and can uncover more latent features than shallow algorithms.
The concept, specific structure, and technical effects of the invention are described further below with reference to the accompanying drawings, so that its purpose, features, and effects can be fully understood.
Brief description of the drawings
Fig. 1 is a module diagram of a preferred embodiment of the invention;
Fig. 2 is a flow diagram of generating the message-text deep learning model in a preferred embodiment of the invention;
Fig. 3 is a flow diagram of generating the URL deep learning model in a preferred embodiment of the invention;
Fig. 4 is a flow diagram of comparing the text feature vector against the message-text deep learning model in a preferred embodiment of the invention;
Fig. 5 is a flow diagram of comparing the URL feature vector against the URL deep learning model in a preferred embodiment of the invention.
Detailed description of the embodiments
A preferred embodiment of the deep learning algorithm for identifying fraudulent text messages is described in detail below with reference to the accompanying drawings, but the invention is not limited to this embodiment. Specific details are set out in the following preferred embodiment so that the public can understand the invention thoroughly.
Embodiment 1:
As shown in Fig. 1, the invention provides a deep learning algorithm for identifying fraudulent text messages, comprising a deep learning module, an interaction module, a preprocessing module, and a comparison module. The interaction module sends each acquired text message to the preprocessing module; the preprocessing module extracts the feature vector of the message; the deep learning module builds a deep learning model from a sample set; the comparison module compares the feature vector of the message against the deep learning model and sends the comparison result to the interaction module; and the interaction module feeds the comparison result back to the user. The text message comprises message text and/or a URL.
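The module layout of Fig. 1 can be sketched as a small pipeline. This is only an illustration under stated assumptions: the class name `FraudDetector`, the toy feature extractor, and the toy scoring function are hypothetical stand-ins, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class FraudDetector:
    """Wires the four modules together; every callable is a hypothetical stand-in."""
    extract_features: Callable[[str], Vector]  # preprocessing module
    score: Callable[[Vector], float]           # deep learning model, trained elsewhere
    threshold: float                           # set by the deep learning module

    def compare(self, features: Vector) -> bool:
        """Comparison module: True when the model score reaches the threshold."""
        return self.score(features) >= self.threshold

    def handle_message(self, sms: str) -> str:
        """Interaction module: pass the message through and report the result."""
        features = self.extract_features(sms)
        return "fraud" if self.compare(features) else "normal"


# Toy stand-ins so that the sketch runs end to end.
def toy_features(sms: str) -> Vector:
    lowered = sms.lower()
    return [float("http" in lowered), float("prize" in lowered)]


def toy_score(features: Vector) -> float:
    return sum(features) / len(features)


detector = FraudDetector(toy_features, toy_score, threshold=0.5)
```

Keeping each module behind a plain callable is one way to obtain the extensibility the description claims: a feature extractor or model can be swapped without touching the other parts.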
The deep learning model comprises a message-text deep learning model and/or a URL deep learning model.
The message-text deep learning model is formed as follows: the preprocessing module converts the text of a message into a feature vector, and the deep learning module feeds the text feature vector into a DBN (deep belief network) to form the message-text deep learning model.
The URL deep learning model is formed as follows: the preprocessing module converts the URL of a message into a feature vector, and the deep learning module feeds the URL feature vector into a DBN to form the URL deep learning model.
The feature vector that the preprocessing module extracts from a message comprises a text feature vector and/or a URL feature vector.
The text feature vector is generated as follows: the preprocessing module separates out the message text and feeds it into Word2vec, whose output is the text feature vector.
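A minimal sketch of the text branch follows. The patent does not say how per-word Word2vec vectors become one vector per message, so averaging is an assumption here, and `TOY_EMBEDDINGS` is a hypothetical lookup table standing in for vectors that a real Word2vec model would learn from a message corpus. Chinese text would also need a word segmenter, which is omitted.

```python
import re
from typing import Dict, List

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

# Hypothetical 3-dimensional embeddings; a real system would load vectors
# learned by Word2vec instead of this hand-written table.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "bank": [0.9, 0.1, 0.0],
    "account": [0.8, 0.2, 0.1],
    "lunch": [0.0, 0.1, 0.9],
}


def separate_text(sms: str) -> str:
    """Strip URLs so that only the message text remains."""
    return URL_PATTERN.sub(" ", sms).strip()


def text_feature_vector(sms: str, embeddings: Dict[str, List[float]],
                        dim: int = 3) -> List[float]:
    """Average the embeddings of known words; unknown words are skipped."""
    words = re.findall(r"\w+", separate_text(sms).lower())
    known = [embeddings[w] for w in words if w in embeddings]
    if not known:
        return [0.0] * dim  # no known word: fall back to the zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]
```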
The URL feature vector is generated as follows: the preprocessing module separates out the URL and applies extraction rules to it to obtain the URL feature vector.
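The patent does not list its extraction rules, so the sketch below uses hypothetical ones (URL length, number of dots in the host, raw IP address as host, HTTPS scheme, digit count in the host) purely to show the shape of a rule-based URL feature vector.

```python
import re
from typing import List
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://[^\s]+", re.IGNORECASE)
IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def separate_urls(sms: str) -> List[str]:
    """Return every http(s) URL found in the message."""
    return URL_PATTERN.findall(sms)


def url_feature_vector(url: str) -> List[float]:
    """Apply illustrative (assumed) extraction rules to a single URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return [
        float(len(url)),                          # total URL length
        float(host.count(".")),                   # dots in the host name
        1.0 if IPV4_HOST.match(host) else 0.0,    # host is a raw IPv4 address
        1.0 if parsed.scheme == "https" else 0.0,  # uses HTTPS
        float(sum(ch.isdigit() for ch in host)),  # digits in the host name
    ]
```

In practice the rule set would be chosen from labeled data; the five rules here are placeholders.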
The comparison of the feature vector of the message against the deep learning model comprises comparing the text feature vector against the message-text deep learning model and/or comparing the URL feature vector against the URL deep learning model.
When the text feature vector is compared against the message-text deep learning model, the text feature vector is fed into the message-text deep learning classifier of the deep learning module, the classification result is compared with the threshold set by the deep learning module, and the result is fed back to the interaction module.
When the URL feature vector is compared against the URL deep learning model, the URL feature vector is fed into the URL deep learning classifier of the deep learning module, the classification result is compared with the threshold set by the deep learning module, and the result is fed back to the interaction module.
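The comparison step reduces to scoring a feature vector and checking the score against a threshold. In the sketch below a single logistic unit stands in for the trained classifier, and the use of `>=` and the default threshold of 0.5 are assumptions, since the patent specifies neither.

```python
import math
from typing import Sequence


def classifier_score(features: Sequence[float], weights: Sequence[float],
                     bias: float) -> float:
    """Stand-in classifier: a logistic unit whose output lies in (0, 1)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def is_fraud(score: float, threshold: float = 0.5) -> bool:
    """Flag the message when the score reaches the configured threshold."""
    return score >= threshold
```

Raising the threshold makes the detector more conservative: fewer normal messages are flagged, at the cost of missing more fraudulent ones.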
Embodiment 2:
As shown in Fig. 2, the message-text deep learning model is generated as follows:
Step S11: import the text message samples, then go to step S12;
Step S12: the preprocessing module separates out the message text of each sample and obtains its text feature vector, then go to step S13;
Step S13: the deep learning module feeds the text feature vectors into the DBN to form the message-text deep learning model.
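The patent does not describe how the DBN is trained. A common approach, assumed here, is greedy layer-wise pretraining of stacked restricted Boltzmann machines (RBMs) with one step of contrastive divergence (CD-1). The sketch assumes inputs scaled to [0, 1] and omits the supervised fine-tuning and the final classifier layer; all names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible: int, n_hidden: int, rng: random.Random):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_visible = [0.0] * n_visible
        self.b_hidden = [0.0] * n_hidden

    def hidden_probs(self, v: Vector) -> Vector:
        return [sigmoid(self.b_hidden[j]
                        + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hidden))]

    def visible_probs(self, h: Vector) -> Vector:
        return [sigmoid(self.b_visible[i]
                        + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_visible))]

    def train(self, data: List[Vector], epochs: int = 20, lr: float = 0.1) -> None:
        for _ in range(epochs):
            for v0 in data:
                h0 = self.hidden_probs(v0)
                # Sample binary hidden states, then reconstruct the input.
                h_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                v1 = self.visible_probs(h_sample)
                h1 = self.hidden_probs(v1)
                # CD-1 update: data statistics minus reconstruction statistics.
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                    self.b_visible[i] += lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.b_hidden[j] += lr * (h0[j] - h1[j])


def train_dbn(data: List[Vector], layer_sizes: List[int], seed: int = 0) -> List[RBM]:
    """Train one RBM per layer, feeding each layer's output to the next."""
    rng = random.Random(seed)
    layers: List[RBM] = []
    inputs, n_in = data, len(data[0])
    for n_hidden in layer_sizes:
        rbm = RBM(n_in, n_hidden, rng)
        rbm.train(inputs)
        inputs = [rbm.hidden_probs(v) for v in inputs]
        layers.append(rbm)
        n_in = n_hidden
    return layers


def dbn_transform(layers: List[RBM], v: Vector) -> Vector:
    """Propagate a feature vector through all trained layers."""
    for rbm in layers:
        v = rbm.hidden_probs(v)
    return v
```

The output of `dbn_transform` would typically feed a small supervised classifier whose score is then compared with the threshold.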
As shown in Fig. 3, the URL deep learning model is generated as follows:
Step S21: import the text message samples, then go to step S22;
Step S22: the preprocessing module separates out the URL of each sample and obtains its URL feature vector, then go to step S23;
Step S23: the deep learning module feeds the URL feature vectors into the DBN to form the URL deep learning model.
As shown in Fig. 4, the text feature vector is compared against the message-text deep learning model as follows:
Step S31: import the text message sample, then go to step S32;
Step S32: feed the message text into Word2vec, whose output is the text feature vector, then go to step S33;
Step S33: feed the text feature vector into the message-text deep learning classifier of the deep learning module, then go to step S34;
Step S34: compare the classifier's result for the text feature vector with the threshold set by the deep learning module, and feed the result back to the interaction module. If the comparison determines that the message is a suspected fraudulent message, the user is told that the message is fraudulent; if it determines that the message is normal, the system stays silent and waits for the next message.
As shown in Fig. 5, the URL feature vector is compared against the URL deep learning model as follows:
Step S41: import the text message sample, then go to step S42;
Step S42: the preprocessing module separates out the URL and applies extraction rules to it to obtain the URL feature vector, then go to step S43;
Step S43: feed the URL feature vector into the URL deep learning classifier of the deep learning module, then go to step S44;
Step S44: compare the classifier's result for the URL feature vector with the threshold set by the deep learning module, and feed the result back to the interaction module. If the comparison determines that the message is a suspected fraudulent message, the user is told that the message is fraudulent; if it determines that the message is normal, the system stays silent and waits for the next message.
In summary, the invention is highly extensible, because the algorithm is modularized and each neural network component is handled as a separate module. The algorithm is reliable: by using deep learning it learns sentence features automatically and can uncover more latent features than shallow algorithms.
The preferred embodiments of the invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experimentation under the concept of the invention shall fall within the scope of protection defined by the claims.

Claims (10)

1. A deep learning algorithm for identifying fraudulent text messages, characterized by comprising a deep learning module, an interaction module, a preprocessing module, and a comparison module, wherein the interaction module sends each acquired text message to the preprocessing module, the preprocessing module extracts the feature vector of the message, the deep learning module builds a deep learning model from a sample set, the comparison module compares the feature vector of the message against the deep learning model and sends the comparison result to the interaction module, the interaction module feeds the comparison result back to the user, and the text message comprises message text and/or a URL.
2. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 1, characterized in that the deep learning model comprises a message-text deep learning model and/or a URL deep learning model.
3. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 2, characterized in that the message-text deep learning model is formed by the preprocessing module converting the text of a message into a feature vector and the deep learning module feeding the text feature vector into a DBN to form the message-text deep learning model.
4. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 2, characterized in that the URL deep learning model is formed by the preprocessing module converting the URL of a message into a feature vector and the deep learning module feeding the URL feature vector into a DBN to form the URL deep learning model.
5. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 1, characterized in that the feature vector that the preprocessing module extracts from a message comprises a text feature vector and/or a URL feature vector.
6. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 5, characterized in that the text feature vector is generated by the preprocessing module separating out the message text and feeding the separated text into Word2vec, whose output is the text feature vector.
7. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 6, characterized in that the URL feature vector is generated by the preprocessing module separating out the URL and applying extraction rules to the separated URL to obtain the URL feature vector.
8. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 1, characterized in that the comparison of the feature vector of the message against the deep learning model by the comparison module comprises comparing the text feature vector against the message-text deep learning model and/or comparing the URL feature vector against the URL deep learning model.
9. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 8, characterized in that, when the text feature vector is compared against the message-text deep learning model, the text feature vector is fed into the message-text deep learning classifier of the deep learning module, the classification result is compared with the threshold set by the deep learning module, and the result is fed back to the interaction module.
10. The deep learning algorithm for identifying fraudulent text messages as claimed in claim 8, characterized in that, when the URL feature vector is compared against the URL deep learning model, the URL feature vector is fed into the URL deep learning classifier of the deep learning module, the classification result is compared with the threshold set by the deep learning module, and the result is fed back to the interaction module.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710327007.8A CN107239504A (en) 2017-05-10 2017-05-10 A kind of deep learning algorithm for being used to recognize fraud text message

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710327007.8A CN107239504A (en) 2017-05-10 2017-05-10 A kind of deep learning algorithm for being used to recognize fraud text message

Publications (1)

Publication Number Publication Date
CN107239504A true CN107239504A (en) 2017-10-10

Family

ID=59984321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710327007.8A Pending CN107239504A (en) 2017-05-10 2017-05-10 A kind of deep learning algorithm for being used to recognize fraud text message

Country Status (1)

Country Link
CN (1) CN107239504A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021932A (en) * 2017-11-22 2018-05-11 北京奇虎科技有限公司 Data detection method, device and electronic equipment
CN109922444A (en) * 2017-12-13 2019-06-21 中国移动通信集团公司 A kind of refuse messages recognition methods and device
CN109982272A (en) * 2019-02-13 2019-07-05 北京航空航天大学 A kind of fraud text message recognition methods and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103607705A (en) * 2013-12-04 2014-02-26 北京网秦天下科技有限公司 Junk message filtering method and engine
CN106161209A (en) * 2016-07-21 2016-11-23 康佳集团股份有限公司 A kind of method for filtering spam short messages based on degree of depth self study and system
CN106332024A (en) * 2016-08-31 2017-01-11 华为技术有限公司 Insecure short message recognition method and related equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108021932A (en) * 2017-11-22 2018-05-11 北京奇虎科技有限公司 Data detection method, device and electronic equipment
CN109922444A (en) * 2017-12-13 2019-06-21 中国移动通信集团公司 A kind of refuse messages recognition methods and device
CN109922444B (en) * 2017-12-13 2020-11-03 中国移动通信集团公司 Spam message identification method and device
CN109982272A (en) * 2019-02-13 2019-07-05 北京航空航天大学 A kind of fraud text message recognition methods and device
CN109982272B (en) * 2019-02-13 2020-08-28 北京航空航天大学 Fraud short message identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171010