CN107239504A

CN107239504A - A deep learning algorithm for recognizing fraudulent text messages

Info

Publication number: CN107239504A
Application number: CN201710327007.8A
Authority: CN
Inventors: 邹福泰; 张成伟; 王祺文; 俞汤达; 张哲迪; 李林森
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2017-05-10
Filing date: 2017-05-10
Publication date: 2017-10-10

Abstract

The invention provides a deep learning algorithm for recognizing fraudulent text messages, in the field of information security. It comprises a deep learning module, an interaction module, a preprocessing module and a comparison module. The interaction module sends each acquired short message to the preprocessing module. The preprocessing module extracts the message's feature vectors. The deep learning module builds a deep learning model from a sample set. The comparison module compares the message's feature vectors with the deep learning model and sends the comparison result to the interaction module, which feeds it back to the user. A short message comprises message text and/or a URL. The invention is highly extensible, because the algorithm is modularized and each part is handled by its own neural network module. It is also reliable: the deep learning algorithm learns sentence features automatically and can uncover more latent features than shallow algorithms.

Description

A deep learning algorithm for recognizing fraudulent text messages

Technical field

The present invention relates to the field of information security, and in particular to a deep learning algorithm for recognizing fraudulent text messages.

Background art

Mobile phones have become an indispensable tool in daily life. Against this background, SMS fraud cases are numerous, and their number continues to grow.

At present, well-known domestic mobile security vendors mostly identify fraudulent text messages by simple database lookups or by simple machine learning algorithms. No method yet uses deep learning for this task, and current anti-fraud algorithms on mobile phones are too simple to protect users effectively.

Summary of the invention

In view of the above drawbacks of the prior art, the invention addresses the following technical problem: how to recognize fraudulent text messages with a deep learning algorithm that is highly extensible, modular, and able to uncover more latent features.

The invention provides a deep learning algorithm for recognizing fraudulent text messages, comprising a deep learning module, an interaction module, a preprocessing module and a comparison module. The interaction module sends each acquired short message to the preprocessing module. The preprocessing module extracts the message's feature vectors. The deep learning module builds a deep learning model from a sample set. The comparison module compares the message's feature vectors with the deep learning model and sends the comparison result to the interaction module, which feeds the result back to the user. A short message comprises message text and/or a URL.

Further, the deep learning model comprises a message-text deep learning model and/or a URL deep learning model.

Further, the message-text deep learning model is formed as follows: the preprocessing module extracts the message text of each short message as a feature vector. The deep learning module then feeds the message-text feature vectors into a deep belief network (DBN) to form the message-text deep learning model.

Further, the URL deep learning model is formed as follows: the preprocessing module extracts the URL of each short message as a feature vector. The deep learning module then feeds the URL feature vectors into a DBN to form the URL deep learning model.

Further, the feature vectors extracted from a short message by the preprocessing module comprise a message-text feature vector and/or a URL feature vector.

Further, the message-text feature vector is generated as follows: the preprocessing module separates the message text from the short message and feeds the separated text into Word2vec, whose output is the message-text feature vector.

Further, the URL feature vector is generated as follows: the preprocessing module separates the URL from the short message and applies extraction rules to the separated URL to obtain the URL feature vector.

Further, the comparison module compares the short message's feature vectors with the deep learning model. This comprises comparing the message-text feature vector with the message-text deep learning model and/or comparing the URL feature vector with the URL deep learning model.

Further, to compare the message-text feature vector with the message-text deep learning model, the vector is fed into the message-text deep learning classifier of the deep learning module. The classification result is compared with a threshold set by the deep learning module, and the outcome is fed back to the interaction module.

Further, to compare the URL feature vector with the URL deep learning model, the vector is fed into the URL deep learning classifier of the deep learning module. The classification result is compared with a threshold set by the deep learning module, and the outcome is fed back to the interaction module.

Compared with the prior art, the invention offers two benefits. It is highly extensible, because the algorithm is modularized and each part is handled by its own neural network module. It is also reliable, because the deep learning algorithm learns sentence features automatically and can uncover more latent features than shallow algorithms.

The concept, specific structure and technical effects of the invention are further described below with reference to the accompanying drawings, so that its purpose, features and effects can be fully understood.

Brief description of the drawings

Fig. 1 is a module diagram of a preferred embodiment of the invention;

Fig. 2 is a flow diagram of generating the message-text deep learning model in a preferred embodiment of the invention;

Fig. 3 is a flow diagram of generating the URL deep learning model in a preferred embodiment of the invention;

Fig. 4 is a flow diagram of comparing the message-text feature vector with the message-text deep learning model in a preferred embodiment of the invention;

Fig. 5 is a flow diagram of comparing the URL feature vector with the URL deep learning model in a preferred embodiment of the invention.

Detailed description of the embodiments

A preferred embodiment of the deep learning algorithm for recognizing fraudulent text messages is described in detail below with reference to the accompanying drawings, but the invention is not limited to this embodiment. The following preferred embodiment sets out specific details so that the public can gain a thorough understanding of the invention.

Embodiment 1:

As shown in Fig. 1, the invention provides a deep learning algorithm for recognizing fraudulent text messages. The algorithm comprises a deep learning module, an interaction module, a preprocessing module and a comparison module. The interaction module sends each acquired short message to the preprocessing module. The preprocessing module extracts the message's feature vectors. The deep learning module builds deep learning models from a sample set. The comparison module compares the message's feature vectors with the deep learning models and sends the comparison result to the interaction module, which feeds it back to the user. A short message comprises message text and/or a URL.
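
The module wiring in Fig. 1 can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the patent's implementation. The class and method names are hypothetical. The toy keyword and scheme checks stand in for the Word2vec features and trained DBN classifiers. The rule that a message is flagged when either its text score or its URL score reaches the threshold is also an assumption, because the patent does not say how the two results are combined.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Model = Callable[[Vector], float]  # hypothetical: feature vector -> fraud score in [0, 1]


@dataclass
class ShortMessage:
    text: str = ""
    url: Optional[str] = None  # a short message contains text and/or a URL


class PreprocessingModule:
    """Turns a short message into (text vector, URL vector); either may be None."""

    def __init__(self, text_featurizer: Callable[[str], Vector],
                 url_featurizer: Callable[[str], Vector]):
        self.text_featurizer = text_featurizer
        self.url_featurizer = url_featurizer

    def extract(self, msg: ShortMessage) -> Tuple[Optional[Vector], Optional[Vector]]:
        text_vec = self.text_featurizer(msg.text) if msg.text else None
        url_vec = self.url_featurizer(msg.url) if msg.url else None
        return text_vec, url_vec


class ComparisonModule:
    """Scores each available vector with its model and compares it with a threshold."""

    def __init__(self, text_model: Model, url_model: Model, threshold: float = 0.5):
        self.text_model = text_model
        self.url_model = url_model
        self.threshold = threshold

    def compare(self, text_vec: Optional[Vector], url_vec: Optional[Vector]) -> bool:
        scores = []
        if text_vec is not None:
            scores.append(self.text_model(text_vec))
        if url_vec is not None:
            scores.append(self.url_model(url_vec))
        # Assumption: flag the message if either score reaches the threshold.
        return any(score >= self.threshold for score in scores)


class InteractionModule:
    """Receives messages, routes them through the pipeline and reports to the user."""

    def __init__(self, pre: PreprocessingModule, comparison: ComparisonModule):
        self.pre = pre
        self.comparison = comparison

    def handle(self, msg: ShortMessage) -> Optional[str]:
        text_vec, url_vec = self.pre.extract(msg)
        if self.comparison.compare(text_vec, url_vec):
            return "Warning: this message is suspected fraud."
        return None  # normal message: stay silent


# Toy wiring: a keyword check and a scheme check stand in for the real components.
pre = PreprocessingModule(
    text_featurizer=lambda t: [1.0 if "transfer" in t.lower() else 0.0],
    url_featurizer=lambda u: [0.0 if u.startswith("https://") else 1.0],
)
comparison = ComparisonModule(text_model=lambda v: v[0], url_model=lambda v: v[0])
app = InteractionModule(pre, comparison)
```

Each module depends only on the callables it receives, so the text branch or the URL branch can be replaced independently. This is one way to read the extensibility claim.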

The deep learning model comprises a message-text deep learning model and/or a URL deep learning model.

The message-text deep learning model is formed as follows: the preprocessing module extracts the message text of each short message as a feature vector. The deep learning module then feeds the message-text feature vectors into a deep belief network (DBN) to form the message-text deep learning model.

The URL deep learning model is formed as follows: the preprocessing module extracts the URL of each short message as a feature vector. The deep learning module then feeds the URL feature vectors into a DBN to form the URL deep learning model.
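
The patent says only that the feature vectors are fed into a DBN. A common way to build a deep belief network is greedy layer-wise pretraining of stacked restricted Boltzmann machines (RBMs), followed by a supervised output layer. The pure-Python sketch below does this under several assumptions:

- Each layer is a Bernoulli RBM trained with one-step contrastive divergence (CD-1).
- Inputs are scaled to [0, 1].
- A logistic-regression output unit is trained on the top-layer features, with no back-propagation fine-tuning.

Layer sizes, learning rate and epoch count are illustrative, not values from the patent. The same class could serve for both the message-text model and the URL model.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible: int, n_hidden: int, rng: random.Random):
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible biases
        self.c = [0.0] * n_hidden   # hidden biases
        self.rng = rng

    def hidden_probs(self, v: Vector) -> Vector:
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h: Vector) -> Vector:
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_update(self, v0: Vector, lr: float) -> None:
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # one-step reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])


class DBN:
    """Stacked RBMs plus a logistic output unit that yields a fraud probability."""

    def __init__(self, layer_sizes: List[int], seed: int = 0):
        rng = random.Random(seed)
        self.rbms = [RBM(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
        self.w_out = [0.0] * layer_sizes[-1]
        self.b_out = 0.0

    def features(self, v: Vector) -> Vector:
        for rbm in self.rbms:
            v = rbm.hidden_probs(v)
        return v

    def _output(self, h: Vector) -> float:
        return sigmoid(self.b_out + sum(w * x for w, x in zip(self.w_out, h)))

    def fit(self, X: List[Vector], y: List[int], epochs: int = 100, lr: float = 0.1) -> "DBN":
        data = X
        for rbm in self.rbms:  # 1) greedy layer-wise unsupervised pretraining
            for _ in range(epochs):
                for v in data:
                    rbm.cd1_update(v, lr)
            data = [rbm.hidden_probs(v) for v in data]
        for _ in range(epochs):  # 2) logistic output layer on the top-layer features
            for h, target in zip(data, y):
                err = target - self._output(h)
                self.w_out = [w + lr * err * x for w, x in zip(self.w_out, h)]
                self.b_out += lr * err
        return self

    def predict_proba(self, v: Vector) -> float:
        return self._output(self.features(v))


# Toy data: label 1 = fraud, 0 = normal.
fraud, normal = [1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]
dbn = DBN([4, 3, 2]).fit([fraud, normal] * 4, [1, 0] * 4)
```

A production DBN would normally add back-propagation fine-tuning through all layers. For real-valued Word2vec inputs, it would also use a Gaussian visible layer in the first RBM.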

The feature vectors extracted from a short message by the preprocessing module comprise a message-text feature vector and/or a URL feature vector.

The message-text feature vector is generated as follows: the preprocessing module separates the message text from the short message and feeds the separated text into Word2vec, whose output is the message-text feature vector.
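
The patent does not say how the per-word Word2vec vectors become a single message-text feature vector. One common choice, assumed here, is to average the vectors of the words the model knows. The sketch makes two further assumptions:

- The message has already been split into words. Chinese text would first need a word segmenter.
- The trained Word2vec vectors are available as a plain word-to-vector dictionary. With gensim, for example, they would come from the `wv` attribute of a trained `Word2Vec` model.

The two-dimensional toy vectors are made up for illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def text_feature_vector(tokens: Sequence[str],
                        embeddings: Dict[str, Vector],
                        dim: int) -> Vector:
    """Average the Word2vec vectors of known tokens; unknown tokens are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Hypothetical toy embeddings standing in for a trained Word2vec model.
toy_embeddings = {"bank": [1.0, 0.0], "transfer": [0.0, 1.0]}
tokens = "Bank transfer now".lower().split()  # naive tokenizer, for illustration only
vec = text_feature_vector(tokens, toy_embeddings, dim=2)
```

Averaging yields a fixed-length vector regardless of message length, which suits a DBN's fixed-size input layer. The cost is that word order is discarded.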

The URL feature vector is generated as follows: the preprocessing module separates the URL from the short message and applies extraction rules to the separated URL to obtain the URL feature vector.
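
The patent does not list its URL extraction rules. The sketch below shows what such a rule set might look like. Every rule is our own illustrative assumption: length, dots in the host name, digit count, a raw IPv4 host, an "@" in the URL, HTTPS, and a small keyword list. The regular expression that separates URLs from the message body is also an assumption. Only the Python standard library is used.

```python
import re
from typing import List, Tuple
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
IPV4_HOST = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")
SUSPICIOUS_WORDS = ("login", "verify", "account", "bank", "secure")  # illustrative only


def split_message(sms: str) -> Tuple[str, List[str]]:
    """Separate the URLs from the rest of the message text."""
    urls = URL_PATTERN.findall(sms)
    text = " ".join(URL_PATTERN.sub(" ", sms).split())
    return text, urls


def url_feature_vector(url: str) -> List[float]:
    """Apply hand-written extraction rules to one URL."""
    # urlsplit only finds the host when a scheme is present, so add one if missing.
    full = url if re.match(r"^[a-z][a-z0-9+.-]*://", url, re.IGNORECASE) else "http://" + url
    parts = urlsplit(full)
    host = parts.hostname or ""
    lowered = url.lower()
    return [
        float(len(url)),                                     # overall length
        float(host.count(".")),                              # dots in the host name
        float(sum(ch.isdigit() for ch in url)),              # number of digits
        1.0 if IPV4_HOST.match(host) else 0.0,               # raw IPv4 address as host
        1.0 if "@" in url else 0.0,                          # '@' can disguise the real host
        1.0 if parts.scheme == "https" else 0.0,             # served over HTTPS
        float(sum(w in lowered for w in SUSPICIOUS_WORDS)),  # keyword hits
    ]


text, urls = split_message("Your account is frozen, visit http://192.168.0.1/verify now")
features = url_feature_vector(urls[0])
```

For the example message, the text part becomes "Your account is frozen, visit now". The single URL yields a length of 25, three dots, eight digits, an IPv4 host, no "@", no HTTPS and one keyword hit ("verify").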

The comparison module compares the short message's feature vectors with the deep learning model. This comprises comparing the message-text feature vector with the message-text deep learning model and/or comparing the URL feature vector with the URL deep learning model.

To compare the message-text feature vector with the message-text deep learning model, the vector is fed into the message-text deep learning classifier of the deep learning module. The classification result is compared with a threshold set by the deep learning module, and the outcome is fed back to the interaction module.

To compare the URL feature vector with the URL deep learning model, the vector is fed into the URL deep learning classifier of the deep learning module. The classification result is compared with a threshold set by the deep learning module, and the outcome is fed back to the interaction module.
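
The patent says that the classifier output is compared with a threshold set by the deep learning module, but not how that threshold is chosen. One plausible approach, assumed here purely for illustration, works in two steps:

1. Score a held-out set of normal messages and pick the threshold so that at most a target fraction of them would be flagged.
2. Report a message as suspected fraud when its score reaches the threshold.

The function names and the false-positive target are hypothetical.

```python
import math
from typing import Sequence


def choose_threshold(normal_scores: Sequence[float], max_false_positive_rate: float) -> float:
    """Return a threshold that at most the given fraction of normal scores reach."""
    ranked = sorted(normal_scores, reverse=True)
    allowed = int(max_false_positive_rate * len(ranked))  # normal messages we may flag
    if allowed >= len(ranked):
        return 0.0  # every normal message may be flagged (scores assumed >= 0)
    # Just above the (allowed + 1)-th highest normal score: only scores strictly
    # greater than it reach the threshold, and there are at most `allowed` of those.
    return math.nextafter(ranked[allowed], math.inf)


def verdict(score: float, threshold: float) -> str:
    """Compare a classifier score with the threshold."""
    return "fraud" if score >= threshold else "normal"


threshold = choose_threshold([0.1, 0.2, 0.3, 0.9], max_false_positive_rate=0.25)
```

With four normal scores and a 25% target, only the 0.9 score may be flagged, so the threshold lands just above 0.3. `math.nextafter` requires Python 3.9 or later.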

Embodiment 2:

As shown in Fig. 2, the message-text deep learning model is generated as follows:

Step S11: import the short message samples; go to step S12;

Step S12: the preprocessing module processes the short message samples to obtain message-text feature vectors; go to step S13;

Step S13: the deep learning module feeds the message-text feature vectors into a DBN to form the message-text deep learning model.

As shown in Fig. 3, the URL deep learning model is generated as follows:

Step S21: import the short message samples; go to step S22;

Step S22: the preprocessing module processes the short message samples to obtain URL feature vectors; go to step S23;

Step S23: the deep learning module feeds the URL feature vectors into a DBN to form the URL deep learning model.
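
Steps S11 to S13 and S21 to S23 start from the same imported samples. They differ only in which part of each message they featurize. The sketch below builds both training sets in one pass and then trains one model per set. The following are assumptions for illustration:

- the labelled-sample format;
- the rule that each URL inherits the label of its message;
- the `trainer` callback, which stands in for DBN training.

The featurizers in the usage lines are toys.

```python
import re
from typing import Any, Callable, List, Sequence, Tuple

Vector = List[float]
Dataset = Tuple[List[Vector], List[int]]
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def build_training_sets(samples: Sequence[Tuple[str, int]],
                        text_featurizer: Callable[[str], Vector],
                        url_featurizer: Callable[[str], Vector]) -> Tuple[Dataset, Dataset]:
    """S11/S21 import labelled samples; S12/S22 split them into text and URL feature sets."""
    text_X, text_y, url_X, url_y = [], [], [], []
    for sms, label in samples:  # label: 1 = fraud, 0 = normal
        body = " ".join(URL_PATTERN.sub(" ", sms).split())
        if body:
            text_X.append(text_featurizer(body))
            text_y.append(label)
        for url in URL_PATTERN.findall(sms):  # assumption: a URL inherits its message's label
            url_X.append(url_featurizer(url))
            url_y.append(label)
    return (text_X, text_y), (url_X, url_y)


def train_models(samples: Sequence[Tuple[str, int]],
                 text_featurizer: Callable[[str], Vector],
                 url_featurizer: Callable[[str], Vector],
                 trainer: Callable[[List[Vector], List[int]], Any]) -> Tuple[Any, Any]:
    """S13/S23: fit one model per feature set; `trainer` stands in for DBN training."""
    (tX, ty), (uX, uy) = build_training_sets(samples, text_featurizer, url_featurizer)
    text_model = trainer(tX, ty) if tX else None
    url_model = trainer(uX, uy) if uX else None
    return text_model, url_model


def toy_text_features(text: str) -> Vector:
    return [float(len(text.split()))]  # word count only, for illustration


def toy_url_features(url: str) -> Vector:
    return [float(len(url))]  # URL length only, for illustration


def toy_trainer(X: List[Vector], y: List[int]) -> Tuple[int, int]:
    return len(X), sum(y)  # placeholder "model": sample count and fraud count


samples = [("Win a prize at http://a.example/claim", 1),
           ("Lunch at noon?", 0),
           ("Verify now www.b.example", 1)]
text_set, url_set = build_training_sets(samples, toy_text_features, toy_url_features)
models = train_models(samples, toy_text_features, toy_url_features, toy_trainer)
```

Messages without a URL still contribute to the text set, which is why the two sets can differ in size.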

As shown in Fig. 4, the message-text feature vector is compared with the message-text deep learning model as follows:

Step S31: import the short message sample; go to step S32;

Step S32: feed the message text into Word2vec, whose output is the message-text feature vector; go to step S33;

Step S33: feed the message-text feature vector into the message-text deep learning classifier of the deep learning module; go to step S34;

Step S34: compare the classifier's output with the threshold set by the deep learning module, and feed the result back to the interaction module. If the comparison identifies the short message as suspected fraud, notify the user that the message is fraudulent. If it identifies the message as normal, stay silent and wait for the next short message.

As shown in Fig. 5, the URL feature vector is compared with the URL deep learning model as follows:

Step S41: import the short message sample; go to step S42;

Step S42: the preprocessing module separates the URL and applies extraction rules to the separated URL to obtain the URL feature vector; go to step S43;

Step S43: feed the URL feature vector into the URL deep learning classifier of the deep learning module; go to step S44;

Step S44: compare the classifier's output with the threshold set by the deep learning module, and feed the result back to the interaction module. If the comparison identifies the short message as suspected fraud, notify the user that the message is fraudulent. If it identifies the message as normal, stay silent and wait for the next short message.
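
Steps S31 to S34 and S41 to S44 run for every incoming message:

1. Score the text and any URLs.
2. Compare each score with its threshold.
3. Warn the user on a hit; otherwise stay silent and move on to the next message.

The sketch below drains a queue of incoming messages in that way. The scoring callables stand in for feature extraction plus the trained classifiers. Two further assumptions are ours: separate text and URL thresholds, and the rule that a hit on either branch triggers a warning. The toy scorers in the usage lines are made up.

```python
import re
from collections import deque
from typing import Callable, Deque, List

URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def monitor(inbox: Deque[str],
            text_score: Callable[[str], float],
            url_score: Callable[[str], float],
            text_threshold: float = 0.5,
            url_threshold: float = 0.5) -> List[str]:
    """Process queued messages in order; return one warning per suspected fraud."""
    alerts: List[str] = []
    while inbox:
        sms = inbox.popleft()
        body = " ".join(URL_PATTERN.sub(" ", sms).split())
        urls = URL_PATTERN.findall(sms)
        # S31-S34: text branch (skipped when the message is only a URL).
        text_hit = bool(body) and text_score(body) >= text_threshold
        # S41-S44: URL branch, one score per URL found in the message.
        url_hit = any(url_score(u) >= url_threshold for u in urls)
        if text_hit or url_hit:
            alerts.append(f"Suspected fraud: {sms}")
        # Otherwise stay silent and continue with the next message.
    return alerts


inbox = deque([
    "Hi mom, landed safely",
    "Account locked, verify at http://1.2.3.4/x",
    "Transfer 500 now",
])
alerts = monitor(
    inbox,
    text_score=lambda t: 1.0 if "transfer" in t.lower() else 0.0,             # toy scorer
    url_score=lambda u: 1.0 if re.search(r"//\d+(?:\.\d+){3}", u) else 0.0,  # toy scorer
)
```

Here the first message passes silently, the second is flagged by its raw-IP URL, and the third is flagged by its text.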

In summary, the invention offers two benefits. It is highly extensible, because the algorithm is modularized and each part is handled by its own neural network module. It is also reliable, because the deep learning algorithm learns sentence features automatically and can uncover more latent features than shallow algorithms.

Preferred embodiments of the invention have been described in detail above. A person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain from the prior art through logical analysis, reasoning or limited experimentation under the concept of the invention shall fall within the scope of protection defined by the claims.

Claims

1. A deep learning algorithm for recognizing fraudulent text messages, characterized by comprising a deep learning module, an interaction module, a preprocessing module and a comparison module, wherein: the interaction module sends an acquired short message to the preprocessing module; the preprocessing module extracts feature vectors of the short message; the deep learning module forms a deep learning model from a sample set; the comparison module compares the feature vectors of the short message with the deep learning model and sends the comparison result to the interaction module; the interaction module feeds the comparison result back to the user; and the short message comprises message text and/or a URL.

2. The deep learning algorithm for recognizing fraudulent text messages according to claim 1, characterized in that the deep learning model comprises a message-text deep learning model and/or a URL deep learning model.

3. The deep learning algorithm for recognizing fraudulent text messages according to claim 2, characterized in that the message-text deep learning model is formed by the preprocessing module extracting the message text of the short message as a feature vector, and the deep learning module feeding the message-text feature vectors into a DBN to form the message-text deep learning model.

4. The deep learning algorithm for recognizing fraudulent text messages according to claim 2, characterized in that the URL deep learning model is formed by the preprocessing module extracting the URL of the short message as a feature vector, and the deep learning module feeding the URL feature vectors into a DBN to form the URL deep learning model.

5. The deep learning algorithm for recognizing fraudulent text messages according to claim 1, characterized in that the feature vectors extracted from the short message by the preprocessing module comprise a message-text feature vector and/or a URL feature vector.

6. The deep learning algorithm for recognizing fraudulent text messages according to claim 5, characterized in that the message-text feature vector is generated by the preprocessing module separating the message text from the short message and feeding the separated message text into Word2vec, whose output is the message-text feature vector.

7. The deep learning algorithm for recognizing fraudulent text messages according to claim 6, characterized in that the URL feature vector is generated by the preprocessing module separating the URL from the short message and applying extraction rules to the separated URL to obtain the URL feature vector.

8. The deep learning algorithm for recognizing fraudulent text messages according to claim 1, characterized in that the comparison module compares the feature vectors of the short message with the deep learning model by comparing the message-text feature vector with the message-text deep learning model and/or comparing the URL feature vector with the URL deep learning model.

9. The deep learning algorithm for recognizing fraudulent text messages according to claim 8, characterized in that, when the message-text feature vector is compared with the message-text deep learning model, the message-text feature vector is fed into the message-text deep learning classifier of the deep learning module, the classification result is compared with a threshold set by the deep learning module, and the outcome is fed back to the interaction module.

10. The deep learning algorithm for recognizing fraudulent text messages according to claim 8, characterized in that, when the URL feature vector is compared with the URL deep learning model, the URL feature vector is fed into the URL deep learning classifier of the deep learning module, the classification result is compared with a threshold set by the deep learning module, and the outcome is fed back to the interaction module.