CN113034331A

CN113034331A - Android gambling application identification method and system based on multi-mode fusion

Info

Publication number: CN113034331A
Application number: CN202110490157.7A
Authority: CN
Inventors: 纪天啸; 胡燕林; 李致; 闵宗茹; 沈传年; 杨�一; 陈曲; 徐彦婷; 张超超; 王心丹
Original assignee: Shanghai Branch Of National Computer Network And Information Security Management Center
Current assignee: Shanghai Branch Of National Computer Network And Information Security Management Center
Priority date: 2021-05-06
Filing date: 2021-05-06
Publication date: 2021-06-25

Abstract

The invention discloses an android gambling application identification method and system based on multi-mode fusion, belonging to the technical field of android application security. The identification method comprises the following specific processes: (1) obtaining, in batches, website comments and download clue information of android gambling applications in gambling websites; (2) finding application download links ending with APK in a target website, and extracting suspected android applications; (3) extracting the package name, icon, certificate, IP address, URL domain name and email address of the android application installation package; (4) judging whether the application is a gambling application through a multi-mode fused android gambling application recognition model, which comprises an image model, a text model and a Multihead Attention fusion model; (5) storing the basic information of the found android application and the application installation package. The multi-mode recognition model can accurately recognize gambling applications on the android platform, which helps reduce illegal and criminal network gambling activities.

Description

Android gambling application identification method and system based on multi-mode fusion

Technical Field

The invention relates to the technical field of android application security, in particular to an android gambling application identification method and system based on multi-mode fusion.

Background

Through retrieval, Chinese patent No. CN108052523A discloses a gambling website identification method and system based on a convolutional neural network; the method identifies a webpage screenshot of the website to be predicted through a convolutional neural network model and judges whether the website is a gambling website, but it trains on and identifies only the image features of the website. In recent years, with the rapid development of the internet and the mobile communication industry, network gambling, an illegal criminal activity with a large number of participants, wide propagation channels and a large scale of cases involved, has spread continuously around the world, causing a large outflow of funds, and the crimes derived from it seriously threaten the social security of China. The Android mobile application is one of the important propagation carriers of network gambling information; the platform is open and applications can be installed without an official or third-party application store, so a large number of Android platform gambling applications are propagated by directly providing an installation package, or by providing download links on official websites or other information propagation channels. Gambling applications are one kind of harmful application, but current analysis of harmful android platform applications mainly focuses on traditional network security fields such as malicious code and behavior security, and there is little research on discovering harmful application content or on content security; therefore, it is important to invent an android gambling application identification method and system based on multi-mode fusion;

the existing android application identification methods usually focus on network security problems within the application, such as malicious code and behavior security; there is less work on discovering and discriminating harmful application content and on content information security, and few public training and test data sets related to content security are available; in addition, an android gambling application installation package contains a large amount of multi-modal information such as text, pictures and certificates, and the existing multi-modal fusion methods are not targeted at this kind of data; therefore, an android gambling application identification method and system based on multi-mode fusion are provided.

Disclosure of Invention

The invention aims to solve the defects in the prior art, and provides an android gambling application identification method and system based on multi-mode fusion.

In order to achieve the purpose, the invention adopts the following technical scheme:

an android gambling application identification method based on multi-mode fusion comprises the following specific processes:

(1) obtaining, in batches, website comments and download clue information of android gambling applications in gambling websites;

(2) finding application download links ending with APK in a target website, and extracting suspected android applications;

(3) extracting the package name, icon, certificate, IP address, URL domain name and email address of the android application installation package;

(4) judging whether the application is a gambling application through a multi-mode fused android gambling application recognition model, which comprises an image model, a text model and a Multihead Attention fusion model;

(5) storing the basic information of the found android application and the application installation package;

(6) displaying, at the front end, the overall discovery status of android gambling applications and the newly discovered android gambling applications.
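Step (2) can be illustrated with a minimal sketch that scans a page for download links whose path ends with `.apk`. It uses only the Python standard library; the class name `ApkLinkParser`, the function `find_apk_links`, the sample HTML and the `example.invalid` host are hypothetical and are not taken from the patent, which does not describe its crawler implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ApkLinkParser(HTMLParser):
    """Collects anchor targets whose URL path ends with .apk (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                absolute = urljoin(self.base_url, value)
                # Compare only the path so a query string cannot hide the suffix.
                if urlparse(absolute).path.lower().endswith(".apk"):
                    self.links.append(absolute)


def find_apk_links(html, base_url):
    """Return absolute URLs of all links on the page that point at an APK file."""
    parser = ApkLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Sample page with one APK link and one ordinary link (made-up content).
page = '<a href="/dl/app.apk?src=1">Download</a><a href="/about.html">About</a>'
apk_links = find_apk_links(page, "https://example.invalid/")
print(apk_links)
```

A real crawler would additionally need fetching, deduplication and rate limiting, none of which are shown here.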

Preferably, the image model determination process specifically includes:

s1: mapping the icon picture I to the feature f_I using VGGNet; the formula is as follows:

f_I＝CNN_vgg(I) (1)

s2: scaling the icon picture to 448 × 448 pixels and then obtaining the feature f_I of the last pooling layer; the output dimension is 512 × 14 × 14, wherein 14 × 14 is the number of picture segmentation regions, and 512 is the dimension of each region feature vector;

s3: each feature vector is converted into a new vector with the same dimension as the text vector by using a single-layer perceptron, and the formula is as follows:

v_I = tanh(W_I f_I + b_I) (2)

in the formula: v_I is a matrix whose i-th column is the feature vector of picture region i;

s4: the vanishing gradient problem is solved by using an 18-layer or 34-layer residual neural network.
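As a rough illustration of formula (2), the sketch below applies the single-layer perceptron to a 512 × 196 feature matrix (196 = 14 × 14 regions). Random numbers stand in for the VGG features, and the projected dimension `d = 8` and the weight range are assumptions chosen only to keep the example small; the patent does not state the text-vector dimension.

```python
import math
import random


def project_regions(f_I, W_I, b_I):
    """Apply v_I = tanh(W_I f_I + b_I) to every region column of f_I.

    f_I: C x R nested list (C channels, R regions)
    W_I: d x C nested list, b_I: list of length d
    Returns v_I as a d x R nested list.
    """
    channels, regions = len(f_I), len(f_I[0])
    v_I = []
    for w_row, bias in zip(W_I, b_I):
        row = []
        for r in range(regions):
            z = sum(w_row[c] * f_I[c][r] for c in range(channels)) + bias
            row.append(math.tanh(z))
        v_I.append(row)
    return v_I


random.seed(0)
C, R, d = 512, 14 * 14, 8  # d = 8 is a hypothetical text-vector dimension
f_I = [[random.random() for _ in range(R)] for _ in range(C)]  # stand-in for VGG features
W_I = [[random.uniform(-0.01, 0.01) for _ in range(C)] for _ in range(d)]
b_I = [0.0] * d

v_I = project_regions(f_I, W_I, b_I)
print(len(v_I), len(v_I[0]))  # rows = projected dimension, columns = regions
```

Each column of `v_I` is the projected feature vector of one picture region, which is the form the attention fusion later consumes.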

Preferably, the residual neural network consists of building blocks of two stacked layers, and the structure of a building block is as follows:

H(x)＝F(x,{W_i})+x (3)

in the formula: x and H(x) are the input and output vectors of the building block; F(x, {W_i}) represents the learned residual mapping;

wherein:

F(x)＝W₂δ(W₁x) (4)

in the formula: δ represents the activation function, W₁ represents the first connection weight, and W₂ represents the second connection weight;

if the dimensions of x and F do not match, they can be matched using a linear mapping W_s:

H(x) = F(x, {W_i}) + W_s x (5)

the formula F(x) + x is implemented by a feedforward neural network with a shortcut connection; the shortcut connection performs an identity mapping, and its output is added to the output of the stacked layers.
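Formulas (3) to (5) can be sketched for a single building block as follows. The activation δ is assumed to be ReLU, which the patent does not specify, and all weights are toy values chosen for illustration.

```python
def relu(v):
    """Assumed activation function (the patent only calls it delta)."""
    return [max(0.0, a) for a in v]


def matvec(W, x):
    """Multiply a nested-list matrix by a vector."""
    return [sum(w * a for w, a in zip(row, x)) for row in W]


def residual_block(x, W1, W2, Ws=None):
    """H(x) = F(x) + x with F(x) = W2 * relu(W1 * x).

    When Ws is given, the shortcut becomes Ws * x so that the
    dimensions of F(x) and the shortcut match (formula 5).
    """
    F = matvec(W2, relu(matvec(W1, x)))
    shortcut = x if Ws is None else matvec(Ws, x)
    return [f + s for f, s in zip(F, shortcut)]


identity2 = [[1.0, 0.0], [0.0, 1.0]]

# Identity shortcut: F(x) = relu(x) = x for positive inputs, so H(x) = 2x.
same_dim = residual_block([1.0, 2.0], identity2, identity2)

# Projection shortcut: F maps 2 -> 3 dimensions, so Ws must do the same.
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
projected = residual_block([1.0, 2.0], W1, W2, Ws)

print(same_dim, projected)
```

With all-zero weights the block reduces to the identity map, which is why stacking such blocks does not make the gradient vanish through the shortcut path.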

Preferably, the text model is specifically an LSTM; the basic structure of the LSTM is a memory unit that retains the sequence state, and in each step, the LSTM unit takes an input word vector x_t, updates the memory cell c_t, and then outputs a hidden state h_t; a gate mechanism is used in the updating process: the forget gate f_t controls how much information from the previous state c_{t-1} is kept; the input gate i_t controls how much of the current input x_t is written to the memory unit; the output gate o_t controls how much information enters the output, namely the hidden state; the detailed updating process is as follows:

i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i) (6)

f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f) (7)

o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o) (8)

c_t = f_t c_{t-1} + i_t tanh(W_xc x_t + W_hc h_{t-1} + b_c) (9)

h_t = o_t tanh(c_t) (10)

in the formula: i, f, o and c are respectively an input gate, a forgetting gate, an output gate and a memory unit;

the word vector x_t is used as the input to the LSTM; the formula is as follows:

x_t = W_e q_t, t ∈ {1, 2, …, T} (11)

h_t = LSTM(x_t), t ∈ {1, 2, …, T} (12)

in the formula: q = [q₁, …, q_T] represents the text, and q_t is the one-hot vector representation of the word at position t.
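The update rules (6) to (12) can be sketched for one time step as below. The parameter dictionary layout, the helper names and the all-zero toy weights are assumptions made for illustration; a trained model would of course use learned weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W_x, x, W_h, h, b):
    """Return W_x x + W_h h + b for nested-list matrices."""
    return [
        sum(wx * xi for wx, xi in zip(row_x, x))
        + sum(wh * hi for wh, hi in zip(row_h, h))
        + bias
        for row_x, row_h, bias in zip(W_x, W_h, b)
    ]


def embed(W_e, word_index):
    """x_t = W_e q_t: a one-hot q_t selects one column of W_e (formula 11)."""
    return [row[word_index] for row in W_e]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following formulas (6) to (10)."""
    i_t = [sigmoid(z) for z in affine(p["W_xi"], x_t, p["W_hi"], h_prev, p["b_i"])]
    f_t = [sigmoid(z) for z in affine(p["W_xf"], x_t, p["W_hf"], h_prev, p["b_f"])]
    o_t = [sigmoid(z) for z in affine(p["W_xo"], x_t, p["W_ho"], h_prev, p["b_o"])]
    g_t = [math.tanh(z) for z in affine(p["W_xc"], x_t, p["W_hc"], h_prev, p["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(n_in, n_hid):
    """All-zero toy parameters (hypothetical, for a predictable example)."""
    params = {}
    for gate in "ifoc":
        params["W_x" + gate] = [[0.0] * n_in for _ in range(n_hid)]
        params["W_h" + gate] = [[0.0] * n_hid for _ in range(n_hid)]
        params["b_" + gate] = [0.0] * n_hid
    return params


W_e = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2-dim embeddings, 3-word vocabulary
x_t = embed(W_e, 1)
h_t, c_t = lstm_step(x_t, [0.0], [2.0], zero_params(2, 1))
print(x_t, h_t, c_t)
```

With zero weights every gate equals 0.5, so the cell keeps half of the previous memory; this makes the role of the forget gate in formula (9) easy to see.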

Preferably, the Multihead Attention fusion model is a multimodality fusion model based on an Attention mechanism, and the result output by the picture model and the text model is fused by an Attention mechanism, and the Attention mechanism is calculated as follows:

(1) calculating the weight between the query and each key through a similarity function, wherein the similarity function is the dot product;

(2) in the dot product operation, the factor √d_k plays a scaling role, so that the dot product does not become too large;

(3) normalizing the obtained weights by a softmax function;

(4) obtaining the weighted sum of the values, using the normalized weight of the key corresponding to each value;

based on the above steps, the following formula is obtained:

Attention(Q, K, V) = softmax(QK^T / √d_k) V (13)

in the formula: Q is the feature vector of the picture, V and K are the output of the text model, and d_k is the dimension of the key vectors.

Preferably, the multi-modal fusion needs to pass through a global average pooling layer, and the formula is as follows:

v_gap＝Global(v₁,v₂,…,v_n) (14)

finally, the obtained vector v_gap is input directly into the softmax layer for classification prediction, and the prediction result is as follows:

ŷ = softmax(v_gap) (15)

the cross-entropy function is introduced to evaluate the model; it reflects the difference between the true class y and the predicted class ŷ:

L = −Σ_i y_i log(ŷ_i) (16)

in the formula: i is an index number.
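The pooling, prediction and evaluation steps can be sketched as follows, using the standard definitions of global average pooling, softmax and cross entropy. It is assumed here that each fused vector already has one entry per class (gambling, non-gambling); the patent does not state the vector sizes, and the input values are made up.

```python
import math


def global_average_pool(vectors):
    """Average n equally sized vectors element-wise into one vector v_gap."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_pred):
    """Standard cross entropy between a one-hot label and a predicted distribution."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


# Two fused vectors with two class entries each (toy values).
v_gap = global_average_pool([[1.0, 3.0], [3.0, 5.0]])
y_hat = softmax(v_gap)
loss = cross_entropy([0.0, 1.0], y_hat)
print(v_gap, y_hat, loss)
```

The loss shrinks towards zero as the predicted probability of the true class approaches one.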

An android gambling application recognition system based on multi-mode fusion comprises a web crawler layer, an extraction and discrimination layer, a data storage layer and a result display layer;

the network crawler layer is used for crawling android gambling application clues in website contents and comment information and grabbing and finding android application download addresses;

the extraction discrimination layer is used for extracting basic information of the android application and carrying out android platform gambling application identification based on multi-mode fusion;

the data storage layer is used for storing the identified android application installation packages and the android application information;

the result display layer is used for displaying the overall status of android gambling applications and displaying newly discovered android gambling applications.

Compared with the prior art, the invention has the beneficial effects that:

1. according to the android gambling application identification method based on multi-mode fusion, resource files containing a large number of different modes in an android platform application installation package, such as package names, icons, certificates and character strings (IP addresses, URL domain names and e-mail addresses), are extracted through a data crawler method, and then picture and text feature vectors in application are distinguished through a picture model and a text model, so that a foundation is laid for the establishment of a subsequent multi-mode identification model;

2. according to the android gambling application identification method based on multi-mode fusion, the characteristic vectors obtained by the picture model and the text model are subjected to fusion training to form a multi-mode identification model, compared with the traditional single-characteristic android gambling application identification model, the multi-mode identification model is wider in identification range and higher in identification precision, so that gambling applications in an android platform can be accurately and automatically identified, and network gambling illegal criminal activities can be reduced.
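The character-string extraction mentioned in point 1 (IP addresses, URL domain names and e-mail addresses) can be sketched with regular expressions. The patterns below are deliberately simple approximations, and the function name `extract_clues` and the sample strings are hypothetical; the patent does not disclose its own extraction rules.

```python
import re

# IPv4 addresses with each octet limited to 0-255.
IP_RE = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# Simplified e-mail pattern (an approximation, not a full RFC 5322 parser).
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
# Host part of http/https URLs that use a domain name.
URL_RE = re.compile(r"https?://([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def extract_clues(strings):
    """Collect IPs, URL domain names and e-mail addresses from string resources."""
    clues = {"ips": set(), "domains": set(), "emails": set()}
    for s in strings:
        clues["ips"].update(IP_RE.findall(s))
        clues["domains"].update(URL_RE.findall(s))
        clues["emails"].update(EMAIL_RE.findall(s))
    return {key: sorted(values) for key, values in clues.items()}


# Made-up strings of the kind found in an installation package.
sample_strings = [
    "https://api.example.invalid/v1/login",
    "contact: support@example.invalid",
    "server=203.0.113.7:8080",
    "hello world",
]
clues = extract_clues(sample_strings)
print(clues)
```

In practice the input strings would come from decoding the installation package's resources and code, which this sketch leaves out.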

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.

FIG. 1 is an overall flowchart of the android gambling application identification method based on multi-mode fusion according to the present invention;

FIG. 2 is a schematic overall structure diagram of an android gambling application recognition system based on multi-mode fusion, provided by the invention;

FIG. 3 is a schematic diagram illustrating a process of determining a picture model according to the present invention;

FIG. 4 is a schematic diagram of a residual neural network structure according to the present invention;

FIG. 5 is a schematic diagram of a fusion process of the Multihead Attention fusion model of the present invention;

FIG. 6 is a schematic illustration of a calculation process for the attention mechanism of the present invention;

FIG. 7 is a diagram of the ResNet-18 model prediction results of the present invention;

FIG. 8 is a diagram illustrating the model prediction results after ResNet-18 pre-training in accordance with the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments.

In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention.

Referring to FIGS. 1-8, 495 gambling applications were collected, of which 395 form the training set, 50 the validation set and 50 the test set; 446 non-gambling normal applications were crawled from application markets, of which 346 form the training set, 50 the validation set and 50 the test set, as shown in the following table:

Category	Training set	Validation set	Test set	Total
Gambling class	395	50	50	495
Non-gambling class	346	50	50	446

The text information extracted from the applications is shown in the following tables:

Serial number	Domain name
1	www.qhc25.com
2	agmbet.com
3	api.383game7a1.com
4	api.yjgame1.com
5	api.kgky8372.com

Serial number	Partial in-application text examples
1	Full international, red-envelope fishing and AG video …
2	All-season color, dragon and tiger war, Shenlongbao Tibetan and fried golden flower …
3	Lebo cash network, lottery 25, Wuwan …
4	Venice entertainment, Baijiale …
5	Yongli international entertainment city …

10994 collected short texts were used as pre-training data for the text model, of which 3425 are gambling texts and 7569 are non-gambling normal texts, as shown in the following table:

during training, the picture is first preprocessed by scaling and cropping, and its pixel values are then transformed from the range 0 to 255 to the range -1 to 1 for normalization;
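The pixel transformation from the range 0 to 255 to the range -1 to 1 can be written as p / 127.5 - 1. The sketch below shows only this rescaling on a nested list of pixel values; scaling and cropping of the picture are not shown, and the function name is hypothetical.

```python
def normalize_pixels(pixels):
    """Map pixel values from [0, 255] to [-1, 1] via p / 127.5 - 1."""
    return [[p / 127.5 - 1.0 for p in row] for row in pixels]


# One row with the darkest, middle and brightest values.
normalized = normalize_pixels([[0, 127.5, 255]])
print(normalized)
```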

the embodiment provides an android gambling application identification method based on multi-mode fusion, which specifically comprises the following processes:

(3) information such as package names, icons, certificates and character strings (IP addresses, URL domain names and e-mail addresses) of the android application installation package is extracted, and part of android application information extraction examples are as follows:

The image model determination process is specifically as follows:

s1: mapping the icon picture I to the feature f_I using VGGNet; the formula is as follows:

f_I = CNN_vgg(I) (1)

v_I = tanh(W_I f_I + b_I) (2)

The residual neural network consists of building blocks of two stacked layers, and the structure of a building block is as follows:

H(x)＝F(x,{W_i})+x (3)

wherein:

F(x)＝W₂δ(W₁x) (4)

H(x) = F(x, {W_i}) + W_s x (5)

It should be noted that the text model is specifically an LSTM; the basic structure of the LSTM is a memory unit that retains the sequence state, and in each step, the LSTM unit takes an input word vector x_t, updates the memory cell c_t, and then outputs a hidden state h_t; a gate mechanism is used in the updating process: the forget gate f_t controls how much information from the previous state c_{t-1} is kept; the input gate i_t controls how much of the current input x_t is written to the memory unit; the output gate o_t controls how much information enters the output, namely the hidden state; the detailed updating process is as follows:

i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i) (6)

f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f) (7)

o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o) (8)

c_t = f_t c_{t-1} + i_t tanh(W_xc x_t + W_hc h_{t-1} + b_c) (9)

h_t = o_t tanh(c_t) (10)

the word vector x_t is used as the input to the LSTM; the formula is as follows:

x_t = W_e q_t, t ∈ {1, 2, …, T} (11)

h_t = LSTM(x_t), t ∈ {1, 2, …, T} (12)

It should be noted that the Multihead Attention fusion model is specifically a multimodality fusion model based on an Attention mechanism, and the result output by the picture model and the text model is fused by an Attention mechanism, and the calculation process of the Attention mechanism is as follows:

(2) in the dot product operation, the factor √d_k, where d_k is the dimension of the key vectors, plays a scaling role, so that the dot product does not become too large;

(3) normalizing the obtained weights by a softmax function;

(4) obtaining the weighted sum of the values, using the normalized weight of the key corresponding to each value;

based on the above steps, the following formula is obtained:

Attention(Q, K, V) = softmax(QK^T / √d_k) V (13)

The multi-modal fusion needs to pass through a global average pooling layer, and the formula is as follows:

v_gap＝Global(v₁,v₂,…,v_n) (14)

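The cross-entropy function reflects the difference between the true class y and the predicted class ŷ:

L = −Σ_i y_i log(ŷ_i) (16)

in the formula: i is an index number.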

The embodiment provides an android gambling application recognition system based on multi-mode fusion, which comprises a web crawler layer, an extraction discrimination layer, a data storage layer and a result display layer;

the network crawler layer is used for crawling web content and android gambling application clues in the comment information, and grabbing and finding android application download addresses;

The results of the picture model on the test set are shown in the following table:

Model	Precision	Recall	F1
CNN	0.61	0.86	0.71
ResNet-18	0.82	0.79	0.80
ResNet-34	0.78	0.77	0.77
CNN (pre-training)	0.69	0.78	0.73
ResNet-18 (pre-training)	0.84	0.80	0.82
ResNet-34 (pre-training)	0.82	0.78	0.80

specifically, the icon recognition results show that the ResNet networks are clearly superior to the basic CNN network, and even with pre-training the recognition effect of the basic CNN network improves only slightly (F1 from 0.71 to 0.73); it is worth noting that ResNet-34 performs worse than ResNet-18 (F1 of 0.77 versus 0.80), and although using the pre-trained model raises its F1 to 0.80, it is still worse than the pre-trained ResNet-18 (F1 of 0.82);

as shown in FIGS. 7 and 8, from the perspective of the picture prediction results, the basic CNN model predicts most samples correctly, but its predicted probability rises only a few percentage points above the roughly fifty percent seen before training, while the predicted probability of the ResNet model can be as high as ninety percent.

The results of the text model on the test set are shown in the following table:

specifically, the LSTM clearly improves on the basic RNN model; after pre-training on other gambling short texts the basic RNN model also improves considerably, but it remains inferior to the LSTM model, so the pre-trained LSTM text model is selected for the final model; the overall effect of the text model is nevertheless inferior to that of the picture model.

The results of the fusion model on the test set are shown in the following table:

Model	Precision	Recall	F1
ResNet	0.84	0.80	0.82
LSTM	0.80	0.81	0.80
LSTM-ResNet-Concat	0.88	0.92	0.90
LSTM-ResNet-MHAT	0.90	0.93	0.91

in the final multi-modal feature fusion model, a baseline is also built for comparison: the image feature vectors and the text feature vectors are concatenated and then classified through a fully connected layer, a global average pooling layer and a softmax layer; the experimental results show that with the features of a single modality (F1 of 0.82 for ResNet and 0.80 for LSTM) the model can already learn how harmful gambling applications differ from normal applications and identify them effectively, and that with multi-modal features the effect improves markedly (F1 of 0.90 for concatenation and 0.91 for Multihead Attention); the Multihead Attention mechanism obtains the better result because it attends to partial regions of the picture rather than taking in the interference introduced by the global picture, which is of great significance for identifying more harmful gambling applications.
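As a consistency check on the fusion results, the F1 column can be recomputed from the reported precision and recall using the standard definition F1 = 2PR / (P + R). The numbers are copied from the table above; only the helper name `f1_score` is introduced here.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for each model in the fusion table.
reported = {
    "ResNet": (0.84, 0.80, 0.82),
    "LSTM": (0.80, 0.81, 0.80),
    "LSTM-ResNet-Concat": (0.88, 0.92, 0.90),
    "LSTM-ResNet-MHAT": (0.90, 0.93, 0.91),
}

recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r, _) in reported.items()
}
for name, (_, _, f1_reported) in reported.items():
    print(name, recomputed[name], f1_reported)
```

All four recomputed values agree with the reported F1 column after rounding to two decimals.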

The working principle and the using process of the invention are as follows: when the android gambling application identification system based on multi-mode fusion is used, website comments and download clue information of android gambling applications in gambling websites are first obtained in batches; application download links ending with APK are then found on the target website, and suspected android applications are extracted; the package name, icon, certificate, IP address, URL domain name and email address of the android application installation package are then extracted; whether the application is a gambling application is then judged through the multi-mode fused android gambling application recognition model, which comprises an image model, a text model and a Multihead Attention fusion model; the basic information of the found android application and the application installation package are then stored; finally, the front end displays the overall discovery status of android gambling applications and the newly discovered android gambling applications. According to the invention, static resources in the application, such as package names, icons, certificates and character strings (IP addresses, URL domain names and e-mail addresses), are extracted through crawler technology, and the features of different modalities such as text and pictures are then combined using the Multihead Attention multi-modal fusion technique to construct a multi-mode fused android gambling application identification model, so that gambling applications on the android platform can be accurately and automatically identified.

The above description covers only the preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. An android gambling application identification method based on multi-mode fusion is characterized by comprising the following specific steps:

2. The method for identifying the android gambling application based on the multi-modal fusion as claimed in claim 1, wherein the image model decision process is specifically as follows:

s1: mapping the icon picture I to the feature f_I using VGGNet; the formula is as follows:

f_I = CNN_vgg(I) (1)

v_I = tanh(W_I f_I + b_I) (2)

3. The identification method for the android gambling application based on multi-modal fusion is characterized in that the residual neural network consists of two stacked layer building blocks, and the structure of the residual neural network is as follows:

H(x)＝F(x，{W_i})+x (3)

wherein:

F(x)＝W₂δ(W₁x) (4)

H(x) = F(x, {W_i}) + W_s x (5)

4. The method as claimed in claim 1, wherein the text model is an LSTM; the basic structure of the LSTM is a memory unit that retains the sequence state, and in each step, the LSTM unit takes an input word vector x_t, updates the memory cell c_t, and then outputs a hidden state h_t; a gate mechanism is used in the updating process: the forget gate f_t controls how much information from the previous state c_{t-1} is kept; the input gate i_t controls how much of the current input x_t is written to the memory unit; the output gate o_t controls how much information enters the output, namely the hidden state; the detailed updating process is as follows:

i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i) (6)

f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f) (7)

o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o) (8)

c_t = f_t c_{t-1} + i_t tanh(W_xc x_t + W_hc h_{t-1} + b_c) (9)

h_t = o_t tanh(c_t) (10)

the word vector x_t is used as the input to the LSTM; the formula is as follows:

x_t = W_e q_t, t ∈ {1, 2, …, T} (11)

h_t = LSTM(x_t), t ∈ {1, 2, …, T} (12)

in the formula: q = [q₁, …, q_T] represents the text, and q_t is the one-hot vector representation of the word at position t.

5. The method for identifying an android gambling application based on multi-modal fusion as claimed in claim 1, wherein the Multihead Attention fusion model is a multi-modal fusion model based on an Attention mechanism, and the result output by the picture model and the text model is fused by an Attention mechanism, and the Attention mechanism is calculated as follows:

(2) in the dot product operation, the factor √d_k, where d_k is the dimension of the key vectors, plays a scaling role, so that the dot product does not become too large;

(3) normalizing the obtained weights by a softmax function;

(4) obtaining the weighted sum of the values, using the normalized weight of the key corresponding to each value;

based on the above steps, the following formula is obtained:

Attention(Q, K, V) = softmax(QK^T / √d_k) V (13)

6. The method for identifying an android gambling application based on multi-modal fusion as claimed in claim 5, wherein the multi-modal fusion needs to pass through a global average pooling layer, and the formula is as follows:

v_gap＝Global(v₁，v₂，...，v_n) (14)

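the obtained vector v_gap is input directly into the softmax layer for classification prediction, and the prediction result is as follows:

ŷ = softmax(v_gap) (15)

the cross-entropy function reflects the difference between the true class y and the predicted class ŷ:

L = −Σ_i y_i log(ŷ_i) (16)

in the formula: i is an index number.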

7. An android gambling application identification system based on multi-mode fusion is characterized by comprising a web crawler layer, an extraction discrimination layer, a data storage layer and a result display layer;