CN113691492A - Method, system, device and readable storage medium for determining illegal application program - Google Patents
- Publication number
- CN113691492A (CN202110655002.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/20—Network architectures or network communication protocols for network security for managing network security; network security policies in general
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/30—Authentication, i.e. establishing the identity or authorisation of security principals
- G06F21/44—Program or device authentication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2463/00—Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
- H04L2463/146—Tracing the source of attacks
Abstract
The application provides a method, a system, a device and a readable storage medium for determining illegal application programs. The determination method comprises the following steps: collecting application program information from a software distribution website; classifying the application program information with a classification model to obtain suspected illegal application programs; acquiring the address through which a suspected illegal application program communicates with its server, obtaining the server domain name or IP address from that communication address, and obtaining the associated company information of the suspected illegal application program from the server domain name or IP address; and/or obtaining the operation information and development information of the suspected illegal application program from the information of the suspected illegal application program; and determining the legality of the suspected illegal application program according to the associated company information and/or the operation information and development information. In this way, illegal application programs can be identified and determined automatically and in advance, targets are provided for public safety monitoring, and the method is effective in practical use.
Description
Technical Field
The present application relates to the field of artificial intelligence computing, and in particular, to a method, a system, an apparatus, and a readable storage medium for determining an illegal application program.
Background
With the rapid development of China's economy, the mobile internet has become an essential part of economic life. However, economically abnormal behavior carried out through mobile phone apps (APP, Application) may seriously damage the property of the targeted group. Mass-involving economically abnormal behaviors affect unspecified groups and large numbers of victims. With the development of mobile internet technology, blockchain technology, financial innovation and the like, such behaviors are often schemes launched by one or several principal actors who use websites or application programs as carriers, rely on gimmicks such as innovation and shared earnings, and guide or instigate many subordinate actors to spread the scheme cooperatively.
The application market contains a great deal of software covering every aspect of life, including shopping, video, news, sports, finance, games and the like, so distinguishing which application programs behave abnormally is of great significance. Identifying the companies or individuals that develop and maintain application programs with abnormal behavior is an important investigative means of combating illegal application programs: once the real-world entity behind an application program is found, the illegal behavior can be tackled at its root.
The prior art lacks an effective prevention mechanism for economically abnormal behavior carried out through application programs, and offers no good technical means to assist in investigating the relevant applications.
Disclosure of Invention
The technical problem mainly solved by this application is to provide a method, system, device and computer storage medium for determining illegal application programs that can automatically identify and determine illegal application programs in advance, is effective in practical use, and provides targets for public safety monitoring and entity investigation.
In order to solve the technical problem, the application adopts a technical scheme that: provided is a method for determining an illegal application, the method comprising:
collecting application program information of a software distribution website;
classifying the application program information by adopting a classification model to obtain suspected illegal application programs;
acquiring the communication address of the suspected illegal application program and a server, acquiring a server domain name or an IP address according to the communication address, and acquiring the related company information of the suspected illegal application program according to the server domain name or the IP address; and/or acquiring operation information and development information of the suspected illegal application program from the information of the suspected illegal application program;
and determining the legality of the suspected illegal application program according to the associated company information and/or the operation information and the development information.
Wherein the collected information of the application program comprises: the application program name, the application program description information, the application program installation package and the company to which the application program belongs.
Wherein, prior to said classifying the application information using the classification model, the method further comprises: training the classification model;
the step of training the classification model specifically comprises:
collecting a plurality of application program samples, wherein each application program sample corresponds to a classification, and the application program samples are divided into training application program samples and verification application program samples;
vectorizing and representing the text information of the application program sample, and converting the text information into 255-dimensional word vectors;
inputting the word vectors corresponding to the training application program samples, together with their classifications, into an initial classification model for model training;
inputting the word vectors corresponding to the verification application program samples into the trained classification model, comparing the output results of the classification model with the classifications corresponding to the verification application program samples, and finishing the training of the classification model if the accuracy of the output results with respect to those classifications reaches a preset threshold.
Wherein, prior to said classifying the application information using the classification model, the method further comprises:
deduplicating and merging the collected application programs according to the application program name, the message digest value of the application program description information, or the message digest value of the application program installation package.
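The deduplication step above can be sketched as follows. This is only an illustration under stated assumptions: the `AppRecord` fields, the choice of SHA-256 as the message digest, and the rule of filling in a missing company from a duplicate are all hypothetical, since the source does not name a digest algorithm or a merge rule.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class AppRecord:
    # Hypothetical record layout mirroring the four collected items.
    name: str
    description: str
    package_bytes: bytes
    company: str = ""


def digest(data: bytes) -> str:
    # SHA-256 is an assumption; the source only says "message digest value".
    return hashlib.sha256(data).hexdigest()


def deduplicate(records):
    """Keep the first record for each name / description digest / package digest.

    A later duplicate only contributes its company name when the kept
    record has none (an assumed, minimal "merge" rule).
    """
    index_by_key = {}
    unique = []
    for rec in records:
        keys = []
        if rec.name.strip():
            keys.append(("name", rec.name.strip().lower()))
        if rec.description:
            keys.append(("desc", digest(rec.description.encode("utf-8"))))
        if rec.package_bytes:
            keys.append(("pkg", digest(rec.package_bytes)))
        hit = next((index_by_key[k] for k in keys if k in index_by_key), None)
        if hit is None:
            unique.append(rec)
            hit = len(unique) - 1
        elif not unique[hit].company and rec.company:
            unique[hit].company = rec.company
        for k in keys:
            index_by_key.setdefault(k, hit)
    return unique
```

Matching on any one of the three keys follows the "or" in the step; a stricter variant could require two keys to agree before merging.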
Wherein the step of classifying the application program information by adopting a classification model to obtain a suspected illegal application program comprises:
calculating, through the classification model, the candidate classifications of each application program, taking the accuracy given by the classification model for each classification as the confidence of that classification, and outputting the classification with the highest confidence as the classification of the application program.
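Selecting the classification with the highest confidence can be sketched as below. The sketch assumes that the model produces one raw score per class and that the softmax probability serves as the confidence; the label names are hypothetical, and the source does not state how the confidence is actually computed.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return (label, confidence) for the class with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


# Hypothetical labels and scores for one application program.
label, confidence = classify([0.1, 2.0, -1.0],
                             ["normal", "suspected_illegal", "other"])
```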
The steps of obtaining the communication address of the suspected illegal application program and the server, obtaining a server domain name or an IP address according to the communication address, and obtaining the related company information of the suspected illegal application program according to the server domain name or the IP address comprise:
installing the application installation package into a sandbox;
installing packet capture software in the sandbox, starting the packet capture program, and writing the captured communication packets to a file;
running the application program through the sandbox, and operating the application program for a preset time so that the content of the data communication of the application program is stored in the file;
stopping the running of the application program and stopping the packet capture program;
copying the file out of the sandbox and deleting the file in the sandbox, in preparation for detecting the next application program;
parsing the file, and recording the domain name requests and address requests obtained by the analysis so as to obtain the domain name and the IP address;
and associating the domain name and the IP address with an entity company according to the permission record information of the application program.
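As an illustration of the parsing step, the sketch below extracts the requested domain name from a raw DNS query payload. It assumes the DNS payloads have already been pulled out of the capture file by some other tool, handles only uncompressed names in the first question, and uses hypothetical function names; `build_query` exists only to produce sample input.

```python
import struct


def parse_dns_question(payload: bytes):
    """Return (domain_name, qtype) for the first question of a DNS message."""
    if len(payload) < 12:
        raise ValueError("too short for a DNS header")
    _msg_id, _flags, qdcount = struct.unpack("!HHH", payload[:6])
    if qdcount < 1:
        raise ValueError("no question section")
    labels = []
    pos = 12  # the fixed DNS header is 12 bytes long
    while True:
        length = payload[pos]
        pos += 1
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        labels.append(payload[pos:pos + length].decode("ascii"))
        pos += length
    qtype, _qclass = struct.unpack("!HH", payload[pos:pos + 4])
    return ".".join(labels), qtype


def build_query(name: str, qtype: int = 1) -> bytes:
    # Helper for the example only: a standard query with one question.
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(part)]) + part.encode("ascii") for part in name.split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def requested_domains(payloads):
    # Collect the distinct domain names seen across captured queries.
    return sorted({parse_dns_question(p)[0] for p in payloads})
```

The resulting domain names would then be looked up against registration records to find the entity company, a step that depends on external data and is not shown here.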
The step of acquiring the operation information and the development information of the suspected illegal application program from the information of the suspected illegal application program comprises the following steps:
decompressing the application program installation package to obtain a code file with a specific coding format, and further performing decompiling on the code file with the coding format to obtain a source code file;
analyzing a manifest file of the source code file to acquire an application program of the application program and component information of the application program;
acquiring identity information of the certificate user according to the certificate information of the application program;
performing regular-expression matching on the development information of the application program and the identity information of the certificate user to obtain sensitive information of the application program, and analyzing social subject information of the application program through the sensitive information and the administrative registration information of the application program;
and acquiring the operation information and the development information of the application program according to the social subject information.
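The manifest analysis and regular-expression matching can be sketched as follows. The sketch assumes an Android installation package whose manifest has already been decoded to plain XML by a decompilation tool; the two patterns (e-mail addresses and URL hosts) are hypothetical examples of "sensitive information", which the source does not enumerate.

```python
import re
import xml.etree.ElementTree as ET

ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")


def parse_manifest(xml_text):
    """Return (package_name, components) from a decoded AndroidManifest.xml."""
    root = ET.fromstring(xml_text)
    components = {tag: [] for tag in COMPONENT_TAGS}
    app = root.find("application")
    if app is not None:
        for tag in COMPONENT_TAGS:
            for node in app.findall(tag):
                components[tag].append(node.get(ANDROID_NS + "name"))
    return root.get("package"), components


# Assumed examples of sensitive strings worth matching.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)")


def find_sensitive(text):
    """Collect distinct e-mail addresses and URL hosts found in free text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "hosts": sorted(set(URL_HOST_RE.findall(text))),
    }
```

The same `find_sensitive` call could be applied to certificate subject strings and to strings extracted from the decompiled source.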
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a system for determining an illegal application program, the system comprising:
a collection module, used for collecting application program information of the software distribution website;
a classification module, used for classifying the application program information by adopting a classification model to obtain a suspected illegal application program;
an acquisition module, used for acquiring the communication address of the suspected illegal application program and the server, acquiring a server domain name or an IP address according to the communication address, and acquiring the associated company information of the suspected illegal application program according to the server domain name or the IP address; and/or acquiring operation information and development information of the suspected illegal application program from the information of the suspected illegal application program;
and a determining module, used for determining the legality of the suspected illegal application program according to the associated company information and/or the operation information and the development information.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an apparatus for determining an illegal application program, comprising a processor and a memory coupled to the processor, the memory storing a computer program that the processor executes in operation to implement the method described above.
In order to solve the above technical problem, the present application adopts another technical solution: there is provided a computer readable storage medium having stored thereon a computer program for execution by a processor to perform the method described above.
The beneficial effects of this application are as follows. Unlike the prior art, the method first collects application program information from a software distribution website and then classifies the application program information with a classification model to obtain a suspected illegal application program. It then acquires the communication address of the suspected illegal application program and the server, obtains the domain name or IP address of the server from the communication address, and obtains the associated company information of the suspected illegal application program from the domain name or IP address of the server; and/or obtains the operation information and development information of the suspected illegal application program from the information of the suspected illegal application program. Finally, it determines the legality of the suspected illegal application program according to the associated company information and/or the operation information and the development information. The application can therefore find the company that actually operates an application program, provide targets for public safety detection and entity investigation, and improve both labor efficiency and the timeliness of discovery.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort, wherein:
FIG. 1 is a flowchart of a method for determining an illegal application according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for confirming an illegal application according to an embodiment of the present application;
FIG. 3 is a schematic diagram of classification model training in an embodiment of the present application;
FIG. 4 is a flowchart illustrating another method for confirming an illegal application according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating another method for confirming an illegal application according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a system for confirming an illegal application provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an illegal application program determining apparatus according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of an embodiment of a computer-readable storage medium provided herein.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be noted that the following examples are only illustrative of the present application, and do not limit the scope of the present application. Likewise, the following examples are only some examples and not all examples of the present application, and all other examples obtained by a person of ordinary skill in the art without any inventive step are within the scope of the present application.
The terms "first", "second" and "third" in this application are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless expressly and specifically limited otherwise. All directional indications (such as up, down, left, right, front, and rear) in the embodiments of the present application are only used to explain the relative positional relationship between the components, their movement, and the like in a specific posture (as shown in the drawings); if the specific posture changes, the directional indication changes accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a flowchart of a method for determining an illegal application according to an embodiment of the present application, and as shown in fig. 1, the method for determining an illegal application includes the following steps:
Step S1: collecting application program information of a software distribution website.
The application information includes an application name, application description information, an application installation package, and the company to which the application belongs.
Step S2: classifying the application program information by adopting a classification model to obtain the suspected illegal application program.
Step S3: acquiring the communication address of the suspected illegal application program and the server, acquiring a server domain name or an IP address according to the communication address, and acquiring the related company information of the suspected illegal application program according to the server domain name or the IP address; and/or acquiring operation information and development information of the suspected illegal application program from the information of the suspected illegal application program.
Step S3 includes three schemes:
In the first scheme, the communication address of the suspected illegal application program and the server is acquired, the domain name or IP address of the server is obtained from the communication address, and the associated company information of the suspected illegal application program is obtained from the domain name or IP address of the server.
In the second scheme, the operation information and the development information of the suspected illegal application program are obtained from the information of the suspected illegal application program.
In the third scheme, both are done: the communication address of the suspected illegal application program and the server is acquired, the domain name or IP address of the server is obtained from the communication address, the associated company information of the suspected illegal application program is obtained from the domain name or IP address of the server, and the operation information and the development information of the suspected illegal application program are obtained from the information of the suspected illegal application program.
In practical applications, the choice among the three schemes depends on the situation. For example, after the suspected illegal application program is obtained in step S2, its associated company information may be obtained by the first scheme, namely sandbox detection, and the final legality of the suspected illegal application program is determined from that associated company information. Alternatively, the operation information and the development information of the suspected illegal application program may be obtained by the second scheme, namely reverse engineering, and the final legality is determined from the operation information and the development information. In the third scheme, the associated company information of the suspected illegal application program is obtained first; if that information is of low accuracy and the legality of the suspected illegal application program cannot be judged well from it, the operation information and the development information are further obtained through reverse engineering, and the legality is finally determined by combining the associated company information, the operation information and the development information.
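The fallback logic of the third scheme can be sketched as follows. The two callables, the `Evidence` fields and the 0.8 confidence cut-off are hypothetical placeholders; the source only says that reverse engineering is added when the associated company information is of low accuracy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Evidence:
    company_info: Optional[dict] = None
    operation_info: Optional[dict] = None
    development_info: Optional[dict] = None


def gather_evidence(app, sandbox_lookup, reverse_engineer, min_confidence=0.8):
    """Run sandbox detection first; reverse engineer only when it is not enough.

    sandbox_lookup(app)   -> (company_info or None, confidence in [0, 1])
    reverse_engineer(app) -> (operation_info, development_info)
    Both callables are assumed stand-ins for the first and second schemes.
    """
    company, confidence = sandbox_lookup(app)
    evidence = Evidence(company_info=company)
    if company is None or confidence < min_confidence:
        operation, development = reverse_engineer(app)
        evidence.operation_info = operation
        evidence.development_info = development
    return evidence
```

Running the cheaper sandbox lookup first and decompiling only on demand is one plausible ordering; nothing in the source rules out running both unconditionally.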
Step S4: determining the legality of the suspected illegal application program according to the associated company information and/or the operation information and the development information.
In this way, the legality of an application program can be confirmed as soon as it goes online, illegal application programs can be identified and determined automatically and in advance, and targets are provided for public safety monitoring and entity investigation, so the method is effective in practical use.
In step S1, application download channels differ from system to system. For example, applications for the iOS system are downloaded and installed from the Apple App Store, whereas applications for the Android system can be downloaded through many channels and can be installed and run directly after being downloaded to the mobile phone. Collection can be carried out across the whole network by means of a crawler, and mainly four items of information are collected: the application program name, the application program description information, the application program installation package, and the company to which the application program belongs. However, many application programs are distributed without information on the company to which they belong, so the company corresponding to an application program often cannot be collected. The collected application program information is organized into Table 1, which has the following structure:
Table 1: Registration of collected application information
Application name | Application description information | Application installation package | Company to which application belongs |
X Y X Y | Description of functions, etc. | Download address and installation package | Company Ltd |
Circle application | Description of functions, etc. | Download address and installation package | Electron ltd |
Finance application program | Description of functions, etc. | Download address and installation package | |
Step S2: before the classification model is used for classification, the model needs to be trained. Referring to FIG. 2, FIG. 2 is a flowchart illustrating a method for confirming an illegal application according to an embodiment of the present application. As shown in FIG. 2, the training of the classification model before step S2 includes the following steps:
Step S21: collecting a plurality of application program samples, each of which corresponds to a classification, and dividing the application program samples into training application program samples and verification application program samples.
In this step, the application program samples may be collected in the same manner as in step S1, which is not described again here.
After the application program samples are collected from the network, they can be identified and classified with the help of departments that have relevant experience.
Step S22: vectorizing the text information of the application program sample, and converting the text information into 255-dimensional word vectors.
To train the classification model, the text description of each application program sample is first converted into a vector representation so that it can be processed by the classification model. Word2vec, a tool from Google for converting words into vector form, can be used for the vectorized representation. Word2vec trains word vectors with a neural network structure, so that the words in a text can be converted into vectors of relatively small dimension, such as 255-dimensional vectors, which speeds up the subsequent classification algorithm.
After the word vector is trained, the text information such as the description of the application program can be completely converted into the word vector.
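Once word vectors exist, converting a description into model input might look like the sketch below. It does not train Word2vec: the embedding table is filled with random placeholder values, tokens are split on whitespace (Chinese text would need a word segmenter first), and the fixed length of 8 tokens is an arbitrary assumption.

```python
import random

DIM = 255  # vector dimension mentioned in the description


def toy_embeddings(vocab, seed=0):
    # Placeholder for a trained Word2vec table: random vectors per word.
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for word in vocab}


def text_to_matrix(text, table, max_len=8):
    """Map a description to a max_len x DIM matrix of word vectors.

    Unknown words and padding positions become zero vectors.
    """
    zero = [0.0] * DIM
    tokens = text.lower().split()[:max_len]
    rows = [table.get(token, zero) for token in tokens]
    rows += [zero] * (max_len - len(rows))
    return rows
```

A fixed-size matrix of this kind is the usual input shape for a convolutional text classifier, which matches the convolution and pooling layers mentioned for the model.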
Step S23: input the word vectors corresponding to the training application program samples, together with their classifications, into an initial classification model for model training.
Referring to fig. 3, fig. 3 is a schematic diagram illustrating the principle of classification model training in an embodiment of the present application. As shown in fig. 3, the initial classification model is built with TensorFlow (Google's second-generation machine learning system), and the training application program samples are input into it. Training determines the parameters of the convolution layer, the pooling layer and the softmax layer. Using the back propagation algorithm, the result value output by the initial classification model is compared with the pre-labeled classification target value of the application program to calculate the error, and the error is propagated backward through the neural network to adjust the parameters of each layer. When the error between the result and the target is minimized, the parameters of each layer have reached their optimal values, and these parameters are the result of training the classification model.
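The error-driven parameter update described above can be illustrated with a much smaller model. The sketch below trains a single softmax layer by gradient descent in plain Python; it deliberately omits the convolution and pooling layers and does not use TensorFlow, so it is a stand-in for the principle rather than the model of fig. 3. The learning rate, epoch count and sample data are assumptions.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, bias, x):
    """Class probabilities for one input vector."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(weights, bias)]
    return softmax(scores)


def train_softmax(samples, n_classes, dim, lr=0.5, epochs=200):
    """Fit one softmax layer by gradient descent on the cross-entropy error."""
    weights = [[0.0] * dim for _ in range(n_classes)]
    bias = [0.0] * n_classes
    for _ in range(epochs):
        for x, target in samples:
            probs = predict_proba(weights, bias, x)
            for c in range(n_classes):
                # Output minus target: the error signal that is propagated back.
                err = probs[c] - (1.0 if c == target else 0.0)
                bias[c] -= lr * err
                for i in range(dim):
                    weights[c][i] -= lr * err * x[i]
    return weights, bias


# Hypothetical 2-dimensional vectors with classifications 0 and 1.
samples = [([1.0, 0.0], 0), ([0.9, 0.1], 0), ([0.0, 1.0], 1), ([0.1, 0.9], 1)]
weights, bias = train_softmax(samples, n_classes=2, dim=2)
```

In a multi-layer network the same error term is passed further back through the pooling and convolution layers, which is what the back propagation algorithm automates.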
Step S24: input the word vectors corresponding to the verification application program samples into the trained classification model and compare the output of the classification model with the classifications corresponding to the verification application program samples; if the accuracy of the output against those classifications reaches a preset threshold, the training of the classification model is finished.
After the classification model has been trained, the verification application program samples are input into it to obtain a classification result for each sample. These results are compared with the pre-labeled classifications of the verification application program samples to measure the accuracy of the classification model. If the accuracy reaches the preset threshold, training is finished; otherwise, step S23 and step S24 are repeated until the accuracy reaches the preset threshold.
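The repetition of step S23 and step S24 can be sketched as a loop around two callables. The names `train_step` and `predict`, the `max_rounds` safety limit and the toy model are assumptions introduced for illustration; the application itself only requires repeating until the preset threshold is reached.

```python
def accuracy(predict, samples):
    """Fraction of (input, label) pairs for which predict(input) == label."""
    return sum(1 for x, label in samples if predict(x) == label) / len(samples)


def train_until_accurate(train_step, predict, verify_samples, threshold, max_rounds=100):
    """Alternate a training round (S23) with verification (S24)."""
    for round_no in range(1, max_rounds + 1):
        train_step()
        acc = accuracy(predict, verify_samples)
        if acc >= threshold:
            return round_no, acc
    raise RuntimeError("preset accuracy threshold not reached")


# Toy stand-in for a model: a single cut-off that moves by one per training round.
state = {"cut": 0}


def toy_train_step():
    state["cut"] += 1


def toy_predict(x):
    return 1 if x > state["cut"] else 0


verify_samples = [(1, 0), (2, 0), (3, 1), (4, 1)]
rounds, final_accuracy = train_until_accurate(
    toy_train_step, toy_predict, verify_samples, threshold=0.9
)
```

The round limit is a practical guard: without it, a threshold that the model cannot reach would make the loop run forever.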
After the training of the classification model is completed, the application programs collected in step S1 can be classified by the classification model. However, some application programs are published through multiple channels, so the same application program may be collected from many websites. For the accuracy and efficiency of the classification, before the collected application program information is input into the classification model, it is deduplicated and merged according to the name of the application program, the message-digest (MD5) value of the application program description information, or the MD5 value of the application program installation package.
Specifically, in one embodiment, deduplication is performed according to the names of the application programs: if several application programs have the same name, only one is retained. It is then determined whether the description information of the same-named application programs is the same. If it differs, the description information of the deleted application programs is merged into that of the retained application program; if it is the same, the description information of the deleted application programs is deleted along with them.
In another embodiment, the MD5 value of the description information of each application program may be calculated. If the MD5 values are the same, the application programs are considered the same, so only one is retained and the others are deleted. The description information is handled in the same way as described above, which is not repeated here.
In yet another embodiment, the MD5 value of each application program installation package may be calculated. If the MD5 values are the same, only one application program is retained and the others are deleted. The description information is handled in the same way as described above, which is not repeated here.
After deduplication and merging, the name and description information of each remaining application program are distinct, which saves a large amount of computation time.
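The three deduplication embodiments can be sketched with the standard `hashlib` module. The record layout (`name`, `description`, `package`) and the sample data are assumptions made for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 message digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def dedup_by_name(apps):
    """Keep one record per name and merge differing descriptions into it."""
    merged = {}
    for app in apps:
        kept = merged.setdefault(app["name"], {"name": app["name"], "descriptions": []})
        if app["description"] not in kept["descriptions"]:
            kept["descriptions"].append(app["description"])
    return list(merged.values())


def dedup_by_digest(apps, field):
    """Keep the first record for each MD5 digest of `field` (str or bytes)."""
    seen, kept = set(), []
    for app in apps:
        value = app[field]
        digest = md5_hex(value if isinstance(value, bytes) else value.encode("utf-8"))
        if digest not in seen:
            seen.add(digest)
            kept.append(app)
    return kept


# Hypothetical collected records; `package` stands in for the installation package bytes.
apps = [
    {"name": "QuickLoan", "description": "Fast loans", "package": b"apk-bytes-1"},
    {"name": "QuickLoan", "description": "Loans in minutes", "package": b"apk-bytes-1"},
    {"name": "PhotoFun", "description": "Fast loans", "package": b"apk-bytes-2"},
]
by_name = dedup_by_name(apps)
by_description = dedup_by_digest(apps, "description")
by_package = dedup_by_digest(apps, "package")
```

Comparing digests instead of full descriptions or packages keeps the comparison cheap even when installation packages are large.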
In actual classification, the classification model works on a principle similar to that of the training stage. After the collected application program information has been deduplicated and merged, it is vectorized and input into the trained classification model to calculate the classification result. The calculated result is used as the classification label of the application program, and the product of the calculated result value and the model accuracy is used as the confidence of the classification.
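The confidence calculation amounts to one multiplication. In the sketch below, the model output is assumed to be a mapping from classification to result value, and the label names and numbers are hypothetical.

```python
def classify_with_confidence(class_values, model_accuracy):
    """Return the highest-valued classification and its confidence.

    The confidence is the result value multiplied by the model accuracy
    measured on the verification samples.
    """
    label = max(class_values, key=class_values.get)
    return label, class_values[label] * model_accuracy


label, confidence = classify_with_confidence(
    {"normal": 0.2, "suspected_illegal": 0.8}, model_accuracy=0.9
)
```

Scaling by the model accuracy means that a model that verified poorly yields lower confidence even when its output value is high.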
After the application programs have been classified, the operating company or development company of each suspected illegal application program needs to be identified. In the present application this is done in the aforementioned step S3, which, as mentioned above, mainly includes three schemes. The first scheme runs the application program to obtain the address at which it communicates with a server (for example, a background server) and derives the domain name or IP address of the background server from that address, thereby providing a technical means of associating the application program with a real-world entity company.
Referring to fig. 4 in detail, fig. 4 is a flowchart illustrating another method for confirming an illegal application according to an embodiment of the present application. As shown in fig. 4, the first scheme of step S3 includes the following sub-steps:
Step S311: install the application installation package into the sandbox.
The ANDROID sandbox can be controlled programmatically so that the application installation package is installed into it automatically.
Step S312: install packet capture software in the sandbox, start the packet capture program, and write the captured communication packets to a file. The software may be tcpdump, and the file may be a pcap file (a packet storage format).
Step S313: run the application program in the sandbox and operate it for a preset time so that the content of its data communication is saved in the file.
Specifically, the mouse can be controlled to click and drag within the application program for 30 seconds, so that tcpdump saves the content of the application program's data communication in the pcap file.
Step S314: stop the application program and stop the packet capture program.
Specifically, the sandbox can be controlled to stop the application program and the tcpdump packet capture program.
Step S315: copy the file out of the sandbox and delete it from the sandbox in preparation for detecting the next application program.
Step S316: unpack the file and record the domain name requests and address requests obtained by parsing, so as to obtain the domain name and the IP address.
The domain name requests and address requests may be DNS requests and HTTP requests, respectively. Parsing the DNS and HTTP requests may yield many background domain names and request URLs, for example background domain names of the Douyin SDK, of push SDKs and of the Alipay SDK. Domain names of such general-purpose SDKs occur with high frequency, whereas the background domain name of a genuinely illegal application program occurs with very low frequency. Therefore, after the DNS and HTTP requests are parsed, the occurrences of each domain name are counted: domain names whose frequency is higher than a first preset threshold are deleted, those whose frequency is lower than a second preset threshold are retained, and those between the two thresholds are screened out and handed to relevant personnel for a decision. Alternatively, only one threshold may be set: domain names whose frequency is above the threshold are deleted, and those whose frequency is below or equal to it are retained.
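The two-threshold screening can be sketched with `collections.Counter`. The application does not say over what unit the frequency is counted; this sketch assumes it is the number of analyzed application programs in which a domain name appears, and all domain names and threshold values are hypothetical.

```python
from collections import Counter


def triage_domains(domains_per_app, high, low):
    """Sort domain names into kept / manual-review / dropped by frequency.

    A domain is dropped if it appears in more than `high` applications,
    kept if it appears in fewer than `low`, and sent to review otherwise.
    """
    counts = Counter(d for domains in domains_per_app.values() for d in set(domains))
    dropped = {d for d, n in counts.items() if n > high}
    kept = {d for d, n in counts.items() if n < low}
    review = set(counts) - dropped - kept
    return kept, review, dropped


# Hypothetical parse results: application -> domain names seen in its DNS/HTTP requests.
domains_per_app = {
    "app_a": ["sdk.example.com", "a-backend.example.net"],
    "app_b": ["sdk.example.com", "shared.example.org"],
    "app_c": ["sdk.example.com", "shared.example.org", "c-backend.example.net"],
}
kept, review, dropped = triage_domains(domains_per_app, high=2, low=2)
```

With a single threshold, the review set is empty by construction: pass `low = high + 1` so that every domain at or below the threshold is kept.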
Step S317: based on the domain name and the IP address, associate them with an entity company according to the license filing information of the application program.
The license filing information may include ICP filing information and whois information.
ICP filing information is the information that the relevant authority requires an enterprise to report at registration. It is tied to the domain name, and the domain name itself must also be filed with the relevant authority, so the entity company can be found through the domain name. As for whois information: when a website is accessed by domain name, the domain name must be resolved to an IP address through DNS. A domain name can be resolved only after it has been paid for, and the enterprise that uses the domain name pays the operator, so the specific enterprise can be located through this information.
Both the ICP filing information and the whois information can be queried through services on the Internet.
In this way, information about the company actually associated with the application program, such as the operating company, can be extracted, and all of the steps are carried out under automated control of the ANDROID sandbox without human participation.
After the company lookup for the application program is completed, the actual operating company may still not have been found, for example because the domain name has not been filed or the company associated through whois is located abroad. In that case, reverse engineering of the application program can be used to detect whether it contains sensitive information, such as its certificate information, package, telephone numbers, IP addresses and URLs, because such information is likely to be associated with the company: the certificate information contains company information, the package may contain a short name specific to the company, and telephone numbers, IP addresses and domain names can also expose company information. That is, the second scheme of the foregoing step S3 may be executed after the first scheme is completed, or it may be executed independently. The second scheme of step S3 is described in detail as follows:
Referring to fig. 5, fig. 5 is a flowchart illustrating another method for confirming an illegal application according to an embodiment of the present application. As shown in fig. 5, the second scheme of step S3 includes the following sub-steps:
Step S321: decompress the application installation package to obtain code files in a specific encoding format, and then decompile the code files in that encoding format to obtain source code files.
The application installation package is unpacked with apktool, which yields code files in a specific encoding format, such as the dex format; the dex files are then reverse-compiled with dex2jar to obtain the Java code files.
Step S322: parse the manifest file of the source code files to obtain the package of the application program and its component information.
Parsing the reverse-engineered manifest (the AndroidManifest.xml file) yields the package and main activity information of the application program, including component information such as activities and services.
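Once the manifest has been decoded to text XML, extracting the package, main activity and components can be sketched with the standard `xml.etree.ElementTree` module. The sample manifest and the returned dictionary layout are assumptions; the sketch identifies the main activity by the `android.intent.action.MAIN` intent action.

```python
import xml.etree.ElementTree as ET

# Namespace used by `android:` attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"


def parse_manifest(xml_text):
    """Extract the package, main activity and component names from manifest XML."""
    root = ET.fromstring(xml_text)
    app = root.find("application")
    components = {"activity": [], "service": []}
    main_activity = None
    if app is not None:
        for kind in ("activity", "service"):
            components[kind] = [el.get(ANDROID_NS + "name") for el in app.findall(kind)]
        for activity in app.findall("activity"):
            actions = [a.get(ANDROID_NS + "name") for a in activity.findall("intent-filter/action")]
            if main_activity is None and "android.intent.action.MAIN" in actions:
                main_activity = activity.get(ANDROID_NS + "name")
    return {"package": root.get("package"), "main_activity": main_activity, "components": components}


# Hypothetical decoded manifest.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <application>
    <activity android:name=".MainActivity">
      <intent-filter><action android:name="android.intent.action.MAIN"/></intent-filter>
    </activity>
    <service android:name=".SyncService"/>
  </application>
</manifest>"""
info = parse_manifest(SAMPLE_MANIFEST)
```

The package name is useful on its own as a clue, because it often embeds a reversed company domain or a company-specific short name.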
Step S323: obtain the identity information of the certificate user according to the certificate information of the application program.
Specifically, the identity information of the certificate user, such as the common name, organizational unit name, organization name, address and province, is obtained from the certificate information of the application program.
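Assuming the certificate subject has already been extracted as a distinguished-name string (how it is extracted is not specified in the application), splitting it into identity fields can be sketched as follows. The simple comma split does not handle escaped commas inside values, and the sample subject is hypothetical.

```python
# Conventional meanings of common distinguished-name attributes.
FIELD_LABELS = {
    "CN": "common name",
    "OU": "organizational unit",
    "O": "organization",
    "L": "locality",
    "ST": "state or province",
    "C": "country",
}


def parse_subject(subject):
    """Split 'CN=..., OU=..., O=...' into a dict keyed by attribute name."""
    fields = {}
    for part in subject.split(","):
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key.strip().upper()] = value.strip()
    return fields


subject = "CN=Example Dev, OU=Mobile, O=Example Co. Ltd, L=Shenzhen, ST=Guangdong, C=CN"
identity = parse_subject(subject)
readable = {FIELD_LABELS[k]: v for k, v in identity.items() if k in FIELD_LABELS}
```

The organization field is usually the most direct pointer to a company, while self-signed certificates with placeholder values yield little.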
Step S324: perform regular-expression matching on the development information of the application program and the identity information of the certificate user to obtain sensitive information of the application program, and analyze the social subject information of the application program using the sensitive information and the administrative registration information of the application program.
Regular-expression matching is performed on the source files and resource files to obtain sensitive information such as IP addresses, domain names and telephone numbers. This sensitive information serves as a further clue and is analyzed together with other data, such as administrative registration information including industrial and commercial registration information and IP filing information, so that the social entity behind the abnormal behavior can be further analyzed and identified.
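The regular-expression matching can be sketched with Python's `re` module. The three patterns below are illustrative assumptions rather than the patterns of the application: the telephone pattern targets 11-digit mainland China mobile numbers, and domain names are taken only from `http://` or `https://` URLs.

```python
import re

# Dotted-quad IPv4 addresses with each octet limited to 0-255.
IPV4 = re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b")
# Host part of http(s) URLs ending in an alphabetic top-level domain.
URL_HOST = re.compile(r"https?://([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
# 11-digit mobile numbers starting with 13-19 (assumed format).
PHONE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)")


def find_sensitive(text):
    """Collect IP addresses, URL host names and telephone numbers from text."""
    return {
        "ips": sorted(set(IPV4.findall(text))),
        "domains": sorted(set(URL_HOST.findall(text))),
        "phones": sorted(set(PHONE.findall(text))),
    }


# Hypothetical fragment of a decompiled source file.
source_text = (
    'String api = "https://api.example.com/v1/login"; '
    'String host = "203.0.113.7"; // support: 13800138000'
)
found = find_sensitive(source_text)
```

Each hit is only a clue: it still has to be cross-checked against registration data before it can be tied to a company.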
Step S325: obtain the operation information and the development information of the application program according to the social subject information.
In this way, clues about the company operating or developing the application program can be discovered through reverse engineering of the application program.
In conclusion, the whole process of the present application can help guard against application programs with abnormal behaviors. On the one hand, newly published application programs are discovered continuously and checked for abnormal behavior; on the other hand, the actual operating company behind an application program can be found, which improves manual efficiency and the timeliness of discovery.
The embodiment of the application also provides a system for confirming the illegal application program, which is used for executing the confirming method. Referring to fig. 6 in detail, fig. 6 is a schematic structural diagram of a system for confirming an illegal application according to an embodiment of the present application. As shown in fig. 6, the confirmation system 60 of the present embodiment includes:
An acquisition module 61, configured to collect application program information from a software distribution website.
A classification module 62, configured to classify the application program information by using a classification model to obtain a suspected illegal application program.
An obtaining module 63, configured to obtain the address at which the suspected illegal application program communicates with a server, obtain a server domain name or an IP address according to that address, and obtain associated company information of the suspected illegal application program according to the server domain name or the IP address; and/or obtain operation information and development information of the suspected illegal application program from the information of the suspected illegal application program.
A determining module 64, configured to determine the legality of the suspected illegal application program according to the associated company information and/or the operation information and the development information.
Optionally, the collected information of the application program includes: the application program name, the application program description information, the application program installation package and the company to which the application program belongs.
Optionally, a training module 65 is further included for training the classification model.
The acquisition module 61 further collects a plurality of application program samples, each corresponding to a classification, and divides them into training application program samples and verification application program samples.
Optionally, the training module 65 vectorizes the text information of the application program samples, converting it into 255-dimensional word vectors; inputs the word vectors corresponding to the training application program samples, together with their classifications, into an initial classification model for model training; inputs the word vectors corresponding to the verification application program samples into the trained classification model and compares the output of the classification model with the classifications corresponding to the verification application program samples; and finishes the training of the classification model if the accuracy of the output against those classifications reaches a preset threshold.
Optionally, the system further includes a deduplication module 66, configured to deduplicate and merge the collected application programs according to the names of the application programs, the MD5 values of the application program description information, or the MD5 values of the application program installation packages.
Optionally, the classification module 62 uses the classification model to calculate the classification of each application program and a confidence of the classification based on the accuracy of the classification model, and outputs the classification with the highest confidence as the classification of the application program.
Optionally, the obtaining module 63 further installs the application installation package into a sandbox; installs packet capture software in the sandbox, starts the packet capture program, and writes the captured communication packets to a file; runs the application program in the sandbox and operates it for a preset time so that the content of its data communication is saved in the file; stops the application program and the packet capture program; copies the file out of the sandbox and deletes it from the sandbox in preparation for detecting the next application program; unpacks the file and records the domain name requests and address requests obtained by parsing, so as to obtain the domain name and the IP address; and, based on the domain name and the IP address, associates them with an entity company according to the license filing information of the application program.
Optionally, the obtaining module 63 further decompresses the application installation package to obtain code files in a specific encoding format and decompiles the code files in that encoding format to obtain source code files; parses the manifest file of the source code files to obtain the package of the application program and its component information; obtains the identity information of the certificate user according to the certificate information of the application program; performs regular-expression matching on the development information of the application program and the identity information of the certificate user to obtain sensitive information of the application program, and analyzes the social subject information of the application program using the sensitive information and the administrative registration information of the application program; and obtains the operation information and the development information of the application program according to the social subject information.
The embodiment of the present application further provides a device for confirming an illegal application, which is used for executing the confirming method described above. Referring to fig. 7, fig. 7 is a schematic structural diagram of a device for confirming an illegal application according to an embodiment of the present application. As shown in fig. 7, the device includes a processor 610 and a memory 620; the memory 620 stores a computer program, the processor 610 is coupled to the memory 620, and the processor 610 executes the computer program when operating to implement the method for confirming an illegal application in any of the above embodiments.
The processor 610 may also be referred to as a Central Processing Unit (CPU). The processor 610 may be an integrated circuit chip having signal processing capabilities. The processor 610 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor or any conventional processor, but is not limited thereto.
Referring to fig. 8, fig. 8 is a schematic block diagram of an embodiment of a computer-readable storage medium provided in the present application, in which a computer program 410 is stored, and the computer program 410 can be executed by a processor to implement the method for confirming an illegal application in any of the above embodiments.
Optionally, the readable storage medium may be any of various media that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or may be a terminal device such as a computer, a server, a mobile phone, or a tablet.
In summary, the whole process of the present application can help guard against application programs with abnormal behaviors. On the one hand, newly published application programs are discovered continuously and checked for abnormal behavior; on the other hand, the actual operating company behind an application program can be found, which improves manual efficiency and the timeliness of discovery.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.
Claims (10)
1. A method for determining an illegal application, the method comprising:
collecting application program information of a software distribution website;
classifying the application program information by adopting a classification model to obtain suspected illegal application programs;
acquiring the communication address of the suspected illegal application program and a server, acquiring a server domain name or an IP address according to the communication address, and acquiring the related company information of the suspected illegal application program according to the server domain name or the IP address; and/or acquiring operation information and development information of the suspected illegal application program from the information of the suspected illegal application program;
and determining the legality of the suspected illegal application program according to the associated company information and/or the operation information and the development information.
2. The method of claim 1, wherein the collected information of the application program comprises: the application program name, the application program description information, the application program installation package and the company to which the application program belongs.
3. The method of determining according to claim 1, wherein prior to said classifying the application information using a classification model, the method further comprises: training the classification model;
the step of training the classification model specifically comprises:
collecting a plurality of application program samples, wherein each application program sample corresponds to a classification, and the application program samples are divided into training application program samples and verification application program samples;
vectorizing the text information of the application program samples and converting it into 255-dimensional word vectors;
inputting the word vectors corresponding to the training application program samples, together with their classifications, into an initial classification model for model training;
inputting the word vectors corresponding to the verification application program samples into the trained classification model, comparing the output of the classification model with the classifications corresponding to the verification application program samples, and finishing the training of the classification model if the accuracy of the output against those classifications reaches a preset threshold.
4. The method of claim 2, wherein prior to said classifying the application information using the classification model, the method further comprises:
deduplicating and merging the collected application programs according to the names of the application programs, the message-digest values of the application program description information, or the message-digest values of the application program installation packages.
5. The method of claim 1, wherein the step of classifying the application information using a classification model to obtain the suspected illegal application comprises:
calculating, through the classification model, the classification of each application program and a confidence of the classification based on the accuracy of the classification model, and outputting the classification with the highest confidence as the classification of the application program.
6. The method according to claim 2, wherein the step of obtaining the communication address of the suspected illegal application program and the server, obtaining a server domain name or an IP address according to the communication address, and obtaining the information about the company associated with the suspected illegal application program according to the server domain name or the IP address comprises:
installing the application installation package into a sandbox;
installing a packet capturing software in the sandbox, starting a packet capturing program, and inputting a communication packet into a file;
running the application program through the sandbox, and operating the application program for preset time to enable the content of the data communication of the application program to be stored in the file;
stopping the running of the application program and stopping the packet capturing program;
copying the file to the outside of the sandbox, deleting the file in the sandbox, and preparing for the next application program detection;
unpacking the file, and recording a domain name request and an address request obtained by analysis so as to obtain the domain name and the IP address;
and, based on the domain name and the IP address, associating them with an entity company according to the license filing information of the application program.
7. The method according to claim 2, wherein the step of obtaining the operation information and the development information of the suspected illegal application from the information of the suspected illegal application comprises:
decompressing the application program installation package to obtain a code file with a specific coding format, and further performing decompiling on the code file with the coding format to obtain a source code file;
parsing a manifest file of the source code file to acquire the package of the application program and component information of the application program;
acquiring identity information of the certificate user according to the certificate information of the application program;
performing regular-expression matching on the development information of the application program and the identity information of the certificate user to obtain sensitive information of the application program, and analyzing social subject information of the application program through the sensitive information and the administrative registration information of the application program;
and acquiring the operation information and the development information of the application program according to the social subject information.
8. A system for determining an illegal application, the system comprising:
an acquisition module, configured to collect application program information of a software distribution website;
a classification module, configured to classify the application program information by adopting a classification model to obtain a suspected illegal application program;
an obtaining module, configured to acquire the communication address of the suspected illegal application program and the server, acquire a server domain name or an IP address according to the communication address, and acquire the associated company information of the suspected illegal application program according to the server domain name or the IP address; and/or acquire operation information and development information of the suspected illegal application program from the information of the suspected illegal application program;
and a determining module, configured to determine the legality of the suspected illegal application program according to the associated company information and/or the operation information and the development information.
9. A device for determining an illegal application, comprising a processor and a memory, wherein the processor is coupled to the memory, the memory stores a computer program, and the processor executes the computer program when in operation to implement the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor for implementing the method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110655002.4A CN113691492B (en) | 2021-06-11 | 2021-06-11 | Method, system, device and readable storage medium for determining illegal application program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110655002.4A CN113691492B (en) | 2021-06-11 | 2021-06-11 | Method, system, device and readable storage medium for determining illegal application program |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113691492A (en) | 2021-11-23 |
CN113691492B (en) | 2023-04-07 |
Family
ID=78576521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110655002.4A Active CN113691492B (en) | 2021-06-11 | 2021-06-11 | Method, system, device and readable storage medium for determining illegal application program |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113691492B (en) |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110072262A1 (en) * | 2009-09-23 | 2011-03-24 | Idan Amir | System and Method for Identifying Security Breach Attempts of a Website |
CN104123491A (en) * | 2014-07-18 | 2014-10-29 | 广州金山网络科技有限公司 | Method and device for detecting whether application program installation package is tempered |
CN106529293A (en) * | 2016-11-09 | 2017-03-22 | 东巽科技(北京)有限公司 | Sample classification determination method for malware detection |
CN106650439A (en) * | 2016-09-30 | 2017-05-10 | 北京奇虎科技有限公司 | Suspicious application program detection method and device |
WO2017166560A1 (en) * | 2016-03-30 | 2017-10-05 | 福建联迪商用设备有限公司 | Method and system for installing program using digital signature |
CN107590156A (en) * | 2016-07-09 | 2018-01-16 | 北京至信普林科技有限公司 | A kind of polytypic method of text based on training set cyclic extension |
CN107688743A (en) * | 2017-08-14 | 2018-02-13 | 北京奇虎科技有限公司 | The determination method and system of a kind of rogue program |
CN108664792A (en) * | 2018-05-21 | 2018-10-16 | 中国科学技术大学 | A kind of source tracing method of Android malware |
CN110213234A (en) * | 2019-04-30 | 2019-09-06 | 深圳市腾讯计算机系统有限公司 | Developer's recognition methods, device, equipment and the storage medium of application file |
CN110851624A (en) * | 2018-07-25 | 2020-02-28 | 北京搜狗科技发展有限公司 | Information query method and related device |
CN110968869A (en) * | 2019-11-22 | 2020-04-07 | 上海交通大学 | Deep learning-based large-scale malicious software classification system and method |
CN111460449A (en) * | 2020-03-10 | 2020-07-28 | 北京邮电大学 | Application program identification method, system, storage medium and electronic device |
CN111950035A (en) * | 2020-06-18 | 2020-11-17 | 中国电力科学研究院有限公司 | Method, system, equipment and storage medium for protecting integrity of apk file |
CN112084489A (en) * | 2020-09-11 | 2020-12-15 | 北京天融信网络安全技术有限公司 | Suspicious application detection method and device |
CN112257032A (en) * | 2019-10-21 | 2021-01-22 | 国家计算机网络与信息安全管理中心 | Method and system for determining APP responsibility subject |
CN112434291A (en) * | 2019-08-26 | 2021-03-02 | 中移(苏州)软件技术有限公司 | Application program identification method and device, equipment and storage medium |
US20210149788A1 (en) * | 2019-11-18 | 2021-05-20 | Microsoft Technology Licensing, Llc | Software diagnosis using transparent decompilation |
Also Published As
Publication number | Publication date |
---|---|
CN113691492B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ma et al. | Libradar: Fast and accurate detection of third-party libraries in android apps | |
CN104123493B (en) | The safety detecting method and device of application program | |
JP7073343B2 (en) | Security vulnerabilities and intrusion detection and repair in obfuscated website content | |
CN109492395B (en) | Method, device and storage medium for detecting malicious program | |
Ünver et al. | Android malware detection based on image-based features and machine learning techniques | |
CN109271788B (en) | Android malicious software detection method based on deep learning | |
CN111639337B (en) | Unknown malicious code detection method and system for massive Windows software | |
Zhu et al. | Android malware detection based on multi-head squeeze-and-excitation residual network | |
US8875303B2 (en) | Detecting pirated applications | |
JP2018516421A (en) | Network access operation identification method, server, and storage medium | |
US11580220B2 (en) | Methods and apparatus for unknown sample classification using agglomerative clustering | |
CN108734012A (en) | Malware recognition methods, device and electronic equipment | |
Shrivastava et al. | SensDroid: analysis for malicious activity risk of Android application | |
CN112330355B (en) | Method, device, equipment and storage medium for processing consumption coupon transaction data | |
Nguyen et al. | Detecting repackaged android applications using perceptual hashing | |
CN112688966A (en) | Webshell detection method, device, medium and equipment | |
CN109690571A (en) | Group echo system and method based on study | |
CN108399321B (en) | Software local plagiarism detection method based on dynamic instruction dependence graph birthmark | |
Namrud et al. | Deep-layer clustering to identify permission usage patterns of android app categories | |
Liu et al. | Using g features to improve the efficiency of function call graph based android malware detection | |
Akram et al. | DroidMD: an efficient and scalable android malware detection approach at source code level | |
Rafiq et al. | AndroMalPack: enhancing the ML-based malware classification by detection and removal of repacked apps for Android systems | |
CN112347457A (en) | Abnormal account detection method and device, computer equipment and storage medium | |
CN110990834A (en) | Static detection method, system and medium for android malicious software | |
Chau et al. | An entropy-based solution for identifying android packers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||