CN114416600B - Application detection method and device, computer equipment and storage medium - Google Patents

Publication number
CN114416600B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210314992.XA
Other languages
Chinese (zh)
Other versions
CN114416600A (en)
Inventor
卢扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202210314992.XA
Publication of CN114416600A
Application granted
Publication of CN114416600B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/3604 Software analysis for verifying properties of programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44 Program or device authentication

Abstract

The application discloses an application detection method and apparatus, a computer device, and a storage medium, and belongs to the field of computer technologies. The method includes: acquiring attribute information and download information of a plurality of applications, where the plurality of applications include a target application and a reference application; determining a first similar feature between the attribute information of the target application and that of the reference application, and a second similar feature between the download information of the target application and that of the reference application; combining the first similar feature and the second similar feature to obtain a similarity vector, and classifying the similarity vector to obtain the similarity between the target application and the reference application; and, when the similarity satisfies a similarity condition, determining that the target application belongs to the application type to which the reference application belongs. Because the method considers both the attribute information and the download information of an application, it measures the similarity between applications along more dimensions and improves the accuracy of detecting the application type to which an application belongs.

Description

Application detection method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an application detection method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of computer technology, applications have become increasingly diverse and bring great convenience to people's lives. However, some of these applications are insecure and may cause losses to users who install them. Detecting insecure applications is therefore of great significance.
In the related art, to detect whether an application is insecure, the application certificate of the application is acquired; if it is the same as the application certificate of an application that has already been determined to be insecure, the application is determined to be insecure.
Disclosure of Invention
Embodiments of the present application provide an application detection method and apparatus, a computer device, and a storage medium, which can improve the accuracy of application detection. The technical solution includes the following aspects.
In one aspect, an application detection method is provided, and the method includes:
acquiring attribute information and download information of a plurality of applications, wherein the attribute information is information describing an application, the download information is information related to downloading the application, and the plurality of applications comprise a target application to be detected and a reference application whose application type has been determined;
determining a first similar feature between the attribute information of the target application and the attribute information of the reference application, and a second similar feature between the download information of the target application and the download information of the reference application;
combining the first similar feature and the second similar feature to obtain a similarity vector between the target application and the reference application, and classifying the similarity vector to obtain the similarity between the target application and the reference application;
determining that the target application belongs to the application type to which the reference application belongs if the similarity between the target application and the reference application satisfies a similarity condition.
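The four steps above can be sketched end to end as follows. This is a minimal illustration, not the patented implementation: the logistic scoring function, the names `classify` and `detect`, the weights, and the use of a plain threshold as the similarity condition are all assumptions introduced here.

```python
import math
from typing import Optional, Sequence


def classify(similarity_vector: Sequence[float],
             weights: Sequence[float], bias: float) -> float:
    """Map a similarity vector to a score in (0, 1).

    A logistic model is assumed purely for illustration; the text does not
    fix the classifier type.
    """
    z = bias + sum(w * x for w, x in zip(weights, similarity_vector))
    return 1.0 / (1.0 + math.exp(-z))


def detect(first_features: Sequence[float], second_features: Sequence[float],
           weights: Sequence[float], bias: float,
           threshold: float, reference_type: str) -> Optional[str]:
    # Combine attribute-side and download-side features into one vector.
    similarity_vector = list(first_features) + list(second_features)
    similarity = classify(similarity_vector, weights, bias)
    # Assumed similarity condition: the score reaches a threshold.
    return reference_type if similarity >= threshold else None
```

For example, `detect([0.9, 0.8], [1.0], [2.0, 2.0, 2.0], -3.0, 0.5, "insecure")` assigns the reference type, while an all-zero feature set with the same hypothetical weights does not.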
In another aspect, an application detection apparatus is provided, the apparatus including:
an acquisition module, configured to acquire attribute information and download information of a plurality of applications, wherein the attribute information is information describing an application, the download information is information related to downloading the application, and the plurality of applications comprise a target application to be detected and a reference application whose application type has been determined;
a similarity determining module, configured to determine a first similar feature between the attribute information of the target application and the attribute information of the reference application, and a second similar feature between the download information of the target application and the download information of the reference application;
the similarity determining module is further configured to combine the first similar feature and the second similar feature to obtain a similarity vector between the target application and the reference application, and classify the similarity vector to obtain a similarity between the target application and the reference application;
a type determining module, configured to determine that the target application belongs to the application type to which the reference application belongs if a similarity between the target application and the reference application satisfies a similarity condition.
Optionally, the attribute information and the download information each include a plurality of information items; the type determining module is configured to determine that the target application belongs to the application type to which the reference application belongs when the target application and the reference application have an association relationship and a similarity between the target application and the reference application reaches a second threshold, where the association relationship refers to having the same information item.
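A minimal sketch of this check, assuming each application's information items are held in a dictionary keyed by item name (the dictionary layout and the function names are hypothetical):

```python
def has_association(target_items: dict, reference_items: dict) -> bool:
    """Return True if the two applications share at least one identical
    information item (same item name and same value)."""
    return any(
        key in reference_items and reference_items[key] == value
        for key, value in target_items.items()
    )


def same_type(target_items: dict, reference_items: dict,
              similarity: float, second_threshold: float) -> bool:
    # Both conditions must hold: an association relationship exists and
    # the similarity reaches the second threshold.
    return (has_association(target_items, reference_items)
            and similarity >= second_threshold)
```

Requiring a shared information item before applying the threshold restricts the comparison to pairs that are already linked, which is one plausible reading of why the association relationship is checked first.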
Optionally, the reference application comprises a first application and a second application, the target application and the first application do not have the association relationship therebetween, and the target application and the second application have the association relationship therebetween;
The similarity determining module is configured to determine a first similar characteristic between the attribute information of the target application and the attribute information of the second application, and a second similar characteristic between the download information of the target application and the download information of the second application;
the type determining module is configured to determine that the target application belongs to the application type to which the second application belongs when the similarity between the target application and the second application reaches the second threshold.
Optionally, the similarity determining module is further configured to determine a similarity between the first application and the second application, where an application type to which the first application belongs is already determined, an application type to which the second application belongs is not yet determined, and the first application and the second application have the association relationship therebetween;
the type determining module is further configured to determine that the second application belongs to the application type to which the first application belongs if the similarity between the first application and the second application reaches the second threshold.
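The step-by-step spreading of a known application type (first application to second application, then second application to target application) can be sketched as a breadth-first traversal over associated pairs. The graph representation, the function name `diffuse`, and the use of breadth-first order are assumptions made for this illustration.

```python
from collections import deque


def diffuse(labels: dict, edges: dict, second_threshold: float) -> dict:
    """Propagate application types along association relationships.

    labels: app -> type, for applications whose type is already determined.
    edges:  (app_a, app_b) -> similarity, for pairs that have an
            association relationship.
    """
    neighbours = {}
    for (a, b), sim in edges.items():
        neighbours.setdefault(a, []).append((b, sim))
        neighbours.setdefault(b, []).append((a, sim))

    result = dict(labels)
    queue = deque(labels)
    while queue:
        app = queue.popleft()
        for other, sim in neighbours.get(app, []):
            # Only undetermined applications that are similar enough
            # inherit the type, and they then act as references themselves.
            if other not in result and sim >= second_threshold:
                result[other] = result[app]
                queue.append(other)
    return result
```

In this sketch a target application with no direct association to the first application can still be reached through the second application, matching the two-hop case described above.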
Optionally, the number of the reference applications is multiple, and the application types of the multiple reference applications are the same;
The type determination module comprises:
an aggregation degree determining unit, configured to determine an aggregation degree among the target application and the plurality of reference applications based on the similarity between every two applications among the target application and the plurality of reference applications, where the aggregation degree is positively correlated with the similarity between every two applications;
a type determining unit, configured to determine that the target application belongs to an application type to which a plurality of the reference applications belong, if the aggregation degree reaches a first threshold.
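One simple quantity that is positively correlated with every pairwise similarity is their mean. The sketch below uses the mean as the aggregation degree; that choice, the `frozenset` keys, and the function name are assumptions, since the text only requires positive correlation.

```python
from itertools import combinations


def aggregation_degree(apps: list, similarity: dict) -> float:
    """Mean similarity over all unordered pairs of applications.

    similarity maps frozenset({app_a, app_b}) -> similarity score.
    """
    pairs = list(combinations(apps, 2))
    if not pairs:
        return 0.0
    return sum(similarity[frozenset(pair)] for pair in pairs) / len(pairs)
```

The target application would then be assigned the shared type of the reference applications when `aggregation_degree(...)` reaches the first threshold.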
Optionally, the attribute information includes information items of at least one first dimension, the first similar feature includes a first feature value of the at least one first dimension, the download information includes information items of at least one second dimension, and the second similar feature includes a second feature value of the at least one second dimension; the similarity determination module comprises:
a splicing unit, configured to combine the first feature value of the at least one first dimension and the second feature value of the at least one second dimension according to an order of the at least one first dimension and the at least one second dimension, so as to obtain the similarity vector between the target application and the reference application.
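A minimal sketch of splicing feature values in a fixed dimension order. The dimension names and their order are hypothetical; what matters is that every pair of applications is encoded with the same order so that each vector position always carries the same meaning.

```python
# Hypothetical dimension order: first (attribute) dimensions, then second
# (download) dimensions.
FIRST_DIMENSIONS = ("package_id", "certificate", "app_id", "package_size")
SECOND_DIMENSIONS = ("download_domain", "device_id")


def build_similarity_vector(first_values: dict, second_values: dict) -> list:
    """Concatenate per-dimension feature values in the fixed order above."""
    order = FIRST_DIMENSIONS + SECOND_DIMENSIONS
    merged = {**first_values, **second_values}
    return [float(merged[name]) for name in order]
```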
Optionally, the similarity determining module includes:
a classification unit, configured to call a similarity vector classification model to classify the similarity vector, so as to obtain the similarity between the target application and the reference application;
wherein the apparatus further comprises a model training module configured to:
obtaining a sample similarity vector between a first sample application and a second sample application, and a sample similarity between the first sample application and the second sample application;
calling the similarity vector classification model to classify the sample similarity vector to obtain the prediction similarity between the first sample application and the second sample application;
training the similarity vector classification model based on the prediction similarity and the sample similarity.
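The training loop above can be illustrated with a small logistic model trained by stochastic gradient descent. The model family, learning rate, and epoch count are assumptions chosen for a self-contained example; the text does not specify them.

```python
import math


def predict(weights, bias, vector):
    """Prediction similarity for one similarity vector."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, epochs=500, learning_rate=0.5):
    """samples: list of (sample_similarity_vector, sample_similarity),
    with sample_similarity in [0, 1]."""
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for vector, target in samples:
            # Difference between prediction similarity and sample
            # similarity; this is the log-loss gradient w.r.t. the logit.
            error = predict(weights, bias, vector) - target
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, vector)]
            bias -= learning_rate * error
    return weights, bias
```

After training on a few labelled pairs, `predict` returns a higher score for vectors that resemble the similar pairs than for vectors that resemble the dissimilar ones.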
Optionally, the apparatus further comprises:
the receiving module is used for receiving an application detection request sent by a terminal, wherein the application detection request comprises attribute information of a third application;
a message sending module, configured to send a notification message to the terminal when an application whose attribute information is the same as that of the third application is found in an application set, where the application set includes multiple applications that belong to the application type to which the reference application belongs, and the notification message is used to notify that the third application belongs to the application type corresponding to the application set.
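A sketch of the lookup performed when a detection request arrives, assuming attribute information is a flat dictionary and each application set is stored as a set of hashable keys. The key format and the shape of the returned notification are hypothetical.

```python
def make_key(attribute_info: dict) -> tuple:
    """Order-independent, hashable key for an application's attributes."""
    return tuple(sorted(attribute_info.items()))


def handle_detection_request(attribute_info: dict, application_sets: dict) -> dict:
    """application_sets: application type -> set of keys from make_key."""
    key = make_key(attribute_info)
    for app_type, keys in application_sets.items():
        if key in keys:
            # An application with identical attribute information exists.
            return {"notify": True, "application_type": app_type}
    return {"notify": False, "application_type": None}
```

Because the sets are built ahead of time, answering a request is a hash lookup and does not require recomputing any similarity.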
Optionally, the similarity determining module includes at least one of:
a first determining unit, configured to determine, in a case that the attribute information includes an installation package identifier, a first similar feature between the installation package identifier of the target application and the installation package identifier of the reference application;
a second determining unit configured to determine, if the attribute information includes an application certificate, a first similar feature between the application certificate of the target application and the application certificate of the reference application;
a third determining unit, configured to determine, if the attribute information includes an application identifier, a first similar feature between the application identifier of the target application and the application identifier of the reference application;
a fourth determining unit, configured to determine, in a case that the attribute information includes an installation package size, a first similar feature between the installation package size of the target application and the installation package size of the reference application.
Optionally, the first determining unit is configured to perform at least one of the following:
determining the first similar characteristic based on an edit distance between the installation package identifiers of the target application and the reference application, wherein the edit distance refers to the number of characters that need to be modified to change one installation package identifier into the other installation package identifier;
determining the first similar characteristic based on the lengths of the same character strings in the installation package identifiers of the target application and the reference application;
determining the first similar characteristics based on the number of the same fields in the installation package identifiers of the target application and the reference application;
and determining the first similar characteristics based on the number of fields which belong to the same structure in the installation package identifiers of the target application and the reference application.
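Three of these comparisons can be sketched as below. The edit distance shown is the standard Levenshtein distance, the "same character string" is interpreted as the longest common substring, and fields are assumed to be the dot-separated parts of an identifier; all three are interpretations for illustration. The fourth comparison (fields with the same structure) is not covered.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous string shared by a and b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            length = previous[j - 1] + 1 if ch_a == ch_b else 0
            current.append(length)
            best = max(best, length)
        previous = current
    return best


def shared_fields(a: str, b: str) -> int:
    """Number of distinct dot-separated fields present in both identifiers."""
    return len(set(a.split(".")) & set(b.split(".")))
```

For the hypothetical identifiers `com.abc.pay` and `com.abd.pay`, the edit distance is 1, the longest common substring has length 6 (`com.ab`), and two fields are shared.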
Optionally, the second determining unit is configured to:
determining the first similar feature based on a certificate popularity of the application certificate when the application certificates of the target application and the reference application are the same, wherein the first similar feature is negatively correlated with the certificate popularity, and the certificate popularity refers to the number of applications that have the application certificate;
determining the first similar feature based on the maximum of the certificate popularities corresponding to the target application and the reference application when the application certificates of the target application and the reference application are different and the difference between the certificate popularities is smaller than a third threshold, wherein the first similar feature is negatively correlated with the maximum;
and determining a target value as the first similar feature when the application certificates of the target application and the reference application are different and the difference between the certificate popularities is not smaller than the third threshold.
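The three cases can be sketched as a single function. The concrete formulas (`1 / popularity` and `0.5 / max`) and the default target value of 0.0 are assumptions chosen only so that the feature decreases as popularity grows; the text fixes the direction of the relationship, not the formula.

```python
def certificate_feature(cert_a: str, cert_b: str, popularity: dict,
                        third_threshold: int, target_value: float = 0.0) -> float:
    """popularity maps a certificate to the number of applications using it."""
    pop_a, pop_b = popularity[cert_a], popularity[cert_b]
    if cert_a == cert_b:
        # Same certificate: a rarely used certificate is stronger evidence
        # of a link than a widely shared one.
        return 1.0 / pop_a
    if abs(pop_a - pop_b) < third_threshold:
        # Different certificates with similar popularity: weaker evidence,
        # still decreasing in the larger popularity.
        return 0.5 / max(pop_a, pop_b)
    # Different certificates with very different popularity.
    return target_value
```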
Optionally, the third determining unit is configured to perform at least one of the following:
determining the first similar characteristic based on the lengths of the same character strings in the application identifiers of the target application and the reference application;
determining the first similar feature based on a classification result corresponding to the application identifier of the target application and the reference application, wherein the classification result corresponding to the application identifier represents the possibility that the application identifier belongs to each of a plurality of identifier types.
Optionally, the similarity determining module includes at least one of:
a fifth determining unit, configured to determine, when the download information includes a download domain name, a second similar feature between the download domain name of the target application and the download domain name of the reference application, where the download domain name is a domain name included in a download link;
a sixth determining unit, configured to determine, if the download information includes device identifiers of target devices, a second similar characteristic between the device identifiers corresponding to the target application and the device identifiers corresponding to the reference application, where the target devices are a target number of devices that installed the application earliest among the plurality of devices on which the application is installed.
Optionally, the fifth determining unit is configured to determine the second similar characteristic based on the number of download domain names shared by the target application and the reference application.
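A sketch of counting shared download domain names, assuming the download information is available as a list of download links. The function names and example links are hypothetical; `urllib.parse.urlparse` is used to extract the host name from each link.

```python
from urllib.parse import urlparse


def download_domains(links) -> set:
    """Set of host names that appear in the given download links."""
    hosts = (urlparse(link).hostname for link in links)
    return {host for host in hosts if host}


def domain_feature(target_links, reference_links) -> int:
    """Number of download domain names common to both applications."""
    return len(download_domains(target_links) & download_domains(reference_links))
```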
Optionally, the sixth determining unit is configured to:
determining a first numerical value as the second similar characteristic in the case that at least one identical device identification exists for the target application and the reference application;
determining a second value as the second similar characteristic in the absence of the same device identification for the target application and the reference application, the first value being greater than the second value.
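A sketch of the device-based feature, assuming install records are `(device_id, install_timestamp)` pairs and that the first and second values are 1.0 and 0.0 (hypothetical values that satisfy "first value greater than second value").

```python
def earliest_devices(installs, target_number: int) -> set:
    """Identifiers of the target_number devices with the earliest installs."""
    ordered = sorted(installs, key=lambda item: item[1])
    return {device_id for device_id, _ in ordered[:target_number]}


def device_feature(target_installs, reference_installs, target_number: int,
                   first_value: float = 1.0, second_value: float = 0.0) -> float:
    shared = (earliest_devices(target_installs, target_number)
              & earliest_devices(reference_installs, target_number))
    # Any common early installer yields the larger first value.
    return first_value if shared else second_value
```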
In another aspect, a computer device is provided, the computer device comprising a processor and a memory, the memory having stored therein at least one computer program, the at least one computer program being loaded and executed by the processor to perform the operations performed by the application detection method according to the above aspect.
In another aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the operations performed by the application detection method according to the above aspect.
In another aspect, a computer program product is provided, comprising a computer program, which is loaded and executed by a processor to perform the operations performed by the application detection method according to the above aspect.
According to the method, apparatus, computer device, and storage medium provided by the embodiments of the present application, the first similar feature and the second similar feature are mined from two dimensions, namely the attribute information and the download information of the applications. The similarity between the two applications is then determined from the similarity vector obtained by combining the first similar feature and the second similar feature. If the similarity satisfies the similarity condition, the two applications are considered to belong to the same application type, so the application type of the target application can be detected by means of the reference application. Because the method considers both the attribute information and the download information of an application, it enriches the dimensions along which the similarity between applications is measured and improves the accuracy of detecting the application type to which an application belongs.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art may obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of an application detection method provided in an embodiment of the present application;
FIG. 3 is a flowchart of another application detection method provided in an embodiment of the present application;
FIG. 4 is a flowchart of application detection based on an application association network according to an embodiment of the present application;
FIG. 5 is a flowchart of a pruning method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of one-degree diffusion provided by an embodiment of the present application;
FIG. 7 is a flowchart of a diffusion method provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of an application association network according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another application association network provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of another application association network provided in an embodiment of the present application;
FIG. 11 is a schematic diagram of another application association network provided in an embodiment of the present application;
FIG. 12 is a schematic diagram of another application association network provided in an embodiment of the present application;
FIG. 13 is a flowchart of another application detection method provided in an embodiment of the present application;
FIG. 14 is a flowchart of another application detection method provided in an embodiment of the present application;
FIG. 15 is a schematic diagram of a download interface provided by an embodiment of the present application;
FIG. 16 is a schematic diagram of determining similarity vectors according to an embodiment of the present application;
FIG. 17 is a flowchart of another application detection method provided in an embodiment of the present application;
FIG. 18 is a flowchart of yet another application detection method provided by an embodiment of the present application;
FIG. 19 is a schematic structural diagram of an application detection apparatus according to an embodiment of the present application;
FIG. 20 is a schematic structural diagram of another application detection apparatus provided in an embodiment of the present application;
FIG. 21 is a schematic structural diagram of a terminal according to an embodiment of the present application;
FIG. 22 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in further detail below with reference to the accompanying drawings.
It will be understood that the terms "first," "second," and the like used herein may describe various concepts, but these concepts are not limited by the terms unless otherwise specified. The terms are only used to distinguish one concept from another. For example, a first application may be referred to as a second application, and similarly, a second application may be referred to as a first application, without departing from the scope of the present application.
As used herein, "at least one" means one or more; for example, at least one application may be any integer number of applications greater than or equal to one, such as one application, two applications, or three applications. "A plurality of" means two or more; for example, a plurality of applications may be any integer number of applications greater than or equal to two, such as two applications or three applications. "Each" refers to every one of the at least one; for example, if the plurality of applications is 3 applications, each application refers to every one of the 3 applications.
It should be noted that the embodiments of the present application involve related data such as attribute information and download information. When the embodiments are applied to specific products or technologies, user permission or consent must be obtained, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
The application detection method provided by the embodiments of the present application is executed by a computer device. Optionally, the computer device is a server. The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms, which is not limited in the embodiments of the present application. Optionally, the computer device is a terminal. The terminal may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, or a vehicle-mounted terminal, which is likewise not limited in the embodiments of the present application.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application, and referring to fig. 1, the implementation environment includes: a server 101 and a terminal 102. The server 101 and the terminal 102 are connected through a wireless or wired network.
The server 101 is configured to detect, by using the method provided in the embodiments of the present application, the application type to which an application belongs based on the attribute information and download information of the application, and then to form an application set from a plurality of applications belonging to the same application type. Subsequently, when the terminal 102 sends an application detection request for a certain application to the server 101 and the server 101 finds, in the application set, an application with the same attribute information as that application, the server 101 notifies the terminal 102 that the application belongs to the application type corresponding to the application set.
In one possible implementation, the terminal 102 has installed thereon an application that is served by the server 101 and has a function of downloading or installing other applications. Optionally, this application is an application in the operating system of the terminal 102, or an application provided by a third party. For example, the application is a browser or a resource download application.
The application detection method provided by the embodiment of the application can be applied to any scene of application detection.
For example, consider the scenario of detecting insecure applications. When it is necessary to detect whether application A is an insecure application, the computer device obtains the attribute information and download information of application A, and also obtains the attribute information and download information of application B, which has already been determined to be an insecure application. Using the method provided by the embodiments of the present application, the computer device determines the similarity between application A and application B based on the attribute information and download information of both applications. If the similarity is sufficiently large, the application type to which application A belongs can be considered the same as the application type to which application B belongs, so application A is determined to be an insecure application.
Fig. 2 is a flowchart of an application detection method provided in an embodiment of the present application, where the method is executed by a computer device, and referring to fig. 2, the method includes the following steps.
201. The computer device obtains attribute information and download information of a plurality of applications.
The attribute information of the application is information for describing the application, and for example, the attribute information includes information configured for the application in the process of developing the application, and the like. The download information of the application is information related to downloading the application, for example, the download information includes information acquired in the process of downloading the application, and the like.
The computer equipment acquires attribute information and downloading information of a plurality of applications, wherein the plurality of applications comprise target applications to be detected and reference applications of which the application types are determined. Optionally, the application type includes a secure application and a non-secure application.
202. The computer device determines a first similarity feature between the attribute information of the target application and the attribute information of the reference application, and a second similarity feature between the download information of the target application and the download information of the reference application.
After acquiring the attribute information and the download information of the target application and of the reference application, the computer device compares the two applications along two different dimensions, attribute information and download information, always comparing information of the same dimension with each other.
That is, the computer device compares the attribute information of the target application with the attribute information of the reference application to obtain the first similarity feature, which represents the degree of similarity between the attribute information of the two applications. The computer device likewise compares the download information of the target application with the download information of the reference application to obtain the second similarity feature, which represents the degree of similarity between the download information of the two applications. In this way, the degree of similarity between the target application and the reference application is mined in two different dimensions.
203. The computer device combines the first similarity feature and the second similarity feature to obtain a similarity vector between the target application and the reference application.
After obtaining the first similarity feature and the second similarity feature, the computer device combines them into a similarity vector, which contains both the similarity feature of the attribute information dimension and the similarity feature of the download information dimension.
Since the first similarity feature and the second similarity feature each represent the degree of similarity between the target application and the reference application in a different dimension, the combined similarity vector covers the similarity of the information in both dimensions.
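Steps 202 and 203 can be sketched as follows. This is a minimal illustration rather than the feature extraction actually used by the method: the `AppInfo` field names, the exact-match comparison of identifiers, the relative size comparison, and the Jaccard overlap for download domains and device identifiers are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AppInfo:
    # Attribute information (hypothetical field names).
    package_id: str
    certificate: str
    app_name: str
    package_size: int
    # Download information (hypothetical field names).
    download_domains: set = field(default_factory=set)
    target_devices: set = field(default_factory=set)


def jaccard(a, b):
    """Overlap of two sets in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attribute_features(x, y):
    """First similarity feature: one value per attribute information item."""
    larger = max(x.package_size, y.package_size)
    size_sim = 1.0 - abs(x.package_size - y.package_size) / larger if larger else 1.0
    return [
        float(x.package_id == y.package_id),
        float(x.certificate == y.certificate),
        float(x.app_name == y.app_name),
        size_sim,
    ]


def download_features(x, y):
    """Second similarity feature: one value per download information item."""
    return [
        jaccard(x.download_domains, y.download_domains),
        jaccard(x.target_devices, y.target_devices),
    ]


def similarity_vector(x, y):
    """Combine both features by concatenation (assumed combination rule)."""
    return attribute_features(x, y) + download_features(x, y)
```

Concatenation keeps each information item as a separate component, so a downstream classifier can weight the two dimensions independently.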
204. The computer device classifies the similarity vector to obtain the similarity between the target application and the reference application.
Because the similarity vector contains similarity features of the information in different dimensions, the computer device classifies the similarity vector to obtain the similarity that it represents, which is the similarity between the target application and the reference application. In other words, the computer device performs a similarity determination on the target application and the reference application based on the similarity vector.
205. The computer device determines that the target application belongs to the application type to which the reference application belongs, in a case where a similarity between the target application and the reference application satisfies a similarity condition.
In the embodiment of the present application, since the application type to which the reference application belongs has already been determined, the reference application can be used as a reference standard. If the similarity between the target application and the reference application satisfies the similarity condition, this indicates that the attribute information of the target application is similar to that of the reference application and that the download information of the target application is also similar to that of the reference application, that is, the information in both dimensions is similar. The application type to which the target application belongs can then be considered the same as the application type to which the reference application belongs, so the computer device determines that the target application belongs to the application type to which the reference application belongs, thereby detecting the application type of the target application.
A similarity that satisfies the similarity condition is greater than a similarity that does not, so the target application is considered to belong to the same application type as the reference application only when the similarity between them is relatively large.
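Steps 204 and 205 can be illustrated with a small sketch. The text does not specify the classifier, so the logistic model, the `WEIGHTS` and `BIAS` values, and the expression of the similarity condition as a simple threshold are all assumptions made for the example; in practice the parameters would be learned from labelled application pairs.

```python
import math

# Hypothetical parameters for a six-component similarity vector.
WEIGHTS = [1.5, 2.0, 1.0, 0.5, 1.5, 1.5]
BIAS = -3.0


def classify(similarity_vector):
    """Map a similarity vector to a similarity in (0, 1) with a logistic model."""
    z = BIAS + sum(w * v for w, v in zip(WEIGHTS, similarity_vector))
    return 1.0 / (1.0 + math.exp(-z))


def detect_type(similarity, reference_type, threshold=0.8):
    """Return the reference type if the similarity condition holds, else None.

    The similarity condition is assumed to be 'similarity >= threshold'.
    """
    return reference_type if similarity >= threshold else None
```

Returning `None` when the condition is not met reflects that the comparison with this particular reference application says nothing about the target application's type.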
According to the method provided by the embodiment of the present application, the first similarity feature and the second similarity feature are mined from two dimensions, the attribute information and the download information of the applications. The similarity between the two applications is then determined using the similarity vector obtained by combining the two features. If the similarity of the two applications satisfies the similarity condition, the two applications are considered to belong to the same application type, so the application type of the target application can be detected by means of the reference application. Because the method considers both the attribute information and the download information of the applications, it enriches the dimensions in which similarity between applications is measured and improves the accuracy of detecting the application type to which an application belongs.
Fig. 3 is a flowchart of another application detection method, which is executed by a computer device. Building on the embodiment of fig. 2, this embodiment additionally considers whether an association relationship exists between the target application and the reference application when detecting the application type to which the target application belongs. Referring to fig. 3, the method includes the following steps.
301. The computer device obtains attribute information and download information of a plurality of applications.
In the embodiment of the present application, the attribute information and the download information each include a plurality of information items. Optionally, the attribute information includes a plurality of information items of different dimensions, for example an installation package identifier, an application certificate, an application identifier, and an installation package size. Optionally, the download information also includes a plurality of information items of different dimensions, for example a download domain name and a device identifier of a target device.
The installation package identifier is also called the package name of the application and is used to indicate the installation package of the application. Optionally, the installation package identifier is an English character string, for example, the installation package identifier has a structure of "com. The application certificate is used for verifying the integrity of the installation package of an application, for example, the application certificate is a file obtained by extracting a digest from the installation package and then encrypting the digest. Optionally, the application certificate may also include information about the application developer, and the like. The application identifier is used to indicate an application, for example, the application identifier is the application name corresponding to the application, and different applications may have the same application name. The installation package size refers to the size of the installation package of the application, for example, 386703220 bytes. The download domain name refers to a domain name included in the download link, for example, if the download link of the target application is "http:/123456 abcde. The target devices corresponding to an application are a target number of devices with the earliest installation times among the plurality of devices on which the application is installed. For example, the target devices corresponding to the application are the first 5 devices on which the application was installed.
The plurality of applications comprise a target application, a first application and a second application, wherein the target application is an application to be detected, namely the application type of the target application is not determined, the application type of the first application is determined, and the application type of the second application is not determined.
Wherein, having an association relationship between two applications means that the two applications have the same information item. The first application and the second application have an association relationship, the target application and the first application do not have an association relationship, and the target application and the second application have an association relationship. That is, the first application and the second application have at least one identical information item, the target application and the first application do not have the same information item, and the target application and the second application have at least one identical information item.
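The relationships among the three applications can be made concrete with a short sketch. The dictionary keys and sample values below are hypothetical, and the choice of which information items count toward an association (package identifier, certificate, application name, download domain, target device) is an assumption for the example.

```python
def has_association(a, b):
    """True if two applications share at least one identical information item."""
    single_valued = ("package_id", "certificate", "app_name")
    if any(a[key] == b[key] for key in single_valued):
        return True
    # Set-valued items match when the two sets share at least one element.
    multi_valued = ("download_domains", "target_devices")
    return any(a[key] & b[key] for key in multi_valued)


# Hypothetical sample data mirroring the first, second and target applications.
first = {
    "package_id": "com.example.one", "certificate": "certA", "app_name": "Alpha",
    "download_domains": {"a.example"}, "target_devices": {"dev1"},
}
second = {
    "package_id": "com.example.two", "certificate": "certA", "app_name": "Beta",
    "download_domains": {"b.example"}, "target_devices": {"dev2"},
}
target = {
    "package_id": "com.example.three", "certificate": "certC", "app_name": "Gamma",
    "download_domains": {"b.example"}, "target_devices": {"dev3"},
}
```

With this data, the first and second applications are associated through a shared certificate, the target and second applications through a shared download domain, and the target and first applications share nothing.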
In a possible implementation manner, the attribute information and the download information of the plurality of applications are reported by terminals, and each terminal obtains the attribute information and the download information of an application from the installation package of the application. Taking the target application as an example, after a terminal downloads the installation package of the target application and before the target application is installed through the installation package, the terminal parses the installation package to obtain the attribute information and the download information, and then reports the attribute information and the download information of the target application to the computer device, so that the computer device can subsequently detect the target application based on this information.
In the related art, detecting an application requires first acquiring the installation package of the application and then detecting the application type to which the application belongs by using that installation package.
In the embodiment of the present application, the application detection method depends only on the attribute information and the download information of the application, which the terminal can conveniently report to the computer device when the application is installed. The attribute information and the download information are therefore easy to acquire, and the computer device can collect and record them for a larger number of applications, so the application detection method has high coverage and can detect a large number of applications.
In addition, because the method of the embodiment of the present application does not depend on the installation package of the application, no time needs to be spent collecting and recording the installation package, and the attribute information and the download information reported by the terminal can be used directly for detection.
302. The computer device determines a similarity between the first application and the second application.
In the embodiment of the present application, two applications having an association relationship are considered more likely to be similar. Therefore, in order to reduce the amount of calculation, the similarity is determined only between two applications having an association relationship, and that similarity is then evaluated to determine whether the two applications belong to the same application type. Since the application type to which the first application belongs has been determined, the application type to which the second application belongs has not been determined, and the first application and the second application have an association relationship, the computer device may determine the similarity between the first application and the second application, so as to determine from that similarity whether the second application belongs to the same application type as the first application.
The computer device determines a third similarity feature between the attribute information of the first application and the attribute information of the second application, and a fourth similarity feature between the download information of the first application and the download information of the second application. It combines the third similarity feature and the fourth similarity feature to obtain a similarity vector between the first application and the second application, and classifies that similarity vector to obtain the similarity between the first application and the second application. The process of determining the similarity between the first application and the second application is the same as the process of determining the similarity between the target application and the reference application in steps 202 to 204 above, and is not repeated here.
303. The computer device determines that the second application belongs to the application type to which the first application belongs, in a case where a similarity between the first application and the second application reaches a second threshold.
The computer device judges whether the similarity between the first application and the second application reaches the second threshold. If it does, this indicates that the similarity between the first application and the second application is high enough, and the two applications are considered to belong to the same application type, so the second application is determined to belong to the application type to which the first application belongs. If the similarity does not reach the second threshold, this indicates that the similarity between the first application and the second application is not high enough, and the two applications may be considered to belong to different application types.
Optionally, the second threshold is a threshold preset by the computer device, for example, if the value range of the similarity between the first application and the second application is 0 to 1, the second threshold may be 0.7 or 0.8, and the like. For another example, if the similarity between the first application and the second application is 0 or 1, the second threshold may be 1.
Optionally, the application types include a first type and a second type, the first type being different from the second type, for example, the first type is a non-secure application, and the second type is a secure application. Wherein the first application is of a first type. If the similarity between the first application and the second application reaches a second threshold, determining that the application type of the second application is the same as the application type to which the first application belongs, namely that the second application also belongs to the first type, and if the similarity between the first application and the second application does not reach the second threshold, determining that the application type of the second application is different from the application type to which the first application belongs, namely that the second application belongs to the second type.
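The optional two-type decision in step 303 can be sketched as a single comparison. The function name and the type labels are hypothetical; the default threshold of 0.8 is one of the example values given above, and a threshold of 1 would correspond to the case where the similarity takes only the values 0 or 1.

```python
def label_second_application(similarity, second_threshold=0.8,
                             first_type="non-secure", second_type="secure"):
    """Assign the first application's type when the similarity reaches the
    second threshold; otherwise assign the other type (two-type case only)."""
    return first_type if similarity >= second_threshold else second_type
```

Note that "reaches" is read here as greater than or equal to, so a similarity exactly equal to the threshold yields the first application's type.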
304. The computer device determines a first similarity feature between the attribute information of the target application and the attribute information of the second application, and a second similarity feature between the download information of the target application and the download information of the second application.
It should be noted that, in the embodiment of the present application, the application type to which the first application belongs has already been determined, so the first application is a reference application. After the computer device executes steps 302 to 303 above, the application type to which the second application belongs is also determined, so the second application can also be used as a reference application. Moreover, since the target application has no association relationship with the first application but does have an association relationship with the second application, the target application is more likely to be similar to the second application. Therefore, in order to reduce the amount of calculation, only the similarity between the target application and the second application, which have an association relationship, is determined.
305. The computer device combines the first similarity feature and the second similarity feature to obtain a similarity vector between the target application and the second application, and classifies the similarity vector to obtain the similarity between the target application and the second application.
306. The computer device determines that the target application belongs to the application type to which the second application belongs when the similarity between the target application and the second application reaches a second threshold.
The process of determining that the target application belongs to the application type to which the second application belongs in steps 304 to 306 is the same as the process of determining that the target application belongs to the application type to which the reference application belongs in steps 202 to 205, and details are not repeated here.
In the embodiment of the present application, two applications are determined to belong to the same application type only when they have an association relationship and the similarity between them reaches the second threshold. Applications belonging to the same application type are thus mined from two different angles, the association relationship and the similarity, which makes the condition for judging that two applications belong to the same application type stricter and improves the accuracy of application detection.
Furthermore, when only the application type to which the first application belongs has been determined, the application type to which the second application belongs is determined according to the association relationship and the similarity between the second application and the first application. Although the target application has no association relationship with the first application, the application type to which the target application belongs can then be determined according to the association relationship and the similarity between the target application and the second application, whose application type is now known. This is equivalent to using the first application to determine directly the application type of the second application, with which it has an association relationship, and to determine indirectly the application type of the target application, with which it has none. Such diffused application detection mines more applications belonging to the same application type and improves the coverage of application detection.
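The direct and indirect determination described above can be sketched as two chained threshold checks. This is an illustrative simplification: the function name is hypothetical, the similarities are passed in as precomputed numbers, the association relationships (first with second, second with target) are assumed to have been verified beforehand, and `None` is used to mean that this path does not determine the target application's type.

```python
def detect_via_intermediate(first_type, sim_first_second, sim_second_target,
                            second_threshold=0.8):
    """Propagate a known type from the first application to the target
    application through the second application (steps 302 to 306)."""
    # Steps 302-303: the second application inherits the first application's type.
    if sim_first_second < second_threshold:
        return None
    second_type = first_type
    # Steps 304-306: the target application inherits the second application's type.
    if sim_second_target < second_threshold:
        return None
    return second_type
```

The target application is never compared with the first application directly, which is the saving in computation that the association relationship provides.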
It should be noted that the embodiment of the present application takes as an example the case in which the similarity is determined only between two applications having an association relationship. In another embodiment, whether the target application and the reference application have an association relationship may be considered without distinguishing between the first application and the second application among the reference applications. In that case, based on the embodiment of fig. 2, step 205 is replaced by the following step: determining that the target application belongs to the application type to which the reference application belongs if the target application and the reference application have an association relationship and the similarity between the target application and the reference application reaches a second threshold. That is, the similarity condition in step 205 above is: the target application and the reference application have an association relationship, and the similarity between the target application and the reference application reaches the second threshold.
According to the method provided by the embodiment of the present application, the first similarity feature and the second similarity feature are mined from two dimensions, the attribute information and the download information of the applications. The similarity between the two applications is then determined using the similarity vector obtained by combining the two features. If the similarity of the two applications satisfies the similarity condition, the two applications are considered to belong to the same application type, so the application type of the target application can be detected by means of the reference application. Because the method considers both the attribute information and the download information of the applications, it enriches the dimensions in which similarity between applications is measured and improves the accuracy of detecting the application type to which an application belongs.
In addition, two applications are determined to belong to the same application type only when they have an association relationship and the similarity between them reaches the second threshold. Applications belonging to the same application type are thus mined from two different angles, the association relationship and the similarity, which makes the condition for judging that two applications belong to the same application type stricter and helps improve the rigor and accuracy of application detection.
Furthermore, when only the application type to which the first application belongs has been determined, the application type to which the second application belongs is determined according to the association relationship and the similarity between the second application and the first application. Although the target application has no association relationship with the first application, the application type to which the target application belongs can then be determined according to the association relationship and the similarity between the target application and the second application, whose application type is now known. This is equivalent to using the first application to determine directly the application type of the second application, with which it has an association relationship, and to determine indirectly the application type of the target application, with which it has none. Such diffused application detection mines more applications of the same type and improves the coverage of application detection.
In addition, considering that the possibility of similarity between two applications with an association relationship is higher, only the similarity between the two applications with the association relationship is determined, and the similarity is further judged to determine whether the application types of the two applications are the same or not, so that the calculation amount for determining the similarity between the applications is reduced.
In addition, the application detection method depends only on the attribute information and the download information, which the terminal can conveniently report to the computer device when the application is installed. The attribute information and the download information are therefore easy to acquire, and the computer device does not need to record the installation package of the application. Compared with a method that performs detection using the installation package of the application, this method depends less on the installation package and is therefore more feasible.
The embodiment of fig. 3 illustrates the process of detecting applications according to the association relationship and similarity between the applications from the perspective of one target application. In practical applications, in order to improve the efficiency of application detection, an application association network may be constructed based on a plurality of applications, where the application association network includes a plurality of applications and association relationships and similarities between the applications. Fig. 4 is a flowchart of application detection based on an application association network according to an embodiment of the present application, where the method is executed by a computer device, and as shown in fig. 4, the method includes the following steps.
401. The computer device constructs an application association network.
The computer device acquires attribute information and download information of a plurality of applications, where the attribute information and the download information each include a plurality of information items, and the plurality of applications include a first application whose application type has been determined. The computer device creates the application association network based on the attribute information and the download information of the plurality of applications.
For every two applications of the plurality of applications, the computer device creates an association relationship between the two applications if they have the same information item. The computer device then forms the application association network from the plurality of applications and the association relationships among them.
The application association network is shown as the application association network 411 in fig. 4 and is composed of nodes and edges. In the application association network 411, each node represents an application, for example, a node holds the download information and attribute information of the application. An edge between two nodes indicates that the applications corresponding to the two nodes have an association relationship, and the absence of an edge between two nodes indicates that the corresponding applications have no association relationship.
Optionally, the plurality of information items included in the attribute information are an installation package identifier, an application certificate, an application identifier, and an installation package size, and the plurality of information items included in the download information are a download domain name and a device identifier of a target device. An association relationship between two applications is created if any of the following holds: (1) the two applications have the same installation package identifier; (2) the two applications have the same application certificate; (3) the two applications have the same application identifier; (4) the two applications have at least one download domain name in common; (5) the two applications have the device identifier of at least one target device in common.
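Step 401 can be sketched as building an undirected graph from the five conditions above. The dictionary keys, the adjacency-set representation, and the pairwise comparison of all applications are assumptions made for the example; a real system with many applications would more likely index applications by information item instead of comparing every pair.

```python
from itertools import combinations


def shares_item(a, b):
    """Conditions (1) to (5): any identical identifier, certificate or name,
    or any download domain or target device identifier in common."""
    return (
        a["package_id"] == b["package_id"]
        or a["certificate"] == b["certificate"]
        or a["app_name"] == b["app_name"]
        or bool(a["download_domains"] & b["download_domains"])
        or bool(a["target_devices"] & b["target_devices"])
    )


def build_association_network(apps):
    """apps maps an application key to its information items.

    Returns an adjacency mapping: application key -> set of associated keys.
    """
    network = {name: set() for name in apps}
    for u, v in combinations(apps, 2):
        if shares_item(apps[u], apps[v]):
            network[u].add(v)
            network[v].add(u)
    return network
```

The installation package size is deliberately not used to create edges, matching the list of conditions, although it may still contribute to the similarity computed in the next step.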
402. The computer device determines the similarity between every two applications with the association relationship in the application association network.
In step 401, the application association network is created. However, an association relationship between two applications does not by itself mean that the two applications are similar, so the similarity between every two applications having an association relationship still needs to be determined. For example, the similarity takes the value 0 or 1: a similarity of 1 between two applications indicates that they are similar, and a similarity of 0 indicates that they are dissimilar.
403. The computer device deletes, from the application association network, the association relationship between every two applications whose similarity does not reach the second threshold.
The computer device judges whether the similarity between every two applications with the association relation reaches a second threshold value, if the similarity between the two applications reaches the second threshold value, the similarity between the two applications is high enough, and the association relation between the two applications is reserved. And if the similarity between the two applications does not reach the second threshold, indicating that the similarity between the two applications is not high enough, deleting the association relationship between the two applications. For example, if the similarity value is 0 or 1, the association relationship between the two applications with the similarity value of 0 is deleted, and the association relationship between the two applications with the similarity value of 1 is retained.
This process is also called "pruning": edges whose similarity does not reach the expected level are pruned from the association network, and only edges between nodes that share an information item and have a high similarity are retained. Fig. 5 is a flowchart of a pruning method provided in an embodiment of the present application. As shown in fig. 5, for the similarity corresponding to each edge in the application association network, the computer device determines whether the similarity is equal to 1. If the similarity corresponding to the edge is equal to 1, the edge is retained, and if it is equal to 0, the edge is deleted. The application association network is thereby updated to obtain an updated application association network.
The updated application association network is shown as the application association network 413 in fig. 4, where only the edge between every two nodes with the similarity reaching the second threshold is reserved in the application association network 413, and the edge between every two nodes with the similarity not reaching the second threshold is deleted.
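Pruning (step 403) can be sketched as filtering the edges of the network. The adjacency-set representation and the `similarity` mapping keyed by unordered pairs are assumptions for the example; the default threshold of 1 matches the case in which the similarity takes only the values 0 or 1.

```python
def prune(network, similarity, second_threshold=1.0):
    """Return a new network keeping only edges whose similarity reaches
    the second threshold.

    network: application key -> set of associated application keys.
    similarity: frozenset({u, v}) -> similarity of that pair.
    """
    pruned = {node: set() for node in network}
    for u, neighbours in network.items():
        for v in neighbours:
            if similarity[frozenset((u, v))] >= second_threshold:
                pruned[u].add(v)
    return pruned
```

Keying the similarity by `frozenset` makes the lookup symmetric, so an edge is either kept in both directions or removed in both.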
404. The computer device performs diffusion in the updated application association network with the first application as a starting point, and determines that the application types of the plurality of applications obtained by diffusion are the same as the application type of the first application.
In this embodiment of the present application, the updated application association network includes the first application, and since the application type to which the first application belongs has been determined, diffusion is performed with the first application as a starting point. Diffusion in the embodiment of the present application refers to finding applications that have an association relationship. For example, one-degree diffusion and two-degree diffusion are performed on the first application, and the applications obtained by the one-degree diffusion and the two-degree diffusion are determined to belong to the same application type as the first application. For ease of explanation, the first application is referred to hereinafter as the target node in the application association network.
The one-degree diffusion is to use a target node as a diffusion center to search for a node having an association relation with the target node, the searched node is called a one-degree diffusion node of the target node, and the one-degree diffusion node of the target node is also a neighbor node of the target node. Fig. 6 is a schematic diagram of one-degree diffusion provided in the embodiment of the present application, and as shown in fig. 6, an application 1 is taken as a diffusion center, applications having an association relationship with the application 1 are found as an application 1-1, an application 1-2, an application 1-3, and an application 1-4, this process is one-degree diffusion of the application 1, and the application 1-1, the application 1-2, the application 1-3, and the application 1-4 are one-degree diffusion nodes of the application 1.
The second degree diffusion is that after a first degree diffusion node of a target node is determined, the first degree diffusion node is used as a new diffusion center to search for a node which is in an incidence relation with the first degree diffusion node, and the searched node is called as a second degree diffusion node of the target node. For example, in fig. 6, the neighbor nodes of application 1-1, application 1-2, application 1-3, and application 1-4 are continuously and respectively searched, and this process is the two-degree diffusion of application 1.
Fig. 7 is a flowchart of a diffusion method provided in an embodiment of the present application, and as shown in fig. 7, a target node is diffused, that is, a node having an association relationship with the target node is found to obtain a first-degree diffusion node of the target node, and the first-degree diffusion node is continuously diffused, that is, a node having an association relationship with the first-degree diffusion node is found to obtain a second-degree diffusion node of the target node, where the first-degree diffusion node and the second-degree diffusion node are diffusion results obtained by diffusing the target node. As shown in the application association network 414 in fig. 4, the bold nodes in the application association network 414 represent the first application, and the black nodes represent a plurality of applications obtained by diffusing from the first application as a starting point.
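First-degree and second-degree diffusion amount to a breadth-first search limited to two hops from the target node. The sketch below illustrates this under stated assumptions: the adjacency dictionary, the node names (including the node `far`), and the `diffuse` helper are hypothetical and are not taken from the embodiment.

```python
from collections import deque

def diffuse(adjacency, start, max_degree=2):
    """Return {node: degree} for every node within max_degree hops of start."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if degree[node] == max_degree:
            continue  # do not diffuse beyond the last allowed degree
        for neighbor in adjacency.get(node, ()):
            if neighbor not in degree:
                degree[neighbor] = degree[node] + 1
                queue.append(neighbor)
    del degree[start]  # the target node itself is not a diffusion result
    return degree

# Hypothetical pruned network stored as an undirected adjacency list.
adjacency = {
    "app1": ["app1-1", "app1-3"],
    "app1-1": ["app1", "app1-1-2"],
    "app1-3": ["app1"],
    "app1-1-2": ["app1-1", "far"],
    "far": ["app1-1-2"],
}

result = diffuse(adjacency, "app1")
print(result)
```

Nodes with degree 1 are the first-degree diffusion nodes and nodes with degree 2 are the second-degree diffusion nodes; `far` is three hops away and is therefore not reached.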
Since the diffusion process is performed based on the association relationship and the similarity between the applications, it is considered that the applications obtained by performing the first-degree diffusion and the second-degree diffusion on the first application have a higher degree of similarity on the attribute information and the downloaded information and have a higher possibility of belonging to the same application type, and thus the application type to which the application obtained by the diffusion belongs is determined as the application type to which the first application belongs. For example, if the first application belongs to a non-secure application, the application obtained by the diffusion is considered to belong to the non-secure application, so that the detection of the non-secure application is realized.
For convenience of explanation, the update process of the application association network is described below with reference to the application association networks shown in fig. 8 to 12.
Fig. 8 is a schematic diagram of the application association network created in step 401, provided in an embodiment of the present application. As shown in fig. 8, the application association network includes a plurality of different applications. If two applications have the same installation package identifier, the same application certificate, the same application identifier, the same download domain name, or the same corresponding device identifier, the two applications are connected to each other, which indicates that an association relationship exists between them. For example, application 1 and application 1-1 are connected to each other, and the association relationship between them is that their application certificates are the same. As another example, application 1-1 and application 1-1-1 are connected to each other, and the association relationship between them is that their installation package identifiers are the same.
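The rule for connecting two applications can be sketched as follows. The field names in `KEYS`, the sample records, and the `build_edges` helper are hypothetical and chosen only for illustration; the rule itself (an edge exists when at least one information item is identical) follows the description above.

```python
from itertools import combinations

# Hypothetical field names for the information items listed in the text.
KEYS = ("package_id", "certificate", "app_id", "download_domain", "device_id")

def build_edges(apps):
    """Connect two applications when at least one information item is identical."""
    edges = {}
    for (name_a, info_a), (name_b, info_b) in combinations(apps.items(), 2):
        shared = [
            key for key in KEYS
            if info_a.get(key) is not None and info_a.get(key) == info_b.get(key)
        ]
        if shared:
            edges[(name_a, name_b)] = shared  # record why the edge exists
    return edges

apps = {
    "app1": {"package_id": "com.a", "certificate": "CERT-1"},
    "app1-1": {"package_id": "com.b", "certificate": "CERT-1"},
    "app1-1-1": {"package_id": "com.b", "certificate": "CERT-2"},
}

edges = build_edges(apps)
print(edges)
```

Comparing every pair is quadratic in the number of applications; for a large network, grouping applications by each information item value would be a natural alternative, though the source does not specify how the network is built.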
Fig. 9 is a schematic diagram of another application association network provided in an embodiment of the present application, where in the application association network shown in fig. 9, on the basis of the application association network in fig. 8, similarity between every two applications having an association relationship is further included.
Fig. 10 is a schematic diagram of another application association network provided in an embodiment of the present application. In the application association network shown in fig. 10, on the basis of the application association network shown in fig. 9, each edge with a similarity of 0 is drawn as a dotted line, and the dotted edges are the edges that need to be deleted.
Fig. 11 is a schematic diagram of another application association network provided in an embodiment of the present application, where on the basis of the application association network in fig. 10, an edge belonging to a dotted line is deleted, so as to obtain an updated application association network shown in fig. 11, where only an edge with a similarity of 1 is reserved in the updated application association network.
Fig. 12 is a schematic diagram of another application association network provided in an embodiment of the present application. On the basis of the application association network in fig. 11, first-degree diffusion is performed on application 1, which is a non-secure application, to obtain applications 1-1, 1-3, and 1-4, and second-degree diffusion is then performed to obtain applications 1-1-2, 1-1-3, 1-3-2, 1-3-3, 1-4-1, and 1-4-3. These applications belong to the same application type as application 1, so, as shown in fig. 12, all applications belonging to the non-secure type are drawn in bold.
In the embodiment of the application, based on the structure of the application association network, more potential applications belonging to the same application type are found by diffusing the first application of the determined application type, so that the mining capability and the coverage capability of application detection are improved, and the efficiency and the accuracy of application detection are greatly improved.
Fig. 13 is a flowchart of another application detection method, which is executed by a computer device and is based on the above embodiment of fig. 2, and this embodiment specifically illustrates how to detect the application type to which the target application belongs when the number of reference applications is multiple. Referring to fig. 13, the method includes the following steps.
1301. The computer device obtains attribute information and download information of a plurality of applications.
The multiple applications comprise a target application and multiple reference applications, the application types of the multiple reference applications are determined, and the application types of the multiple reference applications are the same. For example, the plurality of reference applications are all non-secure applications.
Optionally, the multiple reference applications include applications whose application type was determined by methods other than the method of the embodiment of the present application, for example, applications whose application type was determined manually. The multiple reference applications may also include applications whose application type was determined by the method of the embodiment of the present application. For example, if a certain application is determined by the method of the embodiment of the present application to belong to the same application type as the multiple reference applications, that application can be used as a new reference application.
1302. The computer device determines a first similarity characteristic between the attribute information of the target application and the attribute information of the reference application, and a second similarity characteristic between the download information of the target application and the download information of the reference application.
1303. The computer device combines the first similar feature and the second similar feature to obtain a similarity vector between the target application and the reference application, and classifies the similarity vector to obtain the similarity between the target application and the reference application.
In the embodiment of the present application, the number of the reference applications is multiple, and for each reference application, the computer device determines the similarity between the target application and the reference application by using the methods in steps 1302 to 1303, so as to obtain the similarity between the target application and each reference application.
The process of determining the similarity between the target application and the reference application in steps 1302 to 1303 is the same as the process of determining the similarity in steps 202 to 204, and is not repeated here.
1304. The computer device determines a degree of aggregation between the target application and the plurality of reference applications based on the similarity between every two applications among the target application and the plurality of reference applications.
The computer device obtains the similarity between every two of the plurality of reference applications and the similarity between the target application and each of the plurality of reference applications, and then determines the degree of aggregation from these pairwise similarities. The degree of aggregation represents how closely the target application and the plurality of reference applications cluster together.
The degree of aggregation is positively correlated with the pairwise similarities. For example, the degree of aggregation is the average of the similarities between every two applications among the target application and the plurality of reference applications. Alternatively, the degree of aggregation between the target application and the plurality of reference applications may be determined by computing the modularity (Modularity) used in a community mining algorithm such as Fast-Unfolding. Alternatively, the degree of aggregation may be determined by computing the clustering distance used in a clustering algorithm such as K-means (K-means clustering).
1305. The computer device determines that the target application belongs to the application type to which the plurality of reference applications belong, in a case that the degree of aggregation reaches a first threshold.
The computer device judges whether the degree of aggregation reaches the first threshold. If it does, the degree of aggregation between the target application and the multiple reference applications is high enough, that is, the target application and the multiple reference applications are similar enough to be considered to belong to the same application type, and the target application is therefore determined to belong to the application type of the multiple reference applications. If the degree of aggregation does not reach the first threshold, the degree of aggregation between the target application and the multiple reference applications is not high enough, that is, the target application and the multiple reference applications are not sufficiently similar, and they can be considered to belong to different application types.
Optionally, the first threshold is a threshold preset by the computer device. For example, if the value range of the degree of aggregation is 0 to 1, the first threshold may be 0.7 or 0.8.
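The averaging variant of the degree of aggregation, followed by the threshold comparison of step 1305, can be sketched as follows. The application names, the similarity values, and the `aggregation_degree` helper are hypothetical; the threshold 0.7 is one of the example values mentioned above.

```python
from itertools import combinations

def aggregation_degree(apps, similarity):
    """Average similarity over every pair among the target and reference apps."""
    pairs = list(combinations(apps, 2))
    return sum(similarity[frozenset(pair)] for pair in pairs) / len(pairs)

apps = ["target", "ref1", "ref2"]

# Hypothetical pairwise similarities, keyed by unordered pair.
similarity = {
    frozenset(("target", "ref1")): 0.9,
    frozenset(("target", "ref2")): 0.8,
    frozenset(("ref1", "ref2")): 1.0,
}

FIRST_THRESHOLD = 0.7  # example value from the text

degree = aggregation_degree(apps, similarity)
same_type = degree >= FIRST_THRESHOLD
print(degree, same_type)
```

Keying the similarities by `frozenset` makes the lookup independent of the order in which the two applications of a pair are named.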
When the computer device determines that the target application belongs to the application types of the multiple reference applications, the target application can be determined as a new reference application, and then the determined multiple reference applications are continuously utilized to detect the application types of other applications.
According to the method provided by the embodiment of the application, the first similar feature and the second similar feature are mined from two dimensions, namely the attribute information and the download information of the application. The similarity between two applications is then judged by using the similarity vector obtained by combining the first similar feature and the second similar feature, and the degree of aggregation between the target application and the multiple reference applications is determined from these similarities. If the degree of aggregation is high enough, the target application and the multiple reference applications are considered to belong to the same application type, so the application type of the target application can be detected by means of the multiple reference applications. The method enriches the dimensions along which the similarity between applications is measured and takes multiple different reference applications into account, which helps improve the accuracy of detecting the application type to which an application belongs.
Fig. 14 is a flowchart of another application detection method, which is executed by a computer device and is based on the embodiment of fig. 2, and the embodiment of the present application specifically illustrates how to determine the similarity based on the first similar characteristic and the second similar characteristic. Referring to fig. 14, the method includes the following steps.
1401. The computer device obtains attribute information and download information of a plurality of applications.
1402. The computer device determines a first similarity characteristic between the attribute information of the target application and the attribute information of the reference application, and a second similarity characteristic between the download information of the target application and the download information of the reference application.
The process of determining the first similar feature and the second similar feature in steps 1401 to 1402 is the same as the process of determining the first similar feature and the second similar feature in steps 201 to 202, and is not described in detail herein.
1403. The computer device combines the first similar features and the second similar features to obtain a similar vector between the target application and the reference application.
In one possible implementation, the attribute information includes information items of at least one first dimension, the first similar characteristics include first characteristic values in the at least one first dimension, the download information includes information items of at least one second dimension, and the second similar characteristics include second characteristic values in the at least one second dimension. The computer device combines the first eigenvalue in the at least one first dimension and the second eigenvalue in the at least one second dimension in the order of the at least one first dimension and the at least one second dimension to obtain a similarity vector between the target application and the reference application.
The information items of the at least one first dimension refer to different types of attribute information, for example, the information items of the at least one first dimension include installation package identification, application certificates, application identification, installation package size, and the like, a first feature value corresponds to each information item of the first dimension of the reference application, and the first feature value represents a degree of similarity between the information items of the first dimension. The information items of the at least one second dimension refer to different types of downloaded information, for example, the information items of the at least one second dimension include a downloaded domain name, a device identifier of the target device, and the like, a second feature value corresponds to each information item of the second dimension of the reference application, and the second feature value indicates a degree of similarity between the information items of the second dimension.
For example, the installation package identifier corresponds to a first feature value a, the application certificate corresponds to a first feature value b, the application identifier corresponds to a first feature value c, the installation package size corresponds to a first feature value d, the domain name is downloaded corresponds to a second feature value e, the device identifier corresponds to a second feature value f, the at least one first dimension and the at least one second dimension are in the order of (installation package identifier, application certificate, application identifier, installation package size, domain name is downloaded, device identifier), and then the similar vector obtained by combining the first feature value and the second feature value is (a, b, c, d, e, f).
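Combining the feature values in a fixed dimension order can be sketched as follows. The dimension names and the numeric feature values are hypothetical placeholders for a, b, c, d, e, and f in the example above.

```python
# Hypothetical names for the first (attribute) and second (download) dimensions,
# listed in the fixed order used to build the vector.
FIRST_DIMS = ("package_id", "certificate", "app_id", "package_size")
SECOND_DIMS = ("download_domain", "device_id")

def similarity_vector(first_features, second_features):
    """Concatenate feature values following the fixed order of the dimensions."""
    first = tuple(first_features[dim] for dim in FIRST_DIMS)
    second = tuple(second_features[dim] for dim in SECOND_DIMS)
    return first + second

# Placeholder values standing in for a, b, c, d and e, f.
first_features = {"certificate": 1.0, "package_id": 0.85, "package_size": 0.9, "app_id": 0.0}
second_features = {"device_id": 0.0, "download_domain": 1.0}

vector = similarity_vector(first_features, second_features)
print(vector)
```

Because the order is taken from the dimension lists rather than from the input dictionaries, every pair of applications produces a vector whose positions have the same meaning, which is what a classifier needs.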
1404. The computer device calls the similarity vector classification model to classify the similarity vector, and obtains the similarity between the target application and the reference application.
After obtaining the similarity vector between the target application and the reference application, the computer device inputs the similarity vector into the similarity vector classification model, and the similarity vector classification model classifies the similarity vector and outputs the similarity between the target application and the reference application.
In one possible implementation manner, the training process of the similarity vector classification model includes the following steps.
1414. The computer device obtains a sample similarity vector between the first sample application and the second sample application, and a sample similarity between the first sample application and the second sample application.
The process of determining the sample similarity vector between the first sample application and the second sample application is the same as the process of determining the similarity vector between the target application and the reference application, and is not described herein again.
Wherein the sample similarity between the first sample application and the second sample application is an actual similarity between the first sample application and the second sample application. For example, the similarity values are 0 and 1, 0 indicates dissimilarity, and 1 indicates similarity. The sample similarity between the first sample application and the second sample application is 1 if the application types to which the first sample application and the second sample application belong are the same, and the sample similarity between the first sample application and the second sample application is 0 if the application types to which the first sample application and the second sample application belong are not the same.
1424. The computer device calls the similarity vector classification model to classify the sample similarity vector, and obtains the predicted similarity between the first sample application and the second sample application.
The computer device inputs the sample similarity vector into a similarity vector classification model, the similarity vector classification model classifies the sample similarity vector, and the predicted similarity between the first sample application and the second sample application is output.
1434. The computer device trains a similarity vector classification model based on the prediction similarity and the sample similarity.
In the embodiment of the application, a supervised learning mode is adopted to train the similar vector classification model. Because the sample similarity is the actual similarity, the prediction similarity is the similarity obtained by the prediction of the similar vector classification model, and the closer the prediction similarity is to the sample similarity, the higher the classification capability of the similar vector classification model is, i.e. the higher the accuracy is, the computer device trains the similar vector classification model based on the difference between the prediction similarity and the sample similarity, so that the difference between the prediction similarity obtained by the trained similar vector classification model and the sample similarity is reduced.
It should be noted that, in the embodiment of the present application, only one training process is taken as an example for description, and in practical applications, the method provided by the embodiment of the present application may be adopted to perform multiple iterative training until the similarity vector classification model tends to converge. Optionally, in a multiple iterative training process, the similarity vector classification model may be jointly trained by using positive samples and negative samples, so as to improve the accuracy of the similarity vector classification model. The positive sample indicates that the first sample application and the second sample application belong to the same application type, for example, the sample similarity is 1, and the negative sample indicates that the first sample application and the second sample application belong to different application types, for example, the sample similarity is 0.
In one possible implementation manner, the similarity vector classification model is XGBoost (eXtreme Gradient Boosting), which is a boosting algorithm. The idea of boosting is to combine many weak classifiers into one strong classifier. In addition, the similarity vector classification model may also be a classifier such as a decision tree, a random forest, or a Bayesian classifier, which is not limited in the embodiment of the present application.
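The supervised training loop described above (predict, compare with the sample similarity, reduce the difference, iterate) can be illustrated with a deliberately simple stand-in. The sketch below uses a plain logistic regression trained by gradient descent instead of XGBoost so that it stays self-contained; the sample vectors, labels, learning rate, and epoch count are all hypothetical.

```python
import math

def predict(weights, bias, vector):
    """Predicted similarity in (0, 1) for one similarity vector."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=2000, lr=0.5):
    """Iteratively shrink the gap between predicted and sample similarity."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for vector, label in zip(samples, labels):
            error = predict(weights, bias, vector) - label
            weights = [w - lr * error * x for w, x in zip(weights, vector)]
            bias -= lr * error
    return weights, bias

# Hypothetical sample similarity vectors: two positive and two negative samples.
samples = [(0.9, 1.0, 0.8), (0.8, 1.0, 1.0), (0.1, 0.0, 0.2), (0.2, 0.0, 0.0)]
labels = [1, 1, 0, 0]  # sample similarity: 1 = same type, 0 = different type

weights, bias = train(samples, labels)
print(predict(weights, bias, (0.85, 1.0, 0.9)))
print(predict(weights, bias, (0.1, 0.0, 0.1)))
```

Any classifier that maps a similarity vector to a similarity score could take the place of this stand-in; the source names XGBoost, decision trees, random forests, and Bayesian classifiers as options.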
1405. The computer device determines that the target application belongs to the application type to which the reference application belongs, in a case where a similarity between the target application and the reference application satisfies a similarity condition.
The process of step 1405 is the same as the process of step 205, and is not repeated herein.
According to the method provided by the embodiment of the application, the first similar feature and the second similar feature are mined from two dimensions, namely the attribute information and the download information of the application. The similarity between two applications is then judged by using the similarity vector obtained by combining the first similar feature and the second similar feature. If the similarity between the two applications meets the similarity condition, the two applications are considered to belong to the same application type, so the application type of the target application can be detected by means of the reference application. Because the method considers the attribute information and the download information of the application at the same time, it enriches the dimensions along which the similarity between applications is measured, which helps improve the accuracy of detecting the application type to which an application belongs.
On the basis of the above embodiment, the attribute information of the application may include a plurality of information items, such as an installation package identifier, an application certificate, an application identifier, an installation package size, and the like, and then the first similar feature between the attribute information of the target application and the attribute information of the reference application is determined, including at least one of the following ways.
First way of determining the first similar feature: the attribute information includes an installation package identification, and the computer device determines a first similarity characteristic between the installation package identification of the target application and the installation package identification of the reference application.
In one possible implementation, the computer device determines a first similarity feature between the installation package identification of the target application and the installation package identification of the reference application, including at least one of the following.
(1) The computer device determines the first similar feature based on the edit distance between the installation package identifiers of the target application and the reference application, where the edit distance is the number of character edits required to change one installation package identifier into the other.
A first similar characteristic between the installation package identifications of the target application and the reference application is inversely related to the edit distance. Optionally, the computer device determines a first ratio between the edit distance and a maximum of the lengths of the installation package identifications of the target application and the reference application, and determines a first similarity feature between the installation package identifications of the target application and the reference application based on the first ratio. Wherein the first similar characteristic is inversely related to the first ratio. That is, the larger the first ratio, the smaller the first similar feature, and the smaller the first ratio, the larger the first similar feature. By determining the ratio between the editing distance and the maximum value, the influence caused by different lengths of installation package identifiers of different applications is avoided, and the accuracy of the first similar characteristic is improved. For example, the computer device determines a difference between a preset value and the first ratio as a first similar characteristic between the installation package identifications. For example, the predetermined value is 1.
For example, the computer device determines a first similar characteristic between installation package identifications using the following formula.
First similar feature between installation package identifiers = 1 - edit distance / max(length of the installation package identifier of the target application, length of the installation package identifier of the reference application), where max takes the maximum value.
The edit distance, also called Levenshtein distance (Levenshtein), is a quantitative measure of the degree of difference between two strings, by determining how many characters need to be changed to change one string into another. For example, for the character string "love" and the character string "lpvek", it is necessary to modify "o" in the character string "love" to "p" and add one "k" to obtain the character string "lpvek", and the edit distance between the character string "love" and the character string "lpvek" is 2.
For example, in the present embodiment, if the installation package identifier of the target application is "com.im0928.qinyu20220101" and the installation package identifier of the reference application is "com.im0828.qinyu2022010523", the edit distance between the installation package identifier of the target application and the installation package identifier of the reference application is 4.
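The edit-distance feature can be checked against both worked examples with a short sketch. The function names are hypothetical, the identifiers are written without spaces, and the standard dynamic-programming form of the Levenshtein distance is assumed; returning 1.0 for two empty identifiers is also an assumption, since the source does not cover that case.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def edit_distance_feature(a, b):
    """First similar feature = 1 - edit distance / max(len(a), len(b))."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty identifiers count as identical
    return 1 - edit_distance(a, b) / longest

target = "com.im0928.qinyu20220101"
reference = "com.im0828.qinyu2022010523"

print(edit_distance("love", "lpvek"))
print(edit_distance(target, reference))
print(edit_distance_feature(target, reference))
```

With identifier lengths of 24 and 26 and an edit distance of 4, the feature evaluates to 1 - 4/26.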
(2) The computer device determines a first similar characteristic based on the lengths of the same character strings in the installation package identifications of the target application and the reference application.
The first similar feature between the installation package identifiers of the target application and the reference application is positively correlated with the length of the same character string. Optionally, the computer device determines a second ratio between the length of the same character string and the maximum of the lengths of the installation package identifiers of the target application and the reference application, and determines the first similar feature between the installation package identifiers of the target application and the reference application based on the second ratio. The first similar feature is positively correlated with the second ratio. That is, the larger the second ratio is, the larger the first similar feature is, and the smaller the second ratio is, the smaller the first similar feature is. For example, the computer device determines the second ratio as the first similar feature between the installation package identifiers.
For example, the computer device determines a first similarity characteristic between installation package identifications using the following formula.
First similar feature between installation package identifiers = length of the same character string / max(length of the installation package identifier of the target application, length of the installation package identifier of the reference application), where max takes the maximum value.
The length of the same character string refers to the number of the same characters in the installation package identifiers of the target application and the reference application. For example, the same character in the string "love" and the string "lpvek" is [ l, v, e ].
For example, in the present embodiment, if the installation package identifier of the target application is "com.im0928.qinyu20220101" and the installation package identifier of the reference application is "com.im0828.qinyu2022010523", the same character strings in the installation package identifiers of the target application and the reference application are [com.im0, 28, .qinyu2022010], and the length of the same character string is 22.
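The source does not name an algorithm for finding the same characters. One reading that reproduces both worked examples is the longest common subsequence (LCS), which is assumed in the sketch below; the function names are hypothetical and the identifiers are written without spaces.

```python
def common_length(a, b):
    """Length of the longest common subsequence of a and b (assumed reading)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)  # extend the common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def common_length_feature(a, b):
    """First similar feature = same-character length / max(len(a), len(b))."""
    return common_length(a, b) / max(len(a), len(b))

target = "com.im0928.qinyu20220101"
reference = "com.im0828.qinyu2022010523"

print(common_length("love", "lpvek"))
print(common_length(target, reference))
print(common_length_feature(target, reference))
```

Under this reading, "love" and "lpvek" share the three characters l, v, e, and the two installation package identifiers share 22 characters, giving a feature of 22/26.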
(3) The computer device determines the first similar characteristics based on the number of the same fields in the installation package identifiers of the target application and the reference application.
The first similar characteristic between the installation package identifications of the target application and the reference application is positively correlated with the number of the same fields. Optionally, the computer device determines a third ratio between the number of the same fields and a maximum value of the lengths of the installation package identifiers of the target application and the reference application, and determines a first similar characteristic between the installation package identifiers of the target application and the reference application based on the third ratio. Wherein the first similar characteristic is positively correlated with the third ratio. That is, the larger the third ratio is, the larger the first similar feature is, and the smaller the third ratio is, the smaller the first similar feature is. For example, the computer device determines the third ratio as a first similar characteristic between the installation package identifications.
For example, the computer device determines a first similar characteristic between installation package identifications using the following formula.
The first similar feature = number of identical fields / max(length of the installation package identifier of the target application, length of the installation package identifier of the reference application), where max takes the maximum value.
The installation package identifier includes a plurality of fields, which are divided by the "." characters in the installation package identifier. For example, the installation package identifier of the target application is "com.im0928.qinyu20220101", so its fields are [com, im0928, qinyu20220101]; the installation package identifier of the reference application is "com.im0828.qinyu2022010523", so its fields are [com, im0828, qinyu2022010523]. The computer device then compares, in sequence, whether the fields at each position of the two identifiers are the same. Since only the field "com" is the same in both identifiers, the number of identical fields in the installation package identifiers of the target application and the reference application is 1.
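A minimal sketch of the field comparison follows. The function names are hypothetical, and the denominator is taken as the character length of the identifier, which is one reading of "length of the installation package identifier" in the formula above.

```python
def same_field_count(target_id: str, reference_id: str) -> int:
    """Split both identifiers on '.' and count fields equal at the same position."""
    target_fields = target_id.split(".")
    reference_fields = reference_id.split(".")
    return sum(1 for a, b in zip(target_fields, reference_fields) if a == b)


def same_field_ratio(target_id: str, reference_id: str) -> float:
    """Number of identical fields / max(len(target), len(reference))."""
    # Assumption: "length" means character length, as in the same-character ratio.
    longest = max(len(target_id), len(reference_id))
    if longest == 0:
        return 0.0
    return same_field_count(target_id, reference_id) / longest


print(same_field_count("com.im0928.qinyu20220101", "com.im0828.qinyu2022010523"))  # 1
```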
(4) The computer device determines the first similar characteristics based on the number of fields which belong to the same structure in the installation package identifiers of the target application and the reference application.
The installation package identifier includes a plurality of fields, and the structures of the different fields may be the same or different. The first similar characteristic between the installation package identifications of the target application and the reference application is positively correlated with the number of the fields of the same structure. Optionally, the computer device determines a fourth ratio between the number of fields of the same structure and a maximum value of the lengths of the installation package identifications of the target application and the reference application, and determines a first similar characteristic between the installation package identifications of the target application and the reference application based on the fourth ratio. Wherein the first similar characteristic is positively correlated with the fourth ratio. That is, the larger the fourth ratio is, the larger the first similar feature is, and the smaller the fourth ratio is, the smaller the first similar feature is. For example, the computer device determines the fourth ratio as a first similar characteristic between the installation package identifications.
For example, the computer device determines a first similarity characteristic between installation package identifications using the following formula.
The first similar feature = number of fields of the same structure / max(length of the installation package identifier of the target application, length of the installation package identifier of the reference application), where max takes the maximum value.
In one possible implementation, the structure of the field includes a case structure, an alphanumeric structure, a number structure, a full pinyin structure, a pinyin initial consonant abbreviation structure, and a pinyin acronym structure. The structures are defined as follows:
(a) The case structure means that the field consists of uppercase and lowercase letters, for example, the field "NhGA" belongs to the case structure.
(b) The alphanumeric structure means that the field consists of letters and numbers, for example, the field "im0928" belongs to the alphanumeric structure.
(c) The number structure means that the field consists of numbers, for example, the field "20220101" belongs to the number structure.
(d) The full pinyin structure means that the field consists of the full pinyin of the application name. For example, the name of an application is "parent fish" and the installation package identifier of the application is "com.im0828.qinyu", which includes the full pinyin "qinyu" of "parent fish"; the field "qinyu" therefore belongs to the full pinyin structure.
(e) The pinyin initial consonant abbreviation structure means that the field consists of the abbreviation formed from the pinyin initial consonants of the application name. For example, the application name is "stone shopping", and the installation package identifier of the application is "net.
(f) The pinyin acronym structure means that the field consists of the pinyin acronym of the application name. For example, the application name is "stone shopping", and the installation package identifier of the application is "net.
By defining the above structures, the structure of each field in the installation package identifier is abstracted. If a field does not hit any of the above structures, it remains unchanged; if a field hits a structure, it is replaced by that structure. For example, the installation package identifier of the target application is "com.im0828.qinyu20220105", and the structures corresponding to its fields are [com, alphanumeric structure, full pinyin structure + number structure]; the installation package identifier of the reference application is "com.im0928.qinyu2020123", and the structures corresponding to its fields are likewise [com, alphanumeric structure, full pinyin structure + number structure]. Since the structures of all three fields are the same, the number of fields with the same structure in the installation package identifiers of the target application and the reference application is 3.
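The field abstraction can be illustrated with a simplified Python sketch. It is not the patented procedure: the structure labels, the function names, and the order in which the checks are applied are assumptions, the full pinyin of the application name is assumed to be supplied by the caller, and the two pinyin abbreviation structures are left out because they would need a pinyin conversion step that the text does not describe.

```python
def field_structure(field: str, full_pinyin: str = "") -> str:
    """Map a field to a structure label, or return it unchanged if nothing hits."""
    # Full pinyin of the application name, optionally followed by digits.
    if full_pinyin and field.startswith(full_pinyin):
        rest = field[len(full_pinyin):]
        if rest == "":
            return "full_pinyin"
        if rest.isdigit():
            return "full_pinyin+number"
    if field.isdigit():
        return "number"
    has_alpha = any(c.isalpha() for c in field)
    has_digit = any(c.isdigit() for c in field)
    if field.isalnum() and has_alpha and has_digit:
        return "alphanumeric"
    # Mixed uppercase and lowercase letters only.
    if field.isalpha() and field != field.lower() and field != field.upper():
        return "case"
    return field  # no structure hit: the field stays as it is


def same_structure_count(target_id: str, reference_id: str, full_pinyin: str = "") -> int:
    """Count positions whose fields map to the same structure label."""
    target = [field_structure(f, full_pinyin) for f in target_id.split(".")]
    reference = [field_structure(f, full_pinyin) for f in reference_id.split(".")]
    return sum(1 for a, b in zip(target, reference) if a == b)


print(same_structure_count("com.im0828.qinyu20220105", "com.im0928.qinyu2020123", "qinyu"))  # 3
```

Checking the pinyin prefix first is a design choice in this sketch: otherwise a field such as "qinyu20220105" would be classified as merely alphanumeric.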
In the embodiment of the application, when the similar features on the identification dimension of the installation package are determined, the similar features on the double dimensions of the character content and the field structure are fully extracted, and the accuracy of the first similar features is improved.
Second way of determining the first similar feature: the attribute information includes an application certificate, and the computer device determines a first similarity feature between the application certificate of the target application and the application certificate of the reference application.
In one possible implementation, the computer device determines a first similar feature between the application certificate of the target application and the application certificate of the reference application, including the following cases.
(1) In a case where the target application is the same as the application certificate of the reference application, the computer device determines the first similar feature based on a certificate heat degree of the application certificate, the certificate heat degree referring to the number of applications having the application certificate.
The computer device obtains the application certificate of the target application and the application certificate of the reference application. If the two application certificates are the same, the computer device determines the certificate heat degree of the application certificate, and then determines the first similar feature between the application certificates of the target application and the reference application based on that certificate heat degree, where the first similar feature is negatively correlated with the certificate heat degree. For example, the computer device determines the reciprocal of the certificate heat degree as the first similar feature, that is, the first similar feature between the application certificates of the target application and the reference application = 1/(certificate heat degree).
(2) In a case where the application certificates of the target application and the reference application are different and the difference between their certificate heat degrees is smaller than a third threshold, the computer device determines the first similar feature based on the maximum value of the certificate heat degrees corresponding to the target application and the reference application.
If the application certificate of the target application is not identical to the application certificate of the reference application, the computer device determines a certificate heat degree of the application certificate of the target application and a certificate heat degree of the application certificate of the reference application, respectively, and then determines a difference between the two certificate heat degrees, if the difference between the two certificate heat degrees is less than a third threshold, the computer device determines a maximum value of the two certificate heat degrees, and based on the maximum value, determines a first similar characteristic between the application certificates of the target application and the reference application.
Optionally, the first similar feature is negatively correlated with the maximum value. For example, the computer device determines the reciprocal of the maximum value as the first similar feature, that is, the first similar feature between the application certificates of the target application and the reference application = 1/max(certificate heat degree corresponding to the target application, certificate heat degree corresponding to the reference application), where max takes the maximum value.
Optionally, the third threshold is a threshold preset by the computer device, for example, the third threshold is 10.
(3) In the case that the application certificates of the target application and the reference application are not the same and the difference of the certificate heat degrees is not less than a third threshold value, the computer device determines the target numerical value as a first similar characteristic.
If the difference between the certificate heat degrees is not less than the third threshold, the computer device determines the target value as the first similar feature; for example, the target value is 0.
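The three certificate cases can be summarized in the following Python sketch. The function name, the `heat` dictionary (certificate to number of applications holding it), and the sample certificate names are hypothetical; the defaults of 10 for the third threshold and 0 for the target value are the example values given in the text.

```python
def certificate_similarity(
    target_cert: str,
    reference_cert: str,
    heat: dict,
    third_threshold: int = 10,
    target_value: float = 0.0,
) -> float:
    """First similar feature between two application certificates."""
    target_heat = heat[target_cert]
    reference_heat = heat[reference_cert]
    # Case (1): same certificate, use the reciprocal of its heat degree.
    if target_cert == reference_cert:
        return 1.0 / target_heat
    # Case (2): different certificates with close heat degrees.
    if abs(target_heat - reference_heat) < third_threshold:
        return 1.0 / max(target_heat, reference_heat)
    # Case (3): different certificates with very different heat degrees.
    return target_value


heat = {"certA": 4, "certB": 8, "certC": 100}  # made-up sample data
print(certificate_similarity("certA", "certA", heat))  # 0.25
print(certificate_similarity("certA", "certB", heat))  # 0.125
print(certificate_similarity("certA", "certC", heat))  # 0.0
```

A rarely used certificate shared by two applications therefore yields a feature close to 1, while a widely shared certificate contributes little.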
A third way of determining the first similar feature: the attribute information includes an application identification, and the computer device determines a first similarity characteristic between the application identification of the target application and the application identification of the reference application.
In one possible implementation, the computer device determines a first similarity feature between the application identification of the target application and the application identification of the reference application, including at least one of the following.
(1) The computer device determines a first similar characteristic based on the lengths of the same character strings in the application identifications of the target application and the reference application.
The first similar feature between the application identifications of the target application and the reference application positively correlates with the length of the same string. Optionally, the computer device determines a fifth ratio between the length of the same character string and a maximum of the lengths of the application identifications of the target application and the reference application, and determines a first similarity feature between the application identifications of the target application and the reference application based on the fifth ratio. Wherein the first similar characteristic is positively correlated with the fifth ratio. That is, the larger the fifth ratio is, the larger the first similar feature is, and the smaller the fifth ratio is, the smaller the first similar feature is. For example, the computer device determines the fifth ratio as the first similar characteristic between the application identifications.
For example, the computer device determines a first similar characteristic between application identifications using the following formula.
The first similar feature between application identifiers = length of the same character string / max(length of the application identifier of the target application, length of the application identifier of the reference application), where max takes the maximum value.
The length of the same character string refers to the number of the same characters in the application identifiers of the target application and the reference application. For example, in this embodiment, the application identifier of the target application is "web pan", and the application identifier of the reference application is "web pan", and then the number of the same characters in the application identifiers of the target application and the reference application is 2.
(2) The computer device determines the first similar feature based on the classification results corresponding to the application identifiers of the target application and the reference application.
The classification result corresponding to the application identifier represents the possibility that the application identifier belongs to each of the plurality of identifier types. For example, from the purpose of the application, the identification type corresponding to the application identification may include a video class, an online shopping class, a social class, a game class, and the like.
In one possible implementation, the computer device calls the application identifier classification model to classify the application identifier of the target application and the application identifier of the reference application respectively, obtaining a classification result corresponding to the target application and a classification result corresponding to the reference application.
Optionally, the computer device determines a distance between the classification result corresponding to the target application and the classification result corresponding to the reference application, and determines the first similar feature based on the distance between the classification results, the first similar feature being negatively correlated with the distance. That is, the larger the distance between the classification results is, the smaller the first similar feature is, and the smaller the distance between the classification results is, the larger the first similar feature is.
Optionally, the computer device determines the difference between a preset value and the distance as the first similar feature. For example, the preset value is 1. Optionally, the classification result is a classification vector that includes probability values in multiple dimensions, where each dimension corresponds to one identifier type. The computer device determines the difference between the probability value in each dimension of the classification result corresponding to the target application and the probability value in the same dimension of the classification result corresponding to the reference application, and determines the distance between the two classification results based on the differences in all dimensions, where the distance is positively correlated with the difference in each dimension. For example, the computer device determines the sum of squares of the differences in all dimensions and, based on the number of the plurality of dimensions, takes the root of the sum of squares to obtain the distance between the classification results.
For example, the classification result of the target application is [video class probability (Figure DEST_PATH_IMAGE001), online shopping class probability (Figure DEST_PATH_IMAGE002), social class probability (Figure DEST_PATH_IMAGE003), game class probability (Figure DEST_PATH_IMAGE004)], and the classification result of the reference application is [video class probability (Figure DEST_PATH_IMAGE005), online shopping class probability (Figure DEST_PATH_IMAGE006), social class probability (Figure DEST_PATH_IMAGE007), game class probability (Figure DEST_PATH_IMAGE008)]. The first similar feature between the application identifier of the target application and the application identifier of the reference application can be determined by using the formula shown in Figure DEST_PATH_IMAGE009, where P denotes the first similar feature between the application identifier of the target application and the application identifier of the reference application.
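As a rough illustration of the distance-based feature, the sketch below computes P as a preset value minus the distance between two classification vectors. The function name is hypothetical, and the plain Euclidean distance (square root of the sum of squared differences) is an assumption: the text only says that a root of the sum of squares is taken based on the number of dimensions, so the formula in the original figure may normalize differently.

```python
import math


def classification_similarity(target_probs, reference_probs, preset_value=1.0):
    """P = preset value - distance between two classification vectors (assumed Euclidean)."""
    if len(target_probs) != len(reference_probs):
        raise ValueError("classification vectors must have the same number of dimensions")
    sum_of_squares = sum((a - b) ** 2 for a, b in zip(target_probs, reference_probs))
    distance = math.sqrt(sum_of_squares)  # assumption: square root of the sum of squares
    return preset_value - distance


# Made-up probabilities for [video, online shopping, social, game].
target_probs = [0.7, 0.1, 0.1, 0.1]
reference_probs = [0.6, 0.2, 0.1, 0.1]
print(classification_similarity(target_probs, reference_probs))
```

Identical classification vectors give P equal to the preset value, and P shrinks as the two vectors move apart.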
Optionally, the application identifier classification model is BERT (Bidirectional Encoder Representations from Transformers, a pre-trained language representation model). In addition, the application identifier classification model may also be a model obtained by using other Natural Language Processing (NLP) classification algorithms, which is not limited in this embodiment of the present application.
A fourth way of determining the first similar feature: the attribute information includes an installation package size, and the computer device determines a first similarity characteristic between the installation package size of the target application and the installation package size of the reference application.
Since the installation package size is a numerical value, the computer device may determine the first similar characteristic based on a distance between the installation package size of the target application and the installation package size of the reference application. Wherein the first similar characteristic is inversely related to a distance between the installation package sizes. That is, the larger the distance between the sizes of the installation packages, the smaller the first similar feature, and the smaller the distance between the sizes of the installation packages, the larger the first similar feature.
Optionally, the computer device determines the difference between the installation package size of the target application and that of the reference application, and the sum of the two installation package sizes, determines a sixth ratio between the difference and the sum, and determines the first similar feature between the installation package sizes based on the sixth ratio. The first similar feature is negatively correlated with the sixth ratio; that is, the larger the sixth ratio is, the smaller the first similar feature is, and the smaller the sixth ratio is, the larger the first similar feature is. For example, the computer device determines the absolute value of the product of the sixth ratio and a first preset value, and determines the difference between a second preset value and that absolute value as the first similar feature; for example, the first preset value is 2 and the second preset value is 1.
For example, the computer device determines a first similarity characteristic between installation package sizes using the following formula.
The first similar feature = 1 - 2 × abs(x - y)/(x + y), where x represents the installation package size of the target application, y represents the installation package size of the reference application, and abs takes the absolute value.
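The formula translates directly into code. In the sketch below the function name is hypothetical, the sizes are assumed to be positive numbers in the same unit, and the error for a non-positive sum is an added guard rather than part of the embodiment.

```python
def package_size_similarity(x: float, y: float) -> float:
    """First similar feature = 1 - 2 * abs(x - y) / (x + y)."""
    if x + y <= 0:
        raise ValueError("installation package sizes must be positive")
    return 1 - 2 * abs(x - y) / (x + y)


print(package_size_similarity(10, 10))  # 1.0
print(package_size_similarity(30, 10))  # 0.0
```

With positive sizes the result lies between -1 and 1: equal sizes give 1, and the value falls as the sizes diverge.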
The embodiment of the application provides a determination mode of the first similar features on a plurality of different dimensions, enriches the dimensions of the first similar features, and excavates the features of finer granularity on the attribute information of the target application and the reference application, thereby being beneficial to improving the accuracy of determining the similarity degree between the target application and the reference application.
It should be noted that, in a possible implementation manner, the computer device splices the acquired first similar features in multiple different dimensions to obtain an overall first similar feature between the target application and the reference application, and then determines the similarity between the target application and the reference application by using the first similar feature obtained after splicing.
On the basis of the above embodiment, the download information of the application may include a plurality of information items, such as a download domain name and a device identifier of the target device. In this case, determining the second similar feature between the download information of the target application and the download information of the reference application includes at least one of the following ways.
First way of determining the second similar feature: the download information includes a download domain name, and the computer device determines a second similarity characteristic between the download domain name of the target application and the download domain name of the reference application.
The download link includes a link for directly downloading the application, and a link to a download interface that guides the user to download the application, in which the application can be downloaded by triggering an option, scanning a code, or the like. Fig. 15 is a schematic diagram of a download interface provided in an embodiment of the present application. As shown in fig. 15, the download interface includes an application name, an application icon, evaluation information of the application, and the like, and further includes an installation option 1501 of the application; by triggering the installation option 1501, the application indicated by the application name can be downloaded.
In one possible implementation, the computer device determines the second similar characteristic based on a number of same downloaded domain names of the target application and the reference application.
The second similar feature between the download domain names of the target application and the reference application is positively correlated with the number of the same download domain names. Optionally, the computer device determines a seventh ratio between the number of the same download domain names and the maximum of the total numbers of download domain names of the target application and the reference application, and determines the second similar feature between the download domain names of the target application and the reference application based on the seventh ratio. The second similar feature is positively correlated with the seventh ratio; that is, the larger the seventh ratio is, the larger the second similar feature is, and the smaller the seventh ratio is, the smaller the second similar feature is. For example, the computer device determines the seventh ratio as the second similar feature between the download domain names.
For example, the computer device determines a second similarity characteristic between downloading domain names using the following formula.
The second similar feature between download domain names = number of the same download domain names / max(total number of download domain names of the target application, total number of download domain names of the reference application), where max takes the maximum value.
For example, the download domain names of the target application include [domain1, domain2, domain4], and the download domain names of the reference application include [domain3, domain4]. The same download domain name of the target application and the reference application is domain4, so the number of the same download domain names of the target application and the reference application is 1.
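A short Python sketch of the domain-name ratio follows. The function name is hypothetical, and treating each application's download domain names as a set (ignoring duplicates) is an assumption.

```python
def download_domain_similarity(target_domains, reference_domains) -> float:
    """Number of shared download domain names / max(total of target, total of reference)."""
    target_set = set(target_domains)
    reference_set = set(reference_domains)
    largest = max(len(target_set), len(reference_set))
    if largest == 0:
        return 0.0  # assumption: no known download domain names yields 0
    return len(target_set & reference_set) / largest


similarity = download_domain_similarity(
    ["domain1", "domain2", "domain4"],
    ["domain3", "domain4"],
)
print(similarity)  # one shared domain out of max(3, 2), i.e. 1 / 3
```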
Second way of determining the second similar feature: the download information includes a device identification of the target device, and the computer device determines a second similarity characteristic between the device identification corresponding to the target application and the device identification corresponding to the reference application.
After developing an application, an application developer generally installs the application on a device for testing. If the target devices corresponding to two applications are the same, the developers of the two applications are likely to be the same, and multiple applications developed by the same developer are likely to belong to the same application type. Therefore, the embodiment of the application also considers a second similar feature between the device identifiers corresponding to the target application and the reference application, where the device identifier of the target device is used to indicate the target device.
In one possible implementation, the computer device determines the first numerical value as the second similar characteristic in the case that at least one identical device identification exists for the target application and the reference application; in case the target application and the reference application do not have the same device identification, a second value is determined as a second similar characteristic, wherein the first value is larger than the second value.
For example, if at least one identical device identity exists for the target application and the reference application, the second similar characteristic between the device identities is equal to 1, and if the identical device identity does not exist for the target application and the reference application, the second similar characteristic between the device identities is equal to 0.
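The device-identifier rule reduces to a set intersection test. In this sketch the function name and the sample device identifiers are hypothetical, and the defaults 1 and 0 are the example values from the text.

```python
def device_id_similarity(target_devices, reference_devices,
                         first_value: float = 1.0, second_value: float = 0.0) -> float:
    """Return first_value if the two applications share any device identifier."""
    shared = set(target_devices) & set(reference_devices)
    return first_value if shared else second_value


print(device_id_similarity(["dev-1", "dev-2"], ["dev-2", "dev-9"]))  # 1.0
print(device_id_similarity(["dev-1"], ["dev-9"]))  # 0.0
```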
According to the method and the device, a determination mode of the second similar characteristics on a plurality of different dimensions is provided, the dimensions of the second similar characteristics are enriched, the characteristics of finer granularity on the downloading information of the target application and the reference application are mined, and the accuracy of determining the similarity degree between the target application and the reference application is improved.
It should be noted that, in a possible implementation manner, the computer device splices the obtained second similar features in multiple different dimensions to obtain an overall second similar feature between the target application and the reference application, and then determines the similarity between the target application and the reference application by using the second similar feature obtained after splicing.
In the above two embodiments, the attribute information of the application includes information items in multiple dimensions, namely the installation package identifier, the application certificate, the application identifier, and the installation package size, so the first similar feature includes first similar features in these dimensions. The download information of the application also includes information items in multiple dimensions, namely the download domain name and the device identifier of the target device, so the second similar feature includes second similar features in these dimensions. On this basis, in step 1403 of the embodiment of fig. 14, the computer device combines the first similar features in the multiple dimensions with the second similar features in the multiple dimensions to obtain the similarity vector between the target application and the reference application.
Fig. 16 is a schematic diagram of determining a similarity vector according to an embodiment of the present application. As shown in fig. 16, the first similar feature between the attribute information of the target application and the attribute information of the reference application includes: (1) a first similar feature between installation package identifiers; (2) a first similar feature between application certificates; (3) a first similar feature between application identifiers; (4) a first similar feature between installation package sizes. The second similar feature between the download information of the target application and the download information of the reference application includes: (5) a second similar feature between download domain names; (6) a second similar feature between device identifiers. The computer device combines the six similar features to obtain the similarity vector between the target application and the reference application.
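The combination step can be sketched as a simple concatenation in a fixed order. The function name and the sample feature values are hypothetical, and allowing several values per dimension (for example, one per comparison method on the installation package identifier) is an assumption about how the features are spliced.

```python
def build_similarity_vector(
    package_id_features,      # (1) one or more values for installation package identifiers
    certificate_feature,      # (2) application certificates
    app_id_features,          # (3) one or more values for application identifiers
    package_size_feature,     # (4) installation package sizes
    download_domain_feature,  # (5) download domain names
    device_id_feature,        # (6) device identifiers
):
    """Concatenate the six groups of similar features in a fixed order."""
    return [
        *package_id_features,
        certificate_feature,
        *app_id_features,
        package_size_feature,
        download_domain_feature,
        device_id_feature,
    ]


vector = build_similarity_vector([0.85, 0.04], 0.25, [0.5, 0.9], 0.8, 0.5, 1.0)
print(vector)  # [0.85, 0.04, 0.25, 0.5, 0.9, 0.8, 0.5, 1.0]
```

Keeping the order fixed matters because any model consuming the vector would interpret each position as a specific dimension.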
In the embodiment of the application, the characteristics of finer granularity of the target application and the reference application on the attribute information and the characteristics of finer granularity on the downloaded information are mined, and the dimensions of the first similar characteristic and the second similar characteristic are enriched, so that the similar vectors obtained by splicing take various aspects into consideration, the accuracy of determining the similarity between the target application and the reference application is favorably improved, and the application detection capability is further improved.
On the basis of the foregoing embodiment, the computer device may be a server. The server uses the method in the foregoing embodiment to determine a plurality of applications that belong to the same application type as the reference application, forms an application set from the plurality of applications and the reference application, and then detects, based on the application set, a third application that the terminal requests to detect. Fig. 17 is a flowchart of another application detection method provided in an embodiment of the present application, where the method is executed by a computer device. Referring to fig. 17, the method includes the following steps.
1701. The terminal sends an application detection request to the server.
The application detection request comprises attribute information of a third application, and the application detection request is used for requesting the server to detect the application type of the third application.
In a possible implementation manner, the terminal responds to the installation request for the third application, analyzes the installation package of the third application to obtain attribute information in the installation package, and then sends an application detection request including the attribute information to the server.
For example, before installing the third application, the terminal needs to determine whether the third application belongs to a secure application or a non-secure application, if the third application belongs to the secure application, it indicates that the third application may be installed, and if the third application belongs to the non-secure application, it indicates that potential safety hazard may be caused by installing the third application, so that the terminal sends the application detection request to the server to request the server to detect whether the third application belongs to the non-secure application.
1702. The server receives the application detection request sent by the terminal.
The server receives the application detection request sent by the terminal and acquires the attribute information of the third application carried in the application detection request.
1703. The server sends a notification message to the terminal in a case where any application with the same attribute information as the third application is found in the application set.
The application set includes a plurality of applications belonging to the application type of the reference application. The server searches the application set for an application with the same attribute information as the third application. If such an application is found, the third application has the same application type as that application, that is, the third application belongs to the application type corresponding to the application set, and the server sends a notification message to the terminal to notify it that the third application belongs to the application type corresponding to the application set.
If the server does not find any application with the same attribute information as the third application in the application set, it sends another notification message to the terminal, which is used to notify that the third application does not belong to the application type corresponding to the application set.
For example, the application set includes a plurality of applications belonging to non-secure applications, the application detection request is used to request to detect whether a third application belongs to a non-secure application, if the server finds an application in the application set that has the same attribute information as the third application, it is indicated that the third application also belongs to a non-secure application, and the server notifies the terminal that the third application belongs to a non-secure application.
1704. The terminal receives the notification message sent by the server.
In a possible implementation manner, the notification message is used to notify that the third application belongs to the non-secure application, and the terminal displays a prompt message, which is used to prompt that the third application belongs to the non-secure application, so as to prompt the user not to install the third application. Alternatively, the terminal stops installing the third application and displays an installation failure message, which is used for prompting that the installation failed because the third application belongs to the non-secure application.
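As an illustration only, steps 1702 and 1703 amount to an exact-match lookup on attribute information. The following Python sketch assumes that attribute information is represented as a dictionary of information items; the function names, field names, and reply format are hypothetical and are not specified by the embodiment.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

AttributeInfo = Dict[str, str]
AttributeKey = FrozenSet[Tuple[str, str]]


def attribute_key(info: AttributeInfo) -> AttributeKey:
    """Turn the information items into a hashable key for exact matching."""
    return frozenset(info.items())


def build_application_set(apps: Iterable[AttributeInfo]) -> Set[AttributeKey]:
    """Index the applications already known to belong to one application type."""
    return {attribute_key(info) for info in apps}


def handle_detection_request(request: dict,
                             application_set: Set[AttributeKey],
                             application_type: str) -> dict:
    """Step 1703: report whether the third application matches the set."""
    hit = attribute_key(request["attribute_info"]) in application_set
    return {"application_type": application_type, "belongs": hit}


# Hypothetical data: one known non-secure application.
non_secure_set = build_application_set([
    {"package": "com.example.fake", "certificate": "ab12", "size": "1048576"},
])
request = {"attribute_info": {"package": "com.example.fake",
                              "certificate": "ab12", "size": "1048576"}}
reply = handle_detection_request(request, non_secure_set, "non-secure")
```

In this sketch a hit yields a reply stating that the third application belongs to the application type of the set, and a miss yields the other notification described above.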
Fig. 18 is a flowchart of another application detection method provided in an embodiment of the present application, and as shown in fig. 18, the method includes the following steps.
1801. The terminal displays a download interface of the application. As shown by the download interface 1811 in fig. 18, the download interface 1811 includes an application icon and an application name, as well as a download option corresponding to the application.
1802. The terminal downloads the corresponding application in response to a trigger operation on the download option in the download interface.
1803. After the application is downloaded, the terminal, in response to a trigger operation on the installation option, parses the installation package of the application and obtains the attribute information of the application.
1804. The terminal calls an application detection interface, sends an application detection request including the attribute information to the server, displays a detection interface 1814, and displays in the detection interface 1814 a prompt message indicating that detection is in progress.
1805. The server queries whether the non-secure application set is hit. The non-secure application set comprises a plurality of non-secure applications. The server searches the non-secure application set for an application with the same attribute information as the application; if such an application is found, the non-secure application set is hit.
1806. In case of a hit, the server sends a non-secure notification message to the terminal, the non-secure notification message being used to notify the terminal that the application belongs to a non-secure application.
1807. The terminal displays the non-secure prompt message. As shown in fig. 18, the terminal displays the non-secure prompt message in the prompt interface 1817; the non-secure prompt message is used to prompt that the application has been detected as a non-secure application and to prompt the user to delete the installation package. In addition, the prompt interface 1817 includes a "do not process for now" option and a "delete immediately" option. If the "do not process for now" option is triggered, the terminal continues to install the application; if the "delete immediately" option is triggered, the terminal stops installing the application and deletes the installation package of the application.
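The terminal-side branching in steps 1806 and 1807 can be sketched as a small decision function. This is a hypothetical illustration: the enum, the action strings, and the behaviour when the set is not hit (proceeding with installation) are assumptions made for the example, not details given in fig. 18.

```python
from enum import Enum
from typing import Optional


class UserChoice(Enum):
    """The two options offered in the prompt interface 1817 (names assumed)."""
    IGNORE = "ignore"
    DELETE = "delete"


def next_action(hit: bool, choice: Optional[UserChoice] = None) -> str:
    """Return the terminal's next action after the server's reply."""
    if not hit:
        # Assumption: an application that does not hit the set is installed.
        return "install"
    if choice is None:
        # Non-secure notification received, user has not chosen yet.
        return "show_prompt"
    if choice is UserChoice.DELETE:
        return "stop_install_and_delete_package"
    # The user chose not to process the warning: installation continues.
    return "install"
```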
Fig. 19 is a schematic structural diagram of an application detection apparatus according to an embodiment of the present application. Referring to fig. 19, the apparatus includes:
an obtaining module 1901, configured to obtain attribute information and download information of multiple applications, where the attribute information is information used to describe an application, the download information is information related to downloading an application, and the multiple applications include a target application to be detected and a reference application whose application type has already been determined;
a similarity determining module 1902, configured to determine a first similar characteristic between the attribute information of the target application and the attribute information of the reference application, and a second similar characteristic between the download information of the target application and the download information of the reference application;
the similarity determining module 1902 is further configured to combine the first similar feature and the second similar feature to obtain a similar vector between the target application and the reference application, and classify the similar vector to obtain a similarity between the target application and the reference application;
a type determining module 1903, configured to determine that the target application belongs to the application type to which the reference application belongs if the similarity between the target application and the reference application satisfies the similarity condition.
The application detection apparatus provided by the embodiment of the present application mines the first similar feature and the second similar feature from two dimensions, namely the attribute information and the download information of the applications, and then judges the similarity between two applications by using the similar vector obtained by combining the first similar feature and the second similar feature. If the similarity of the two applications satisfies the similarity condition, the application types of the two applications are considered to be the same, so that the application type of the target application can be detected by means of the reference application. Because both the attribute information and the download information of the applications are considered, the similarity between applications is assessed along more dimensions, which improves the accuracy of detecting the application type to which an application belongs.
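The cooperation of modules 1902 and 1903 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the feature values are taken as already computed, the classifier is a hypothetical logistic function with hand-picked weights standing in for the similar vector classification model, and the similarity condition is assumed to be a simple threshold.

```python
import math
from typing import Callable, List, Sequence


def similar_vector(first_features: Sequence[float],
                   second_features: Sequence[float]) -> List[float]:
    """Combine attribute-based and download-based features in a fixed order."""
    return list(first_features) + list(second_features)


def logistic_classifier(weights: Sequence[float],
                        bias: float) -> Callable[[Sequence[float]], float]:
    """Hypothetical stand-in for the similar vector classification model."""
    def classify(vector: Sequence[float]) -> float:
        z = bias + sum(w * x for w, x in zip(weights, vector))
        return 1.0 / (1.0 + math.exp(-z))
    return classify


def same_type(first_features: Sequence[float],
              second_features: Sequence[float],
              classify: Callable[[Sequence[float]], float],
              threshold: float) -> bool:
    """True if the similarity satisfies the (assumed) threshold condition."""
    similarity = classify(similar_vector(first_features, second_features))
    return similarity >= threshold


classify = logistic_classifier(weights=[2.0, 2.0, 2.0], bias=-3.0)
```

With these illustrative weights, a pair whose features are all 1.0 is judged to be of the same type at a threshold of 0.5, while a pair whose features are all 0.0 is not.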
Optionally, referring to fig. 20, the attribute information and the download information each include a plurality of information items; the type determining module 1903 is configured to determine that the target application belongs to the application type to which the reference application belongs when the target application and the reference application have an association relationship and the similarity between the target application and the reference application reaches a second threshold, where the association relationship refers to having an information item with the same value.
Optionally, referring to fig. 20, the reference application includes a first application and a second application, the target application has no association relationship with the first application, and the target application has an association relationship with the second application;
a similarity determination module 1902, configured to determine a first similar characteristic between the attribute information of the target application and the attribute information of the second application, and a second similar characteristic between the download information of the target application and the download information of the second application;
a type determining module 1903, configured to determine that the target application belongs to the application type to which the second application belongs if the similarity between the target application and the second application reaches a second threshold.
Optionally, referring to fig. 20, the similarity determining module 1902 is further configured to determine a similarity between the first application and the second application, where the application type to which the first application belongs has already been determined, the application type to which the second application belongs has not yet been determined, and the first application and the second application have the association relationship;
the type determining module 1903 is further configured to determine that the second application belongs to the application type to which the first application belongs if the similarity between the first application and the second application reaches the second threshold.
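The way an application type spreads from the first application to the second application, and from there to the target application, can be sketched as below. This is an illustrative Python example only: information items are assumed to be dictionary entries, and `toy_similarity` is a hypothetical placeholder for the similarity produced by the similarity determining module 1902.

```python
from typing import Callable, Dict, Optional

Info = Dict[str, str]


def has_association(a: Info, b: Info) -> bool:
    """Association relationship: at least one information item with the same value."""
    return any(key in b and b[key] == value for key, value in a.items())


def toy_similarity(a: Info, b: Info) -> float:
    """Placeholder similarity: fraction of information items with equal values."""
    keys = set(a) | set(b)
    shared = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return shared / len(keys)


def propagate_type(known: Info, known_type: str, unknown: Info,
                   similarity: Callable[[Info, Info], float],
                   second_threshold: float) -> Optional[str]:
    """Assign known_type to `unknown` only if associated and similar enough."""
    if not has_association(known, unknown):
        return None
    if similarity(known, unknown) >= second_threshold:
        return known_type
    return None


first = {"certificate": "c1", "domain": "d1", "package": "p1", "size": "10"}
second = {"certificate": "c1", "domain": "d1", "package": "p2", "size": "20"}
target = {"certificate": "c2", "domain": "d2", "package": "p2", "size": "20"}

second_type = propagate_type(first, "non-secure", second, toy_similarity, 0.5)
target_type = propagate_type(second, second_type, target, toy_similarity, 0.5)
```

Here the target application has no information item in common with the first application, yet it still receives the type through the second application.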
Optionally, referring to fig. 20, there are a plurality of reference applications, and the plurality of reference applications belong to the same application type;
a type determination module 1903, comprising:
a polymerization degree determining unit 1913, configured to determine a polymerization degree between the target application and the plurality of reference applications based on the similarity between every two applications among the target application and the plurality of reference applications, where the polymerization degree is positively correlated with the similarity between every two applications;
a type determining unit 1923, configured to determine that the target application belongs to the application type to which the plurality of reference applications belong if the polymerization degree reaches a first threshold.
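One simple quantity that is positively correlated with every pairwise similarity is their arithmetic mean. The Python sketch below uses that mean as the polymerization degree; this choice, the function names, and the example values are assumptions made for illustration, since the text here only requires positive correlation.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Hashable, Sequence


def polymerization_degree(apps: Sequence[Hashable],
                          similarity: Callable[[Hashable, Hashable], float]) -> float:
    """Mean similarity over every pair among the target and reference applications."""
    return mean(similarity(a, b) for a, b in combinations(apps, 2))


def belongs_to_reference_type(apps: Sequence[Hashable],
                              similarity: Callable[[Hashable, Hashable], float],
                              first_threshold: float) -> bool:
    """Unit 1923: compare the polymerization degree with the first threshold."""
    return polymerization_degree(apps, similarity) >= first_threshold


# Hypothetical pairwise similarities for one target and two reference applications.
pairwise = {
    frozenset(("target", "ref1")): 0.9,
    frozenset(("target", "ref2")): 0.8,
    frozenset(("ref1", "ref2")): 0.7,
}


def lookup(a: str, b: str) -> float:
    return pairwise[frozenset((a, b))]


degree = polymerization_degree(["target", "ref1", "ref2"], lookup)
```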
Optionally, referring to fig. 20, the attribute information includes information items of at least one first dimension, the first similar feature includes a first feature value of the at least one first dimension, the download information includes information items of at least one second dimension, and the second similar feature includes a second feature value of the at least one second dimension; the similarity determination module 1902 includes:
a splicing unit 1912, configured to combine the first feature value in the at least one first dimension and the second feature value in the at least one second dimension according to an order of the at least one first dimension and the at least one second dimension, so as to obtain a similar vector between the target application and the reference application.
Optionally, referring to fig. 20, the similarity determining module 1902 includes:
a classifying unit 1922, configured to invoke a similar vector classification model, and classify the similar vectors to obtain a similarity between the target application and the reference application;
wherein, the apparatus further comprises a model training module 1904, the model training module 1904 is configured to:
obtaining a sample similar vector between a first sample application and a second sample application and a sample similarity between the first sample application and the second sample application;
calling the similar vector classification model to classify the sample similar vector, and obtaining the prediction similarity between the first sample application and the second sample application;
and training the similar vector classification model based on the prediction similarity and the sample similarity.
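The training loop of the model training module 1904 can be illustrated with a small logistic regression trained by gradient descent. The model family, learning rate, number of epochs, and sample data below are all assumptions chosen for the sketch; the embodiment does not fix the type of the similar vector classification model in this passage.

```python
import math
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float,
            vector: Sequence[float]) -> float:
    """Prediction similarity in (0, 1) for one sample similar vector."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: Sequence[Sequence[float]], labels: Sequence[float],
          epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit weights so that the prediction similarity approaches the sample similarity."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for vector, sample_similarity in zip(samples, labels):
            # Gradient of the cross-entropy loss with respect to z.
            error = predict(weights, bias, vector) - sample_similarity
            weights = [w - lr * error * x for w, x in zip(weights, vector)]
            bias -= lr * error
    return weights, bias


# Hypothetical sample similar vectors and sample similarities.
samples = [[1.0, 1.0, 1.0], [0.9, 0.8, 1.0], [0.0, 0.1, 0.0], [0.2, 0.0, 0.0]]
labels = [1.0, 1.0, 0.0, 0.0]
weights, bias = train(samples, labels)
```

After training on this toy data, vectors with high feature values receive a prediction similarity above 0.5 and vectors with low feature values receive one below 0.5.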
Optionally, referring to fig. 20, the apparatus further comprises:
a receiving module 1905, configured to receive an application detection request sent by a terminal, where the application detection request includes attribute information of a third application;
a message sending module 1906, configured to send a notification message to the terminal when any application with the same attribute information as the third application is found in the application set, where the application set includes multiple applications that belong to the application type to which the reference application belongs, and the notification message is used to notify that the third application belongs to the application type corresponding to the application set.
Optionally, referring to fig. 20, the similarity determining module 1902 includes at least one of:
a first determining unit 1932, configured to determine, when the attribute information includes an installation package identifier, a first similar feature between the installation package identifier of the target application and an installation package identifier of the reference application;
a second determining unit 1942, configured to determine, when the attribute information includes an application certificate, a first similar feature between the application certificate of the target application and the application certificate of the reference application;
a third determining unit 1952, configured to determine, when the attribute information includes an application identifier, a first similar feature between the application identifier of the target application and the application identifier of the reference application;
a fourth determining unit 1962, configured to determine, when the attribute information includes an installation package size, a first similar feature between the installation package size of the target application and the installation package size of the reference application.
Optionally, referring to fig. 20, a first determining unit 1932 for performing at least one of:
determining a first similar characteristic based on an edit distance between the installation package identifiers of the target application and the reference application, where the edit distance refers to the number of characters that need to be modified to change one installation package identifier into the other;
determining a first similar characteristic based on the length of the identical character string in the installation package identifiers of the target application and the reference application;
determining a first similar characteristic based on the number of identical fields in the installation package identifiers of the target application and the reference application;
determining a first similar characteristic based on the number of fields with the same structure in the installation package identifiers of the target application and the reference application.
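The first three options of the first determining unit 1932 can be sketched with standard string algorithms. The Python example below is an assumption-laden illustration: it uses the Levenshtein distance as the edit distance, the longest common substring as the identical character string, and a period as the field separator. The fourth option is omitted because it depends on how "same structure" is defined in the method embodiments.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous string shared by both identifiers."""
    best = 0
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            length = previous[j - 1] + 1 if ca == cb else 0
            current.append(length)
            best = max(best, length)
        previous = current
    return best


def shared_field_count(a: str, b: str, separator: str = ".") -> int:
    """Number of distinct fields that appear in both identifiers."""
    return len(set(a.split(separator)) & set(b.split(separator)))
```

For the hypothetical identifiers "com.abc.pay" and "org.abc.pay", the edit distance is 3, the longest common substring has length 8, and two fields are shared.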
Optionally, referring to fig. 20, a second determining unit 1942, configured to perform the following:
in a case where the application certificates of the target application and the reference application are the same, determining a first similar characteristic based on the certificate heat of the application certificate, where the first similar characteristic is negatively correlated with the certificate heat, and the certificate heat refers to the number of applications that have the application certificate;
in a case where the application certificates of the target application and the reference application are different and the difference between their certificate heats is smaller than a third threshold, determining a first similar characteristic based on the maximum of the certificate heats corresponding to the target application and the reference application, where the first similar characteristic is negatively correlated with the maximum;
in a case where the application certificates of the target application and the reference application are different and the difference between their certificate heats is not smaller than the third threshold, determining a target value as the first similar characteristic.
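The three branches of the second determining unit 1942 can be sketched as follows. The concrete formulas are assumptions: the text only requires a negative correlation with the certificate heat, so an inverse logarithm is used here as one possible choice, and the scaling factor 0.5 and the default target value 0.0 are likewise hypothetical.

```python
import math
from typing import Dict


def certificate_feature(cert_a: str, cert_b: str, heat: Dict[str, int],
                        third_threshold: int,
                        target_value: float = 0.0) -> float:
    """First similar feature for application certificates (illustrative formulas).

    `heat` maps a certificate to the number of applications that use it.
    """
    heat_a, heat_b = heat[cert_a], heat[cert_b]
    if cert_a == cert_b:
        # Same certificate: a widely shared certificate is weaker evidence.
        return 1.0 / math.log2(1 + heat_a)
    if abs(heat_a - heat_b) < third_threshold:
        # Different certificates with similar heat: based on the larger heat.
        return 0.5 / math.log2(1 + max(heat_a, heat_b))
    # Different certificates with very different heat: fixed target value.
    return target_value


# Hypothetical certificate heats.
heat = {"rare": 3, "rare2": 1, "popular": 1023}
```

With these numbers, two applications sharing the certificate used by 3 applications score 0.5, whereas two applications sharing the certificate used by 1023 applications score 0.1.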
Optionally, referring to fig. 20, a third determining unit 1952, for performing at least one of:
determining a first similar characteristic based on the lengths of the same character strings in the application identifications of the target application and the reference application;
determining a first similar characteristic based on the classification results corresponding to the application identifiers of the target application and the reference application, where the classification result corresponding to an application identifier represents the likelihood that the application identifier belongs to each of a plurality of identifier types.
Optionally, referring to fig. 20, the similarity determining module 1902 includes at least one of:
a fifth determining unit 1972, configured to determine, when the download information includes a download domain name, a second similar feature between the download domain name of the target application and the download domain name of the reference application, where the download domain name is a domain name included in the download link;
a sixth determining unit 1982, configured to determine, when the download information includes device identifiers of target devices, a second similar feature between the device identifiers corresponding to the target application and the device identifiers corresponding to the reference application, where the target devices are a target number of devices with the earliest installation times among the plurality of devices on which the application is installed.
Optionally, referring to fig. 20, a fifth determining unit 1972, configured to determine the second similar characteristic based on the number of identical download domain names shared by the target application and the reference application.
Optionally, referring to fig. 20, a sixth determining unit 1982, configured to:
in a case where the target application and the reference application share at least one identical device identifier, determining a first value as the second similar characteristic;
in a case where the target application and the reference application do not share any identical device identifier, determining a second value as the second similar characteristic, the first value being greater than the second value.
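The download-information features of the fifth determining unit 1972 and the sixth determining unit 1982 can be sketched in a few lines. In the Python example below, the record layout (pairs of device identifier and installation time), the example links, and the concrete first and second values are assumptions; only `urllib.parse.urlparse` from the standard library is used to extract the domain name from a download link.

```python
from typing import Iterable, Optional, Set, Tuple
from urllib.parse import urlparse


def download_domain(link: str) -> Optional[str]:
    """Domain name contained in a download link."""
    return urlparse(link).hostname


def domain_feature(domains_a: Iterable[str], domains_b: Iterable[str]) -> int:
    """Number of identical download domain names shared by two applications."""
    return len(set(domains_a) & set(domains_b))


def earliest_devices(installs: Iterable[Tuple[str, int]],
                     target_number: int) -> Set[str]:
    """Identifiers of the target_number devices that installed the application first."""
    ordered = sorted(installs, key=lambda record: record[1])
    return {device for device, _ in ordered[:target_number]}


def device_feature(devices_a: Set[str], devices_b: Set[str],
                   first_value: float = 1.0,
                   second_value: float = 0.0) -> float:
    """First value if any early-install device is shared, otherwise second value."""
    return first_value if devices_a & devices_b else second_value
```

For example, `download_domain("https://dl.example.com/a.apk")` yields the hypothetical domain "dl.example.com", and two applications whose earliest installing devices overlap receive the larger first value.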
It should be noted that the application detection apparatus provided in the foregoing embodiment is illustrated only by way of the above division into functional modules. In practical applications, the foregoing functions may be allocated to different functional modules as needed, that is, the internal structure of the computer device may be divided into different functional modules to complete all or part of the functions described above. In addition, the application detection apparatus and the application detection method provided in the above embodiments belong to the same concept; the specific implementation process of the apparatus is described in detail in the method embodiments and is not described herein again.
An embodiment of the present application further provides a computer device, where the computer device includes a processor and a memory, where the memory stores at least one computer program, and the at least one computer program is loaded and executed by the processor, so as to implement the operations performed in the application detection method of the foregoing embodiment.
Optionally, the computer device is provided as a terminal. Fig. 21 shows a schematic structural diagram of a terminal 2100 according to an exemplary embodiment of the present application.
The terminal 2100 includes: a processor 2101 and a memory 2102.
The processor 2101 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 2101 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 2101 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 2101 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 2101 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 2102 may include one or more computer-readable storage media, which may be non-transitory. Memory 2102 may also include high-speed random access memory as well as non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 2102 is used to store at least one computer program, and the at least one computer program is executed by the processor 2101 to implement the application detection method provided by the method embodiments of the present application.
In some embodiments, the terminal 2100 may further optionally include: peripheral interface 2103 and at least one peripheral. The processor 2101, memory 2102 and peripheral interface 2103 may be connected by buses or signal lines. Each peripheral may be connected to peripheral interface 2103 by a bus, signal line, or circuit board. Optionally, the peripheral device comprises: at least one of radio frequency circuitry 2104, display screen 2105 and power supply 2106.
The peripheral interface 2103 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 2101 and the memory 2102. In some embodiments, the processor 2101, memory 2102 and peripheral interface 2103 are integrated on the same chip or circuit board; in some other embodiments, any one or both of the processor 2101, the memory 2102 and the peripheral interface 2103 may be implemented on separate chips or circuit boards, which is not limited by this embodiment.
The Radio Frequency circuit 2104 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuitry 2104 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 2104 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuitry 2104 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuitry 2104 may communicate with other devices via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, various generation mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 2104 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 2105 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 2105 is a touch display screen, the display screen 2105 also has the ability to capture touch signals on or over the surface of the display screen 2105. The touch signal may be input as a control signal to the processor 2101 for processing. In this case, the display screen 2105 may also be used to provide virtual buttons and/or a virtual keyboard, also known as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 2105, disposed on a front panel of the terminal 2100; in other embodiments, there may be at least two display screens 2105, each disposed on a different surface of the terminal 2100 or in a folded design; in still other embodiments, the display screen 2105 may be a flexible display screen disposed on a curved surface or a folded surface of the terminal 2100. The display screen 2105 may even be arranged in a non-rectangular irregular shape, that is, as an irregularly shaped screen. The display screen 2105 may be made of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or other materials.
Power supply 2106 is used to power the various components in terminal 2100. The power supply 2106 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 2106 comprises a rechargeable battery, the rechargeable battery can support wired charging or wireless charging. The rechargeable battery can also support fast charging technology.
Those skilled in the art will appreciate that the configuration shown in fig. 21 does not limit the terminal 2100, which may include more or fewer components than those shown, may combine some components, or may use a different arrangement of components.
Optionally, the computer device is provided as a server. Fig. 22 is a schematic structural diagram of a server 2200. The server 2200 may vary considerably depending on its configuration or performance, and may include one or more processors (CPUs) 2201 and one or more memories 2202, where the memory 2202 stores at least one computer program that is loaded and executed by the processor 2201 to implement the methods provided by the foregoing method embodiments. Of course, the server may further have components such as a wired or wireless network interface, a keyboard, and an input/output interface to facilitate input and output, and may further include other components for implementing device functions, which are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium, where at least one computer program is stored in the computer-readable storage medium, and the at least one computer program is loaded and executed by a processor to implement the operations performed in the application detection method of the foregoing embodiment.
Embodiments of the present application further provide a computer program product, which includes a computer program, and the computer program is loaded and executed by a processor to implement the operations performed in the application detection method according to the foregoing embodiments. In some embodiments, the computer program according to embodiments of the present application may be deployed to be executed on one computer device, on multiple computer devices at one site, or on multiple computer devices distributed at multiple sites and interconnected by a communication network; the multiple computer devices distributed at the multiple sites and interconnected by the communication network may constitute a blockchain system.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk.
The above description is only an optional embodiment of the present application and should not be construed as limiting the present application; any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (16)

1. An application detection method, characterized in that the method comprises:
acquiring attribute information and downloading information of a plurality of applications, wherein the attribute information is information for describing the applications, the downloading information is information related to downloading the applications, and the plurality of applications comprise target applications to be detected and reference applications of determined application types;
determining a first similar characteristic between the attribute information of the target application and the attribute information of the reference application, and a second similar characteristic between the download information of the target application and the download information of the reference application;
combining the first similar features and the second similar features to obtain similar vectors between the target application and the reference application, and classifying the similar vectors to obtain the similarity between the target application and the reference application;
determining that the target application belongs to an application type to which the reference application belongs if a similarity between the target application and the reference application satisfies a similarity condition;
the attribute information and the download information both comprise a plurality of information items, the reference application comprises a first application and a second application, the target application and the first application do not have an association relationship, the target application and the second application have the association relationship, and the association relationship refers to having an information item with the same value;
the determining a first similar characteristic between the attribute information of the target application and the attribute information of the reference application and a second similar characteristic between the download information of the target application and the download information of the reference application comprises:
determining a first similar characteristic between the attribute information of the target application and the attribute information of the second application, and a second similar characteristic between the download information of the target application and the download information of the second application;
the determining that the target application belongs to the application type to which the reference application belongs in the case that the similarity between the target application and the reference application satisfies a similarity condition includes:
determining that the target application belongs to an application type to which the second application belongs if the similarity between the target application and the second application reaches a second threshold.
2. The method of claim 1, wherein the attribute information comprises information items in a first dimension, the first similar feature comprises a first feature value in the first dimension, the download information comprises information items in a second dimension, and the second similar feature comprises a second feature value in the second dimension; the combining the first similar feature and the second similar feature to obtain a similar vector between the target application and the reference application includes:
and combining the first characteristic value in the first dimension and the second characteristic value in the second dimension according to the sequence of the first dimension and the second dimension to obtain the similar vector between the target application and the reference application.
3. The method of claim 1, wherein the classifying the similarity vector to obtain the similarity between the target application and the reference application comprises:
calling a similar vector classification model, and classifying the similar vectors to obtain the similarity between the target application and the reference application;
the training process of the similarity vector classification model comprises the following steps:
obtaining a sample similarity vector between a first sample application and a second sample application, and a sample similarity between the first sample application and the second sample application;
calling the similarity vector classification model to classify the sample similarity vector, to obtain a predicted similarity between the first sample application and the second sample application;
training the similarity vector classification model based on the predicted similarity and the sample similarity.
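By way of illustration only, the training process of claim 3 can be sketched as follows. The claims do not specify the type of classification model; a logistic regression trained by stochastic gradient descent is assumed here purely as an example, and the function names, learning rate and epoch count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def predict_similarity(weights: Sequence[float], bias: float,
                       vector: Sequence[float]) -> float:
    """Map a similarity vector to a similarity in (0, 1) via a sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return 1.0 / (1.0 + math.exp(-z))


def train_classifier(samples: Sequence[Sequence[float]],
                     labels: Sequence[float],
                     epochs: int = 500,
                     learning_rate: float = 0.5) -> Tuple[List[float], float]:
    """Fit the model on (sample similarity vector, sample similarity) pairs.

    Each update moves the parameters so that the predicted similarity
    gets closer to the labelled sample similarity (log-loss gradient).
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for vector, label in zip(samples, labels):
            error = predict_similarity(weights, bias, vector) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, vector)]
            bias -= learning_rate * error
    return weights, bias
```

After training, `predict_similarity` plays the role of the model call in the first step of the claim: it classifies a similarity vector and returns the similarity between the two applications.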
4. The method according to claim 1, wherein the number of the reference applications is plural, and the application types of the plural reference applications are the same;
the determining that the target application belongs to the application type to which the reference application belongs in the case that the similarity between the target application and the reference application satisfies a similarity condition includes:
determining a degree of aggregation among the target application and the plurality of reference applications based on the similarity between every two applications among the target application and the plurality of reference applications, the degree of aggregation being positively correlated with the similarity between every two of the applications;
determining that the target application belongs to the application type to which the plurality of reference applications belong, in a case that the degree of aggregation reaches a first threshold.
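By way of illustration only, the threshold test of claim 4 can be sketched as follows. The claim only requires that the degree be positively correlated with the pairwise similarities; taking the mean of all pairwise similarities is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence


def aggregation_degree(applications: Sequence[str],
                       similarity: Callable[[str, str], float]) -> float:
    """Mean similarity over every pair of applications in the group.

    The mean rises whenever any pairwise similarity rises, so it is
    positively correlated with each pairwise similarity.
    """
    pairs = list(combinations(applications, 2))
    if not pairs:
        return 0.0
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def belongs_to_reference_type(target: str,
                              references: Sequence[str],
                              similarity: Callable[[str, str], float],
                              first_threshold: float) -> bool:
    """Group the target with the references and apply the first threshold."""
    group = [target, *references]
    return aggregation_degree(group, similarity) >= first_threshold
```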
5. The method of claim 1, wherein prior to determining a first similar feature between the attribute information of the target application and the attribute information of the second application and a second similar feature between the download information of the target application and the download information of the second application, the method further comprises:
determining similarity between the first application and the second application, wherein the application type of the first application is determined, the application type of the second application is not determined, and the first application and the second application have the association relationship;
determining that the second application belongs to the application type to which the first application belongs if the similarity between the first application and the second application reaches the second threshold.
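By way of illustration only, the propagation of an application type from the first application to the second application can be sketched as follows. Information items are represented as a dictionary, and the similarity function is passed in as a parameter; both choices, and all names, are assumptions of this sketch.

```python
from typing import Callable, Dict, Optional

InfoItems = Dict[str, str]


def has_association(info_a: InfoItems, info_b: InfoItems) -> bool:
    """True if at least one information item has the same value in both."""
    return any(key in info_b and info_b[key] == value
               for key, value in info_a.items())


def propagate_type(first_type: str,
                   first_info: InfoItems,
                   second_info: InfoItems,
                   similarity: Callable[[InfoItems, InfoItems], float],
                   second_threshold: float) -> Optional[str]:
    """Return the first application's type for the second application,
    or None if the type cannot be propagated."""
    if not has_association(first_info, second_info):
        return None
    if similarity(first_info, second_info) >= second_threshold:
        return first_type
    return None
```

Once the second application has received a type in this way, it can serve as a reference application for a target application that has no association relationship with the first application.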
6. The method according to any one of claims 1 to 5, wherein after determining that the target application belongs to the application type to which the reference application belongs in the case that the similarity between the target application and the reference application satisfies a similarity condition, the method further comprises:
receiving an application detection request sent by a terminal, wherein the application detection request comprises attribute information of a third application;
sending a notification message to the terminal in a case that an application with the same attribute information as the third application is found in an application set, wherein the application set comprises a plurality of applications belonging to the application type to which the reference application belongs, and the notification message is used for notifying that the third application belongs to the application type corresponding to the application set.
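By way of illustration only, the handling of an application detection request can be sketched as follows. The request layout, the mapping from application type to application set, and the wording of the notification message are all assumptions of this sketch.

```python
from typing import Dict, List, Optional

AttributeInfo = Dict[str, str]


def handle_detection_request(
    request: Dict[str, AttributeInfo],
    application_sets: Dict[str, List[AttributeInfo]],
) -> Optional[str]:
    """Return a notification message if the third application is found.

    `application_sets` maps an application type to the attribute
    information of the applications already determined to be of that type.
    """
    third_info = request["attribute_info"]
    for application_type, applications in application_sets.items():
        if any(info == third_info for info in applications):
            return f"application belongs to type: {application_type}"
    return None  # no match, so no notification is sent
```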
7. The method according to any one of claims 1 to 5, wherein the determining of the first similar feature between the attribute information of the target application and the attribute information of the reference application comprises at least one of:
the attribute information comprises an installation package identifier, and a first similar feature between the installation package identifier of the target application and the installation package identifier of the reference application is determined;
the attribute information comprises an application certificate, and a first similar feature between the application certificate of the target application and the application certificate of the reference application is determined;
the attribute information comprises an application identifier, and a first similar feature between the application identifier of the target application and the application identifier of the reference application is determined;
the attribute information comprises an installation package size, and a first similar feature between the installation package size of the target application and the installation package size of the reference application is determined.
8. The method of claim 7, wherein the determining a first similar feature between the installation package identifier of the target application and the installation package identifier of the reference application comprises at least one of:
determining the first similar feature based on an edit distance between the installation package identifiers of the target application and the reference application, wherein the edit distance refers to the number of characters that need to be modified to change one installation package identifier into the other;
determining the first similar feature based on the length of the same character string in the installation package identifiers of the target application and the reference application;
determining the first similar feature based on the number of the same fields in the installation package identifiers of the target application and the reference application;
determining the first similar feature based on the number of fields which belong to the same structure in the installation package identifiers of the target application and the reference application.
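By way of illustration only, three of the measures listed in claim 8 can be sketched as follows. The sketch assumes the Levenshtein distance (insertions, deletions and substitutions) as the edit distance, assumes that fields are the dot-separated parts of an installation package identifier, and compares fields position by position; none of these choices is fixed by the claim. The measure based on fields of the same structure is not sketched because the claim does not define the structure.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest run of characters shared by both strings."""
    best = 0
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, 1):
            current.append(previous[j - 1] + 1 if char_a == char_b else 0)
            best = max(best, current[-1])
        previous = current
    return best


def shared_field_count(a: str, b: str, separator: str = ".") -> int:
    """Number of positions at which both identifiers have the same field."""
    return sum(1 for field_a, field_b
               in zip(a.split(separator), b.split(separator))
               if field_a == field_b)
```

For the hypothetical identifiers `com.a.pay` and `com.b.pay`, the edit distance is 1, the longest common substring has length 4, and two fields are shared.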
9. The method of claim 7, wherein the determining a first similar feature between the application certificate of the target application and the application certificate of the reference application comprises:
determining the first similar feature based on a certificate popularity of the application certificate in a case that the application certificates of the target application and the reference application are the same, wherein the first similar feature is negatively correlated with the certificate popularity, and the certificate popularity refers to the number of applications having the application certificate;
determining the first similar feature based on a maximum value of the certificate popularities corresponding to the target application and the reference application in a case that the application certificates of the target application and the reference application are different and the difference between the certificate popularities is smaller than a third threshold, wherein the first similar feature is negatively correlated with the maximum value;
determining a target value as the first similar feature in a case that the application certificates of the target application and the reference application are different and the difference between the certificate popularities is not smaller than the third threshold.
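By way of illustration only, the three cases of claim 9 can be sketched as follows. The claim fixes only the direction of the correlation; the reciprocal formulas, the factor 0.5 and the default target value 0.0 are assumptions of this sketch, as are the function and parameter names.

```python
from typing import Dict


def certificate_feature(target_certificate: str,
                        reference_certificate: str,
                        popularity: Dict[str, int],
                        third_threshold: int,
                        target_value: float = 0.0) -> float:
    """First similar feature for a pair of application certificates.

    `popularity` maps a certificate to the number of applications that
    have it; every value is assumed to be at least 1.
    """
    target_popularity = popularity[target_certificate]
    reference_popularity = popularity[reference_certificate]
    if target_certificate == reference_certificate:
        # Same certificate: a rarely used certificate is stronger evidence.
        return 1.0 / target_popularity
    if abs(target_popularity - reference_popularity) < third_threshold:
        # Different certificates with comparable popularity.
        return 0.5 / max(target_popularity, reference_popularity)
    # Different certificates with very different popularity.
    return target_value
```

The intuition behind the negative correlation is that a certificate shared by only a few applications ties those applications together more strongly than a certificate shared by thousands.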
10. The method of claim 7, wherein the determining a first similar feature between the application identifier of the target application and the application identifier of the reference application comprises at least one of:
determining the first similar feature based on the length of the same character string in the application identifiers of the target application and the reference application;
determining the first similar feature based on classification results corresponding to the application identifiers of the target application and the reference application, wherein the classification result corresponding to an application identifier represents the likelihood that the application identifier belongs to each of a plurality of identifier types.
11. The method according to any one of claims 1 to 5, wherein the determining a second similar feature between the download information of the target application and the download information of the reference application comprises at least one of:
the download information comprises a download domain name, and a second similar feature between the download domain name of the target application and the download domain name of the reference application is determined, wherein the download domain name refers to a domain name contained in a download link;
the download information comprises a device identifier of a target device, and a second similar feature between the device identifier corresponding to the target application and the device identifier corresponding to the reference application is determined, wherein the target device includes a target number of devices with the earliest installation times among a plurality of devices on which the application is installed.
12. The method of claim 11, wherein the determining a second similar feature between the device identifier corresponding to the target application and the device identifier corresponding to the reference application comprises:
determining a first value as the second similar feature in a case that at least one identical device identifier exists for the target application and the reference application;
determining a second value as the second similar feature in a case that no identical device identifier exists for the target application and the reference application, the first value being greater than the second value.
13. The method of claim 11, wherein the determining a second similar feature between the download domain name of the target application and the download domain name of the reference application comprises:
determining the second similar feature based on the number of identical download domain names of the target application and the reference application.
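By way of illustration only, the download-information measures of claims 11 to 13 can be sketched as follows. The default first and second values (1.0 and 0.0), the use of the raw count of shared domain names, and all function names are assumptions of this sketch.

```python
from typing import Dict, Iterable, Set


def earliest_devices(install_times: Dict[str, float],
                     target_number: int) -> Set[str]:
    """Device identifiers of the `target_number` earliest installations.

    `install_times` maps a device identifier to its installation time.
    """
    ordered = sorted(install_times, key=lambda device: install_times[device])
    return set(ordered[:target_number])


def device_feature(target_devices: Iterable[str],
                   reference_devices: Iterable[str],
                   first_value: float = 1.0,
                   second_value: float = 0.0) -> float:
    """First value if any device identifier is shared, else second value."""
    shared = set(target_devices) & set(reference_devices)
    return first_value if shared else second_value


def domain_feature(target_domains: Iterable[str],
                   reference_domains: Iterable[str]) -> int:
    """Number of download domain names common to both applications."""
    return len(set(target_domains) & set(reference_domains))
```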
14. An application detection apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire attribute information and download information of a plurality of applications, wherein the attribute information is information used for describing the applications, the download information is information related to downloading the applications, and the plurality of applications comprise a target application to be detected and a reference application whose application type has been determined;
a similarity determination module, configured to determine a first similar feature between the attribute information of the target application and the attribute information of the reference application, and a second similar feature between the download information of the target application and the download information of the reference application;
the similarity determination module is further configured to combine the first similar feature and the second similar feature to obtain a similarity vector between the target application and the reference application, and classify the similarity vector to obtain a similarity between the target application and the reference application;
a type determining module, configured to determine that the target application belongs to an application type to which the reference application belongs if a similarity between the target application and the reference application satisfies a similarity condition;
the attribute information and the download information both comprise a plurality of information items, the reference application comprises a first application and a second application, the target application and the first application do not have an association relationship, the target application and the second application have the association relationship, and the association relationship refers to information items with the same value;
the similarity determination module is configured to determine a first similar feature between the attribute information of the target application and the attribute information of the second application, and a second similar feature between the download information of the target application and the download information of the second application;
the type determining module is configured to determine that the target application belongs to the application type to which the second application belongs when the similarity between the target application and the second application reaches a second threshold.
15. A computer device, characterized in that it comprises a processor and a memory, in which at least one computer program is stored, which is loaded and executed by the processor to implement the application detection method according to any one of claims 1 to 13.
16. A computer-readable storage medium, in which at least one computer program is stored, which is loaded and executed by a processor, to implement the application detection method according to any one of claims 1 to 13.
CN202210314992.XA 2022-03-29 2022-03-29 Application detection method and device, computer equipment and storage medium Active CN114416600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210314992.XA CN114416600B (en) 2022-03-29 2022-03-29 Application detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210314992.XA CN114416600B (en) 2022-03-29 2022-03-29 Application detection method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114416600A (en) 2022-04-29
CN114416600B (en) 2022-06-28

Family

ID=81263879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210314992.XA Active CN114416600B (en) 2022-03-29 2022-03-29 Application detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114416600B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115378824B (en) * 2022-08-24 2023-07-14 中国联合网络通信集团有限公司 Model similarity determination method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992367A (en) * 2017-12-29 2019-07-09 广东欧珀移动通信有限公司 Application processing method and device, electronic equipment, computer readable storage medium
CN110209925A (en) * 2018-10-24 2019-09-06 腾讯科技(深圳)有限公司 Using method for pushing, device, computer equipment and storage medium
CN111507400A (en) * 2020-04-16 2020-08-07 腾讯科技(深圳)有限公司 Application classification method and device, electronic equipment and storage medium
CN111797239A (en) * 2020-09-08 2020-10-20 中山大学深圳研究院 Application program classification method and device and terminal equipment
CN112148305A (en) * 2020-10-28 2020-12-29 腾讯科技(深圳)有限公司 Application detection method and device, computer equipment and readable storage medium
CN112308131A (en) * 2020-10-29 2021-02-02 腾讯科技(深圳)有限公司 Sample rejection method, device, equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598070B (en) * 2019-09-09 2022-01-25 腾讯科技(深圳)有限公司 Application type identification method and device, server and storage medium
CN110781066B (en) * 2019-10-29 2023-04-11 北京字节跳动网络技术有限公司 User behavior analysis method, device, equipment and storage medium
CN113761119A (en) * 2021-04-28 2021-12-07 腾讯科技(深圳)有限公司 State detection method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN114416600A (en) 2022-04-29

Similar Documents

Publication Publication Date Title
Joo et al. S-Detector: an enhanced security model for detecting Smishing attack for mobile computing
US8065731B1 (en) System and method for malware containment in communication networks
US11580222B2 (en) Automated malware analysis that automatically clusters sandbox reports of similar malware samples
CN111061874A (en) Sensitive information detection method and device
US20220066860A1 (en) System for resolution of technical issues using computing system-specific contextual data
CN113221032A (en) Link risk detection method, device and storage medium
CN114416600B (en) Application detection method and device, computer equipment and storage medium
CN107689975B (en) Cloud computing-based computer virus identification method and system
CN116956080A (en) Data processing method, device and storage medium
Thiyagarajan et al. Improved real‐time permission based malware detection and clustering approach using model independent pruning
CN111586695A (en) Short message identification method and related equipment
CN111563015A (en) Data monitoring method and device, computer readable medium and terminal equipment
KR101605783B1 (en) Malicious application detecting method and computer program executing the method
CN112214770B (en) Malicious sample identification method, device, computing equipment and medium
US20210360001A1 (en) Cluster-based near-duplicate document detection
CN111027065B (en) Leucavirus identification method and device, electronic equipment and storage medium
CN109993618A (en) Object search method, system and computer system, computer readable storage medium
CN112287952A (en) Virus clustering method, virus clustering device, storage medium and electronic device
CN109450853A (en) Malicious websites determination method, device, terminal and server
CN115001683A (en) Payment data security protection method and device, electronic equipment and storage medium
US20170171330A1 (en) Method for pushing information and electronic device
Li Research on Smartphone Trojan Detection Based on the Wireless Sensor Network
CN113204954A (en) Data detection method and device based on big data and computer readable storage medium
US9342795B1 (en) Assisted learning for document classification
Ulfath et al. Hybrid CNN-GRU framework with integrated pre-trained language transformer for SMS phishing detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40068474; Country of ref document: HK)