CN113890866A - Illegal application software identification method, device, medium and electronic equipment - Google Patents

Illegal application software identification method, device, medium and electronic equipment

Info

Publication number
CN113890866A
Authority
CN
China
Prior art keywords
domain name
illegal
domain
application software
suspected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111129799.0A
Other languages
Chinese (zh)
Other versions
CN113890866B (en)
Inventor
王肖斌
黄晓青
高华
傅强
蔡琳
梁彧
田野
王杰
杨满智
金红
陈晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eversec Beijing Technology Co Ltd
Original Assignee
Eversec Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eversec Beijing Technology Co Ltd
Priority to CN202111129799.0A
Publication of CN113890866A
Application granted
Publication of CN113890866B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30: Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44: Program or device authentication

Abstract

The embodiment of the application discloses a method, a device, a medium and electronic equipment for identifying illegal application software. The method comprises the following steps: extracting an interface access request from network traffic data, and parsing the interface access request to acquire domain name information of an interface; determining the domain name attribute of the domain name according to the domain name information; if the domain name attribute is suspected violation, determining that the domain name is a suspected domain name, and extracting domain name features of the suspected domain name as target features; expanding the suspected domain name according to the target features and an illegal domain name generation rule to obtain at least two derived domain names; and verifying whether each derived domain name is an illegal domain name, and if so, identifying illegal application software by using the illegal domain name. With this technical scheme, the detection rate and the recognition rate of illegal application software can be improved.

Description

Illegal application software identification method, device, medium and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of computer application, in particular to a method, a device, a medium and electronic equipment for identifying illegal application software.
Background
Currently, cases in which people's lawful interests are infringed through application software (APP) remain frequent and seriously disturb public security. Among application software of numerous types and in large quantity, accurately identifying illegal application software, such as fraud-related applications, is of great significance for protecting people's lives and property.
At present, the general application category field of application software is mainly obtained through Deep Packet Inspection (DPI) technology, so as to identify the broad category of the application software. However, illegal application software generally does not strictly comply with the national specifications for application software filing and development, and in many cases the application type field cannot be acquired through DPI. Moreover, fraud-related applications are often disguised as commercially compliant applications, such as games or instant messaging, so the above method has a low identification rate for illegal application software.
Disclosure of Invention
The embodiment of the application provides a method, a device, a medium and electronic equipment for identifying illegal application software, and the purpose of improving the detection rate and accuracy of the illegal application software can be achieved.
In a first aspect, an embodiment of the present application provides a method for identifying an illegal application software, where the method includes:
extracting an interface access request in network flow data, and analyzing the interface access request to acquire domain name information of an interface;
determining the domain name attribute of the domain name according to the domain name information;
if the domain name attribute is suspected violation, determining that the domain name is a suspected domain name, and extracting domain name features of the suspected domain name as target features;
expanding the suspected domain name according to the target features and an illegal domain name generation rule to obtain at least two derived domain names;
and verifying whether each derived domain name is an illegal domain name, and if so, identifying illegal application software by using the illegal domain name.
In a second aspect, an embodiment of the present application provides an illegal application software identification device, where the device includes:
the domain name information acquisition module is used for extracting an interface access request in the network flow data and analyzing the interface access request to acquire domain name information of an interface;
the domain name attribute determining module is used for determining the domain name attribute of the domain name according to the domain name information;
a target feature determination module, configured to determine that the domain name is a suspected domain name if the domain name attribute is suspected violation, and extract a domain name feature of the suspected domain name as a target feature;
the domain name expansion module is used for expanding the suspected domain name to obtain at least two derived domain names according to the target characteristics and the rule of generating the illegal domain name;
and the illegal application software identification module is used for verifying whether the derived domain name is an illegal domain name, and if so, identifying the illegal application software by adopting the illegal domain name.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method for identifying the illegal application software according to the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable by the processor, where the processor implements the method for identifying the illegal application software according to the embodiment of the present application when executing the computer program.
According to the technical scheme provided by the embodiment of the application, when the domain name attribute is determined to be suspected violation, the domain name features of the suspected domain name are extracted as target features, the suspected domain name is expanded according to the target features and the illegal domain name generation rule to obtain at least two derived domain names, the compliance of the derived domain names is verified, and the illegal domain names found in this way are used to identify illegal application software. Because the suspected domain name is expanded according to the target features and the illegal domain name generation rule, the variations of the suspected domain name are predicted and multiple derived domain names are obtained from a single suspected domain name. The scheme can therefore effectively handle the case where illegal application software uses dynamic domain names to access resources, and improves the detection rate and accuracy of illegal application software identification.
Drawings
Fig. 1 is a flowchart of an illegal application software identification method according to the first embodiment of the present application;
Fig. 2 is a flowchart of another illegal application software identification method according to the second embodiment of the present application;
Fig. 3 is a schematic structural diagram of an illegal application software identification device according to the third embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to the fifth embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Generally, the client and the server of application software communicate by means of a domain name or an IP address. For application software that conforms to the relevant national laws and standards (compliant applications), the client usually accesses server resources by directly accessing the main domain name. Because application software that does not conform to the relevant national laws and standards (illegal applications) can seriously disturb public security, illegal applications are a key enforcement target of the relevant departments, and once a server providing resource access services for an illegal application is detected, the relevant departments block it. In response, developers of illegal applications often assign dynamic domain names to the illegal applications to evade blocking. Specifically, when a user accesses server resources through an illegal application, the application does not directly access a main domain name to communicate with the server as a compliant application does; instead, it first calls a specified interface to obtain a list of available server addresses, and once the domain name currently in use is blocked, new domain names are randomly generated for the illegal application to use, so that the application remains continuously available. Inspired by this development approach of illegal application developers, the illegal application software identification method of the present application can effectively handle the case where illegal application software uses dynamic domain names for resource access, and has the advantages of a high detection rate and high accuracy.
Example one
Fig. 1 is a flowchart of a method for identifying an illegal application software according to an embodiment of the present application, which is applicable to identifying the illegal application software. The method can be executed by the illegal application software identification device provided by the embodiment of the application, and the device can be realized in a software and/or hardware mode and can be integrated in the electronic equipment running the system.
As shown in fig. 1, the method for identifying the illegal application software comprises the following steps:
S110, extracting an interface access request from the network traffic data, and parsing the interface access request to acquire domain name information of the interface.
The network traffic data consists of data packets sent and received during network transmission and intercepted by a packet capture tool. The network traffic data may include an interface access request, which is a request for obtaining a dynamic domain name. Generally, when an illegal application requests resources from a server, it first generates an interface access request and accesses a transit website based on that request; when the transit website is called, it returns a domain name pool, so that the illegal application can communicate with the background server using the domain names in the pool and access background server resources.
Extracting the interface access request from the network traffic data and parsing it to acquire the domain name information of the interface may specifically include: capturing the interface access request with the Wireshark tool, obtaining the response content of the interface access request, and extracting the domain name information of the interface from the response content. Illustratively, the interface access request captured by Wireshark is:
GET /line/apUrls/getUrl?platformId=A03&domain0rl=https://47.75.215.58:808 HTTP/1.1
Host: 179.33.19.174:3389
Connection: KeepAlive
Accept-Encoding: gzip
User-Agent: okhttp/3.14.2
The response content corresponding to the interface access request is as follows:
HTTP/1.1 200
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Date: Wed, 03 Jun 2020 08:54:10 GMT
{"code": 200, "message": "operation success", "data": "https://m.5005559.com"}
Here, https://m.5005559.com in the response content is the domain name information.
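The extraction of domain name information from a captured response can be sketched in Python as follows. This is a minimal sketch, assuming the response is already available as text and that the interface address sits in a JSON field named "data", as in the example above; the function name extract_interface_domain is hypothetical and does not come from the application.

```python
import json
from urllib.parse import urlparse


def extract_interface_domain(raw_response: str) -> str:
    """Return the host name carried in the JSON body of a captured HTTP response."""
    # Headers and body are separated by the first blank line.
    normalized = raw_response.replace("\r\n", "\n")
    _headers, _, body = normalized.partition("\n\n")
    payload = json.loads(body)
    # Assumption: the interface address is stored in the "data" field.
    link = payload["data"].strip()
    host = urlparse(link).hostname
    if host is None:
        raise ValueError(f"no host found in {link!r}")
    return host


captured = (
    "HTTP/1.1 200\n"
    "Content-Type: application/json;charset=UTF-8\n"
    "\n"
    '{"code": 200, "message": "operation success", "data": "https://m.5005559.com"}'
)
print(extract_interface_domain(captured))
```

A real capture would also need chunked transfer decoding and error handling for non-JSON bodies, which this sketch leaves out.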
Some interface access requests are encrypted, and for these the response content cannot be obtained directly. In an optional embodiment, extracting the interface access request from the network traffic data and parsing the interface access request to acquire the domain name information of the interface includes: if the interface access request is encrypted, performing interface access with a simulated interface access tool according to the interface access request to obtain an interface response; and extracting the interface access link included in the interface response content, and determining the domain name information of the interface according to the interface access link.
The simulated interface access tool may be Postman. Specifically, Postman simulates the application software and sends a POST request according to the interface access request to obtain a server response. The interface access link is then obtained from the server response, and the domain name information of the interface is extracted from the interface access link.
Optionally, whether the interface access request is encrypted is determined through traffic packet forwarding: if the application software uses the HTTPS protocol, the interface access request is encrypted; if the application software uses the HTTP protocol, the interface access request is unencrypted. This way of acquiring domain name information covers the case where the interface access request is encrypted, so that domain name information can still be extracted for such requests, which improves the detection rate of illegal applications.
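The two decisions described above (whether a request counts as encrypted, and how to pull interface access links out of a response) can be illustrated with a short Python sketch. It assumes that the protocol scheme alone decides whether a request is encrypted and that links appear as plain http(s) URLs in the response text; the replay of the request itself (done with Postman in the text) is not shown, and the names is_encrypted and domains_from_response are hypothetical.

```python
import re
from typing import List
from urllib.parse import urlparse

# Assumption: interface access links appear as plain http(s) URLs in the response.
LINK_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def is_encrypted(request_url: str) -> bool:
    """Treat HTTPS requests as encrypted and HTTP requests as unencrypted."""
    return urlparse(request_url).scheme.lower() == "https"


def domains_from_response(response_text: str) -> List[str]:
    """Collect the distinct host names of all links found in an interface response."""
    hosts: List[str] = []
    for link in LINK_PATTERN.findall(response_text):
        host = urlparse(link).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts
```

For an encrypted request, the response text passed to domains_from_response would come from the simulated access rather than from the capture.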
S120, determining the domain name attribute of the domain name according to the domain name information.
The domain name attribute refers to the compliance of the domain name. In practice, the domain name attribute is determined by the compliance of the address to which the domain name points. If the domain name points to an illegal website, such as a fraud website, the domain name attribute of the domain name is explicit violation. If the domain name points to a compliant website, such as the official website of a bank, the domain name attribute of the domain name is explicit compliance. When the attribute of the website to which the domain name points is undetermined, the domain name attribute of the domain name is suspected violation. Determining the domain name attribute of the domain name according to the domain name information specifically means determining the domain name attribute according to the domain name content.
In an alternative embodiment, the domain name attributes include explicit compliance, explicit violation, and suspected violation; correspondingly, determining the domain name attribute of the domain name according to the domain name information includes: if the domain name information is successfully matched with the domain name whitelist, the domain name attribute of the domain name is explicit compliance; if the domain name information is successfully matched with the domain name blacklist, the domain name attribute of the domain name is explicit violation; and if the domain name information matches neither the domain name whitelist nor the domain name blacklist, the domain name attribute of the domain name is suspected violation.
The domain name whitelist consists of domain names pointing to compliant websites, and the domain names in the whitelist have the domain name attribute of explicit compliance. Correspondingly, the domain name blacklist, which is likewise added to the illegal application software identification system in advance by the relevant technicians, consists of domain names pointing to illegal websites, and the domain names in the blacklist have the domain name attribute of explicit violation. As illegal software identification work proceeds, the domain name whitelist and the domain name blacklist are continuously updated and expanded.
To determine the domain name attribute according to the domain name information, the domain name content may be matched against the domain name whitelist and the domain name blacklist respectively. If the domain name information is successfully matched with the domain name whitelist, the domain name is included in the whitelist and its domain name attribute is explicit compliance; if the domain name information is successfully matched with the domain name blacklist, the domain name is included in the blacklist and its domain name attribute is explicit violation; if the domain name information matches neither list, the domain name is recorded in neither the whitelist nor the blacklist, its domain name attribute is undetermined, and the domain name attribute is determined as suspected violation. Characterizing domain names in this way also screens them: only domain names whose attribute is suspected violation undergo subsequent processing, which reduces the amount of calculation and improves the efficiency of identifying illegal software.
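The three-way classification described above amounts to two set lookups. The following Python sketch illustrates it under the assumption that matching is an exact, case-insensitive comparison of the host name; the list contents and the names DomainAttribute and classify_domain are hypothetical.

```python
from enum import Enum


class DomainAttribute(Enum):
    EXPLICIT_COMPLIANCE = "explicit compliance"
    EXPLICIT_VIOLATION = "explicit violation"
    SUSPECTED_VIOLATION = "suspected violation"


def classify_domain(domain, whitelist, blacklist):
    """Classify a domain name by matching it against the whitelist and the blacklist."""
    name = domain.strip().lower()
    if name in whitelist:
        return DomainAttribute.EXPLICIT_COMPLIANCE
    if name in blacklist:
        return DomainAttribute.EXPLICIT_VIOLATION
    # Recorded in neither list: attribute undetermined, so treat as suspected.
    return DomainAttribute.SUSPECTED_VIOLATION


# Placeholder lists for illustration only.
WHITELIST = {"bank.example"}
BLACKLIST = {"fraud.example"}
print(classify_domain("m.5005559.com", WHITELIST, BLACKLIST))
```

Only domain names classified as SUSPECTED_VIOLATION would be passed on to feature extraction and expansion.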
S130, if the domain name attribute is suspected violation, determining that the domain name is a suspected domain name, and extracting the domain name features of the suspected domain name as target features.
If the domain name attribute is suspected violation, the compliance of the website to which the domain name points is uncertain, so the compliance of the domain name is also uncertain, and the domain name is determined to be a suspected domain name. The domain name features of the suspected domain name are extracted as target features for further processing.
In an alternative embodiment, the domain name features include at least one of domain name pointing, domain name content, and domain name form. The domain name pointing reflects the destination IP address and destination port number of resource access based on the domain name; when an illegal application accesses resources, there are generally multiple redirects. The domain name content refers to the composition of the domain name, including characters and symbols. The domain name form includes at least one of the combination manner of the characters, the number of characters, and the character types. For example, the character type may be English letters, digits, and so on.
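The content and form features listed above can be computed directly from the domain name string, as in the Python sketch below. It assumes that the second-to-last label is the part of interest (for example "55591d" in "m.55591d.com") and encodes character types as D (digit), L (letter) and S (symbol); the domain name pointing feature is omitted because it needs a live lookup. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainFeatures:
    content: str   # label under analysis, e.g. "55591d"
    length: int    # number of characters in the label
    digits: int    # count of digit characters
    letters: int   # count of letter characters
    symbols: int   # count of other characters, e.g. "-"
    pattern: str   # character-type sequence, e.g. "DDDDDL"


def extract_features(domain: str) -> DomainFeatures:
    """Extract content and form features from a domain name string."""
    labels = domain.lower().strip(".").split(".")
    # Assumption: the second-to-last label carries the distinguishing content.
    label = labels[-2] if len(labels) >= 2 else labels[0]
    pattern = "".join(
        "D" if c.isdigit() else "L" if c.isalpha() else "S" for c in label
    )
    return DomainFeatures(
        content=label,
        length=len(label),
        digits=pattern.count("D"),
        letters=pattern.count("L"),
        symbols=pattern.count("S"),
        pattern=pattern,
    )


print(extract_features("m.55591d.com"))
```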
S140, expanding the suspected domain name according to the target features and the illegal domain name generation rule to obtain at least two derived domain names.
The illegal domain name generation rule is used for predicting illegal domain names and is determined according to the domain name features of illegal domain names. There may be multiple illegal domain name generation rules; their specific contents are not limited herein and may be determined according to actual conditions. Optionally, the illegal domain name generation rule is a combination rule of characters and symbols, or a character content variation rule.
Expanding the suspected domain name according to the target features and the illegal domain name generation rule means using the target features as a reference and applying the illegal domain name generation rule to the suspected domain name to obtain derived domain names. A derived domain name is a domain name obtained by expanding the suspected domain name according to the illegal domain name generation rule; it is a variant of the suspected domain name that has the target features. The specific number of derived domain names is not limited herein and is determined according to actual business requirements and the available computing resources. It can be appreciated that the more derived domain names there are, the higher the probability of detecting an illegal domain name, but the more computing resources are needed to verify them.
Illustratively, if the suspected domain name is https://m.55591d.com/, whose main content is digits, the suspected domain name is expanded using the character content variation rule among the illegal domain name generation rules to obtain derived domain names, which may be https://m.5006659.com, https://m.266503.com, https://m.22633g.com, https://m.5155039.com or ht539tps:// m.98b.com.
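One possible character content variation rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the rule varies one or two positions of the second-to-last label while keeping each position's character type and the label length (the examples above also vary the length, which this sketch does not attempt), and a fixed random seed is used so that runs are repeatable. The function name expand_domain is hypothetical.

```python
import random

DIGITS = "0123456789"
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def expand_domain(domain: str, count: int, seed: int = 0) -> list:
    """Generate up to `count` distinct variants of a suspected domain name."""
    labels = domain.lower().split(".")
    if len(labels) < 2:
        raise ValueError("expected a domain with at least two labels")
    target = labels[-2]
    rng = random.Random(seed)
    variants = set()
    attempts = 0
    # Bounded loop so that very short labels cannot cause an endless search.
    while len(variants) < count and attempts < count * 100:
        attempts += 1
        chars = list(target)
        for pos in rng.sample(range(len(chars)), k=min(2, len(chars))):
            if chars[pos].isdigit():
                chars[pos] = rng.choice(DIGITS)
            elif chars[pos].isalpha():
                chars[pos] = rng.choice(LETTERS)
            # Symbols such as "-" are left unchanged.
        candidate = "".join(chars)
        if candidate != target:
            variants.add(".".join(labels[:-2] + [candidate] + labels[-1:]))
    return sorted(variants)


print(expand_domain("m.55591d.com", 5))
```

Each returned name would still have to pass the verification in S150 before being treated as illegal.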
S150, verifying whether the derived domain name is an illegal domain name, and if so, identifying illegal application software by adopting the illegal domain name.
Verifying whether a derived domain name is an illegal domain name specifically includes: configuring DPI parameters according to the derived domain name, performing simulated interface access on the derived domain name, and, after jumping to the corresponding page, verifying the compliance of the displayed page content with an existing page content verification algorithm; whether the derived domain name is an illegal domain name is then determined according to the compliance of the displayed page content. The page content verification algorithm may be a content identification algorithm, a keyword matching algorithm, an image identification algorithm, or the like; it is not the focus of the present application and is not described in detail herein.
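Of the verification algorithms mentioned, keyword matching is the simplest to illustrate. The Python sketch below assumes the page text has already been fetched, and uses a made-up keyword list and hit threshold; neither comes from the application, and the function name page_is_violating is hypothetical.

```python
# Hypothetical keyword list for illustration only.
KEYWORDS = ("guaranteed return", "recharge", "withdraw bonus")


def page_is_violating(page_text, keywords=KEYWORDS, threshold=2):
    """Flag a page when at least `threshold` distinct keywords occur in its text."""
    text = page_text.lower()
    hits = [k for k in keywords if k.lower() in text]
    return len(hits) >= threshold


print(page_is_violating("Recharge now for a guaranteed return"))
```

A derived domain name whose page is flagged would then be treated as an illegal domain name.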
After the illegal domain name is determined, tracing the application software according to the illegal domain name, and identifying the illegal application software. The illegal application software refers to software violating the relevant national laws and regulations, and comprises fraud-related software, pornographic software, phishing software and the like.
According to the technical scheme provided by the embodiment of the application, when the domain name attribute is determined to be suspected violation, the domain name features of the suspected domain name are extracted as target features, the suspected domain name is expanded according to the target features and the illegal domain name generation rule to obtain at least two derived domain names, the compliance of the derived domain names is verified, and the illegal domain names found in this way are used to identify illegal application software. Because the suspected domain name is expanded according to the target features and the illegal domain name generation rule, the variations of the suspected domain name are predicted and multiple derived domain names are obtained from a single suspected domain name. The scheme can therefore effectively handle the case where illegal application software uses dynamic domain names for resource access, which improves the detection rate of illegal application software; and because the application software is traced and identified according to the illegal domain name, the identification accuracy of illegal application software is ensured.
Example two
Fig. 2 is a flowchart of another method for identifying illegal application software according to the second embodiment of the present application. This embodiment is a further optimization on the basis of the above embodiment. Specifically, expanding the suspected domain name according to the target features and the illegal domain name generation rule to obtain at least two derived domain names is refined to include: determining the illegal domain name generation rule according to a domain name blacklist and a domain name whitelist; generating a set number of candidate domain names based on the illegal domain name generation rule, and extracting the domain name features of the candidate domain names; determining the feature similarity between the target features and the domain name features of the candidate domain names by using a fuzzy matching principle; and selecting at least two derived domain names from the candidate domain names according to the feature similarity.
As shown in fig. 2, the method for identifying the illegal application software includes:
S210, extracting an interface access request from the network traffic data, and parsing the interface access request to acquire domain name information of the interface.
S220, determining the domain name attribute of the domain name according to the domain name information.
S230, if the domain name attribute is suspected violation, determining that the domain name is a suspected domain name, and extracting the domain name features of the suspected domain name as target features.
S240, determining the rule for generating the illegal domain name according to the domain name blacklist and the domain name whitelist.
The domain names in the domain name blacklist are all illegal domain names and provide positive guidance for predicting illegal domain names. The illegal domain name generation rule can be determined by statistically analyzing the features of the domain names in the blacklist in terms of content, form, pointing and so on. From the domain name blacklist, the specific features that distinguish illegal domain names from the domain names of compliant software can be determined. For example, for illegal domain names whose content consists mainly of digits, the dynamic domain names used by the same illegal application software are related in form, and domain names of similar form are likely to belong to the same illegal application software; generally they differ only by the replacement of similar digits. Alternatively, an illegal domain name is formed by randomly inserting characters or symbols into a complete English word with a clear meaning, for example changing "sex" into "saewwx". For such illegal domain names, the dynamic domain names used by the same illegal application software are likely to differ only in the content or position of the inserted characters or symbols.
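The inserted-character pattern described above ("sex" becoming "saewwx") can be tested with a subsequence check: the original word must appear in order inside the longer label. The Python sketch below shows one way to do this; the function name is_inserted_variant is hypothetical, and the check deliberately ignores which characters were inserted.

```python
def is_inserted_variant(word: str, label: str) -> bool:
    """Return True if `label` can be formed by inserting extra characters into `word`."""
    if len(label) <= len(word):
        return False
    remaining = iter(label)
    # `ch in remaining` advances the iterator, so characters must occur in order.
    return all(ch in remaining for ch in word)


print(is_inserted_variant("sex", "saewwx"))
```

Applied to blacklist entries with a list of base words, such a check could help group domain names that were probably produced by the same insertion rule.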
It will be appreciated that an illegal domain name may also be a variation of a compliant domain name; for example, phishing software typically replaces characters in a compliant domain name with similar-looking characters, so that it imitates compliant software in the form of its domain name. By statistically analyzing the features of the compliant domain names in the domain name whitelist in terms of content, form, pointing and so on, the illegal domain name generation rule can also be determined. Specifically, the illegal domain name generation rule may determine the form of an illegal domain name by referring to the domain name whitelist, and replace the characters contained in a whitelisted domain name with similar characters.
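Similar-character replacement on a whitelisted domain name can be sketched in Python as follows. The substitution table is a small hypothetical example rather than a list taken from the application, the sketch replaces one character at a time, and the name lookalike_variants is hypothetical.

```python
# Hypothetical table of visually similar substitutions.
LOOKALIKES = {"o": "0", "l": "1", "i": "1", "e": "3", "a": "4", "s": "5"}


def lookalike_variants(domain: str) -> list:
    """Generate variants of a compliant domain by replacing one character at a time."""
    labels = domain.lower().split(".")
    if len(labels) < 2:
        raise ValueError("expected a domain with at least two labels")
    target = labels[-2]
    variants = []
    for pos, ch in enumerate(target):
        if ch in LOOKALIKES:
            changed = target[:pos] + LOOKALIKES[ch] + target[pos + 1:]
            variants.append(".".join(labels[:-2] + [changed] + labels[-1:]))
    return variants


print(lookalike_variants("mybank.example"))
```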
S250, generating a set number of candidate domain names based on the illegal domain name generation rule, and extracting the domain name features of the candidate domain names.
The candidate domain name is generated according to the rule of generating the illegal domain name and is the domain name with the characteristic of the illegal domain name. The number of candidate domain names is not limited herein, and is specifically determined according to actual service requirements. And extracting domain name characteristics of the candidate domain name, specifically including extracting domain name content, domain name form and domain name pointing characteristics of the candidate domain name.
S260, determining feature similarity between the target feature and the domain name feature of the candidate domain name by using a fuzzy matching principle.
Fuzzy matching is a matching approach distinct from exact matching: instead of requiring an exact match, it gives an approximate degree of matching according to given conditions or requirements. The feature similarity reflects how similar a candidate domain name is to the suspected domain name in the domain name feature dimension, and can be used for screening the candidate domain names.
The feature similarity between the target features and the domain name features of the candidate domain names is determined by using the fuzzy matching principle, so that the candidate domain names can subsequently be screened according to the feature similarity. The screening results share common features with the suspected domain name while also differing from it, which allows the domain name blacklist and the illegal domain name naming rules to be expanded.
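The application does not specify which fuzzy matching measure is used. As one possible illustration, the Python sketch below scores similarity with the standard library's difflib.SequenceMatcher, combining a content score (on the raw label) and a form score (on the character-type pattern). The equal weighting and the names char_pattern and feature_similarity are assumptions.

```python
from difflib import SequenceMatcher


def char_pattern(label: str) -> str:
    """Encode each character as D (digit), L (letter) or S (symbol)."""
    return "".join(
        "D" if c.isdigit() else "L" if c.isalpha() else "S" for c in label
    )


def feature_similarity(target_label: str, candidate_label: str,
                       content_weight: float = 0.5) -> float:
    """Return a similarity score in [0, 1] combining content and form."""
    content = SequenceMatcher(None, target_label, candidate_label).ratio()
    form = SequenceMatcher(
        None, char_pattern(target_label), char_pattern(candidate_label)
    ).ratio()
    return content_weight * content + (1 - content_weight) * form


print(feature_similarity("55591d", "55592d"))
print(feature_similarity("55591d", "abcxyz"))
```

Other measures, such as edit distance, could be substituted without changing the surrounding steps.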
S270, selecting at least two derived domain names from the candidate domain names according to the feature similarity.
A derived domain name is selected from the candidate domain names and is a candidate domain name that has the target feature. The derived domain names are selected from the candidate domain names according to the feature similarity; for example, the candidate domain names can be sorted by feature similarity from high to low, and the candidate domain names ranked within a set range, such as the top 30% or the top 50%, are selected as the derived domain names. The set range can be determined according to the actual proportion of illegal domain names among the derived domain names. This proportion can be obtained by counting the verification results after the derived domain names have been verified as illegal or not.
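The ranking and selection described above can be sketched as follows. The function name `select_derived`, the `(domain, similarity)` input format, and the rounding-up behavior are assumptions for illustration; the floor of two results reflects the requirement of at least two derived domain names.

```python
import math


def select_derived(scored_candidates, fraction=0.3, minimum=2):
    """Keep the top `fraction` of candidates by similarity, but never fewer than `minimum`.

    `scored_candidates` is a list of (domain, similarity) pairs.
    """
    ranked = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    keep = max(minimum, math.ceil(len(ranked) * fraction))
    return [domain for domain, _ in ranked[:keep]]
```

In practice `fraction` would be tuned from the verification statistics, for example widened when most derived domain names turn out to be illegal and narrowed when few do.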
S280, verifying whether the derived domain name is an illegal domain name, and if so, identifying illegal application software by adopting the illegal domain name.
In an alternative embodiment, identifying illegal application software by using the illegal domain name includes: adding the illegal domain name to a domain name blacklist, and monitoring network traffic data according to the domain name blacklist; and if any illegal domain name included in the domain name blacklist appears in the network traffic data, tracing the application software according to the network traffic data, and marking the application software as illegal software.
The illegal domain name is added to the domain name blacklist to expand the blacklist, and a monitoring list is issued according to the domain name blacklist to designate the illegal domain names in the blacklist for monitoring. If an access record of an illegal domain name is found in the network traffic data, the application software is traced according to the network traffic data and marked as illegal software, thereby identifying the illegal application software. Specifically, the present application collects newly released domain names by monitoring a target website, and compares the collected domain names with the domain name blacklist to identify illegal application software.
The target website refers to a website that returns a domain name pool when called. The domain names in the domain name pool pointed to by the target website correspond to the page display content of the application software, and the domain name pool solves the problem that the application software becomes unusable after its communication domain name fails. In the present application, network traffic data are monitored according to the domain name blacklist; when an illegal domain name appears in the network traffic data, the application software is traced and the illegal application software is identified, which ensures the accuracy of illegal application software identification.
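The blacklist monitoring and tracing step can be sketched as below. The record layout (dictionaries with `domain` and `app_id` keys) and the function name `find_illegal_apps` are hypothetical; the application does not describe the format of the network traffic data or how an application is identified within it.

```python
def find_illegal_apps(flow_records, blacklist):
    """Return the identifiers of applications whose traffic accessed a blacklisted domain.

    `flow_records` is an iterable of dicts with hypothetical keys "domain" and "app_id".
    """
    blocked = {name.lower() for name in blacklist}
    flagged = set()
    for record in flow_records:
        if record["domain"].lower() in blocked:
            # Trace the access back to the application that produced it.
            flagged.add(record["app_id"])
    return flagged


records = [
    {"domain": "examp1e.com", "app_id": "app-A"},
    {"domain": "example.com", "app_id": "app-B"},
]
print(find_illegal_apps(records, {"examp1e.com"}))
```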
According to the technical solution provided by this embodiment of the application, the illegal domain name generation rule is determined according to the domain name blacklist and the domain name whitelist; a set number of candidate domain names are generated based on the illegal domain name generation rule, and the domain name features of the candidate domain names are extracted; the feature similarity between the target feature and the domain name features of the candidate domain names is determined by using a fuzzy matching principle; at least two derived domain names are selected from the candidate domain names according to the feature similarity; and whether each derived domain name is an illegal domain name is verified, and if so, the illegal domain name is used to identify illegal application software. Because the derived domain names are generated based on the illegal domain name generation rule, they share common features with the suspected domain name while retaining distinguishing features.
Example three
Fig. 3 is a schematic structural diagram of an illegal application software identification device according to a third embodiment of the present application, which is applicable to identifying illegal application software. The device can be implemented in software and/or hardware, and can be integrated in electronic equipment such as an intelligent terminal.
As shown in fig. 3, the device may include: a domain name information acquisition module 310, a domain name attribute determination module 320, a target feature determination module 330, a domain name expansion module 340, and an illegal application software identification module 350.
the domain name information acquisition module 310 is configured to extract an interface access request from network traffic data, and parse the interface access request to obtain domain name information of an interface;
the domain name attribute determination module 320 is configured to determine a domain name attribute of the domain name according to the domain name information;
the target feature determination module 330 is configured to determine that the domain name is a suspected domain name if the domain name attribute is suspected violation, and extract the domain name features of the suspected domain name as a target feature;
the domain name expansion module 340 is configured to expand the suspected domain name according to the target feature and the illegal domain name generation rule to obtain at least two derived domain names;
and the illegal application software identification module 350 is configured to verify whether a derived domain name is an illegal domain name, and if so, identify illegal application software by using the illegal domain name.
According to the technical solution provided by this embodiment of the application, when the domain name attribute is determined to be suspected violation, the domain name features of the suspected domain name are extracted as the target feature; the suspected domain name is expanded into at least two derived domain names according to the target feature and the illegal domain name generation rule; the compliance of the derived domain names is verified; and the illegal domain names are used to identify illegal application software. By expanding the suspected domain name according to the target feature and the illegal domain name generation rule, the present application predicts variants of the suspected domain name and expands one suspected domain name into multiple derived domain names. This effectively handles the case in which illegal application software uses dynamic domain names to access resources, and improves the detection rate and accuracy of illegal application software identification.
Optionally, the domain name expansion module 340 includes: an illegal domain name generation rule determination sub-module, configured to determine the illegal domain name generation rule according to a domain name blacklist and a domain name whitelist; a domain name feature extraction sub-module, configured to generate a set number of candidate domain names based on the illegal domain name generation rule and extract the domain name features of the candidate domain names; a feature similarity determination sub-module, configured to determine the feature similarity between the target feature and the domain name features of the candidate domain names by using a fuzzy matching principle; and a derived domain name determination sub-module, configured to select at least two derived domain names from the candidate domain names according to the feature similarity.
Optionally, the domain name attribute includes explicit compliance, explicit violation, and suspected violation; accordingly, the domain name attribute determination module 320 includes: a first domain name attribute determination sub-module, configured to determine that the domain name attribute of the domain name is explicit compliance if the domain name information is successfully matched with the domain name whitelist; a second domain name attribute determination sub-module, configured to determine that the domain name attribute of the domain name is explicit violation if the domain name information is successfully matched with the domain name blacklist; and a third domain name attribute determination sub-module, configured to determine that the domain name attribute of the domain name is suspected violation if the matching of the domain name information with the domain name whitelist fails and the matching with the domain name blacklist fails.
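The three-way attribute determination can be sketched in Python as below. Exact, case-insensitive set membership is used as the matching rule, and the whitelist is checked before the blacklist; both choices are assumptions, since the application does not state how matching is performed or how a domain name present on both lists would be handled. The names `DomainAttribute` and `classify_domain` are hypothetical.

```python
from enum import Enum


class DomainAttribute(Enum):
    EXPLICIT_COMPLIANCE = "explicit compliance"
    EXPLICIT_VIOLATION = "explicit violation"
    SUSPECTED_VIOLATION = "suspected violation"


def classify_domain(domain, whitelist, blacklist):
    """Classify a domain name by exact match against the whitelist and the blacklist."""
    name = domain.lower()
    if name in whitelist:
        return DomainAttribute.EXPLICIT_COMPLIANCE
    if name in blacklist:
        return DomainAttribute.EXPLICIT_VIOLATION
    # Neither list matched, so the domain name is only suspected.
    return DomainAttribute.SUSPECTED_VIOLATION
```

Only domain names classified as `SUSPECTED_VIOLATION` proceed to feature extraction and expansion.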
Optionally, the illegal application software identification module 350 includes: a domain name compliance verification sub-module and an illegal application software identification sub-module; the domain name compliance verification sub-module is specifically used for verifying whether the derived domain name is an illegal domain name; and the illegal application software identification submodule is specifically used for identifying the illegal application software by adopting the illegal domain name.
The illegal application software identification submodule comprises: the traffic data monitoring unit is used for adding the illegal domain name into a domain name blacklist and monitoring network traffic data according to the domain name blacklist; and the illegal application software identification unit is used for tracing the application software according to the network traffic data and marking the application software as illegal software if any illegal domain name included in the domain name blacklist is monitored to appear in the network traffic data.
Optionally, the domain name information acquisition module 310 includes: an interface response acquisition sub-module, configured to, if the interface access request is encrypted, perform interface access by using a simulated interface access tool according to the interface access request to obtain an interface response; and a domain name information determination sub-module, configured to extract the interface access link included in the interface response content and determine the domain name information of the interface according to the interface access link.
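The second sub-module, which turns interface response content into domain name information, might look like the sketch below. It assumes the response body is text containing plain `http` or `https` links; the regular expression and the function name `domains_from_response` are illustrative assumptions, and the simulated interface access itself is not shown because the application does not identify the tool used.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern for links embedded in a textual response body.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def domains_from_response(body):
    """Extract interface access links from a response body and return their host names."""
    domains = []
    for link in URL_PATTERN.findall(body):
        host = urlsplit(link).hostname  # lower-cased, without port
        if host and host not in domains:
            domains.append(host)
    return domains


body = '{"api": "https://api.example.com/v1/list", "cdn": "http://CDN.example.net:8080/x"}'
print(domains_from_response(body))
```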
Optionally, the domain name feature includes at least one of domain name pointing, domain name content, and domain name form.
The illegal application software identification device provided by this embodiment of the application can execute the illegal application software identification method provided by any embodiment of the application, and has the functional modules and beneficial effects corresponding to executing that method.
Example four
A fourth embodiment of the present application further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method for identifying an illegal application software, the method including:
extracting an interface access request in network flow data, and analyzing the interface access request to acquire domain name information of an interface;
determining the domain name attribute of the domain name according to the domain name information;
if the domain name attribute is suspected violation, determining that the domain name is a suspected domain name, and extracting domain name features of the suspected domain name as target features;
according to the target characteristics and rule for generating illegal domain names, expanding the suspected domain names to obtain at least two derived domain names;
and verifying whether the derived domain name is an illegal domain name, and if so, identifying illegal application software by adopting the illegal domain name.
A storage medium refers to any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the computer system in which the program is executed, or may be located in a different, second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide the program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that reside in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, the computer-executable instructions contained in the storage medium provided in the embodiments of the present application are not limited to the illegal application software identification operations described above, and may also perform related operations in the illegal application software identification method provided in any embodiment of the present application.
Example five
A fifth embodiment of the present application provides an electronic device in which the illegal application software identification device provided in the embodiments of the present application may be integrated; the electronic device may be configured in a system, or may be a device that performs part or all of the functions in the system. Fig. 4 is a schematic structural diagram of an electronic device according to the fifth embodiment of the present application. As shown in fig. 4, the present embodiment provides an electronic device 400, which includes: one or more processors 420; and a storage device 410 configured to store one or more programs, where, when the one or more programs are executed by the one or more processors 420, the one or more processors 420 implement the illegal application software identification method provided by the embodiments of the present application, the method including:
extracting an interface access request in network flow data, and analyzing the interface access request to acquire domain name information of an interface;
determining the domain name attribute of the domain name according to the domain name information;
if the domain name attribute is suspected violation, determining that the domain name is a suspected domain name, and extracting domain name features of the suspected domain name as target features;
according to the target characteristics and rule for generating illegal domain names, expanding the suspected domain names to obtain at least two derived domain names;
and verifying whether the derived domain name is an illegal domain name, and if so, identifying illegal application software by adopting the illegal domain name.
Of course, those skilled in the art can understand that the processor 420 also implements the technical solution of the illegal application software identification method provided in any embodiment of the present application.
The electronic device 400 shown in fig. 4 is only an example, and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in fig. 4, the electronic device 400 includes a processor 420, a storage device 410, an input device 430, and an output device 440; the number of the processors 420 in the electronic device may be one or more, and one processor 420 is taken as an example in fig. 4; the processor 420, the storage device 410, the input device 430, and the output device 440 in the electronic apparatus may be connected by a bus or other means, and are exemplified by a bus 450 in fig. 4.
The storage device 410 is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and module units, such as program instructions corresponding to the illegal application software identification method in the embodiment of the present application.
The storage device 410 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the terminal, and the like. Further, the storage device 410 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage device 410 may further include memory located remotely from the processor 420, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric, character, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device. The output device 440 may include a display screen, a speaker, or other electronic equipment.
The illegal application software identification device, the medium and the electronic device provided in the above embodiments can execute the illegal application software identification method provided in any embodiment of the application, and have the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in the above embodiments, reference may be made to the illegal application software identification method provided in any embodiment of the present application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

1. A method for identifying illegal application software, the method comprising:
extracting an interface access request in network flow data, and analyzing the interface access request to acquire domain name information of an interface;
determining the domain name attribute of the domain name according to the domain name information;
if the domain name attribute is suspected violation, determining that the domain name is a suspected domain name, and extracting domain name features of the suspected domain name as target features;
according to the target characteristics and rule for generating illegal domain names, expanding the suspected domain names to obtain at least two derived domain names;
and verifying whether the derived domain name is an illegal domain name, and if so, identifying illegal application software by adopting the illegal domain name.
2. The method of claim 1, wherein expanding the suspected domain name to obtain at least two derived domain names according to the target feature and an illegal domain name generation rule comprises:
determining the rule for generating the illegal domain name according to a domain name blacklist and a domain name whitelist;
generating a set number of candidate domain names based on the rule of generating the illegal domain name, and extracting the domain name characteristics of the candidate domain names;
determining feature similarity between the target feature and the domain name feature of the candidate domain name by using a fuzzy matching principle;
and selecting at least two derived domain names from the candidate domain names according to the feature similarity.
3. The method of claim 1, wherein the domain name attribute includes explicit compliance, explicit violation, and suspected violation; correspondingly, the determining the domain name attribute of the domain name according to the domain name information includes:
if the domain name information is successfully matched with the domain name whitelist, the domain name attribute of the domain name is explicit compliance;
if the domain name information is successfully matched with the domain name blacklist, the domain name attribute of the domain name is explicit violation;
and if the matching of the domain name information with the domain name whitelist fails and the matching of the domain name information with the domain name blacklist fails, the domain name attribute of the domain name is suspected violation.
4. The method of claim 1, wherein identifying the illegal application software by using the illegal domain name comprises:
adding the illegal domain name into a domain name blacklist, and monitoring network traffic data according to the domain name blacklist;
if any illegal domain name included in the domain name blacklist appears in the network traffic data, tracing the source of the application software according to the network traffic data, and marking the application software as illegal software.
5. The method of claim 1, wherein the extracting the interface access request from the network traffic data and parsing the interface access request to obtain domain name information of the interface comprises:
if the interface access request is encrypted, interface access is carried out by utilizing a simulation interface access tool according to the interface access request so as to obtain an interface response;
and extracting an interface access link included in the interface response content, and determining domain name information of the interface according to the interface access link.
6. The method of claim 1, wherein the domain name characteristics comprise at least one of domain name pointing, domain name content, and domain name form.
7. An illegal application software identification device, characterized in that the device comprises:
the domain name information acquisition module is used for extracting an interface access request in the network flow data and analyzing the interface access request to acquire domain name information of an interface;
the domain name attribute determining module is used for determining the domain name attribute of the domain name according to the domain name information;
a target feature determination module, configured to determine that the domain name is a suspected domain name if the domain name attribute is suspected violation, and extract a domain name feature of the suspected domain name as a target feature;
the domain name expansion module is used for expanding the suspected domain name to obtain at least two derived domain names according to the target characteristics and the rule of generating the illegal domain name;
and the illegal application software identification module is used for verifying whether the derived domain name is an illegal domain name, and if so, identifying the illegal application software by adopting the illegal domain name.
8. The device of claim 7, wherein the domain name expansion module comprises:
the illegal domain name generation rule determining sub-module is used for determining the illegal domain name generation rule according to a domain name blacklist and a domain name whitelist;
the domain name feature extraction sub-module is used for generating a set number of candidate domain names based on the rule of generating the illegal domain name and extracting the domain name features of the candidate domain names;
the characteristic similarity determining sub-module is used for determining the characteristic similarity between the target characteristic and the domain name characteristic of the candidate domain name by using a fuzzy matching principle;
and the derived domain name determining sub-module is used for selecting at least two derived domain names from the candidate domain names according to the feature similarity.
9. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, carries out the method for identifying illegal application software according to any one of claims 1-6.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for identifying illegal application software according to any one of claims 1-6 when executing the computer program.
CN202111129799.0A 2021-09-26 2021-09-26 Illegal application software identification method, device, medium and electronic equipment Active CN113890866B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111129799.0A CN113890866B (en) 2021-09-26 2021-09-26 Illegal application software identification method, device, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111129799.0A CN113890866B (en) 2021-09-26 2021-09-26 Illegal application software identification method, device, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113890866A true CN113890866A (en) 2022-01-04
CN113890866B CN113890866B (en) 2024-03-12

Family

ID=79006732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111129799.0A Active CN113890866B (en) 2021-09-26 2021-09-26 Illegal application software identification method, device, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113890866B (en)

Also Published As

Publication number Publication date
CN113890866B (en) 2024-03-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant