CN113890866B - Illegal application software identification method, device, medium and electronic equipment - Google Patents

Illegal application software identification method, device, medium and electronic equipment

Info

Publication number
CN113890866B
Authority
CN
China
Prior art keywords
domain name
illegal
domain
application software
suspected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111129799.0A
Other languages
Chinese (zh)
Other versions
CN113890866A (en
Inventor
王肖斌
黄晓青
高华
傅强
蔡琳
梁彧
田野
王杰
杨满智
金红
陈晓光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Eversec Beijing Technology Co Ltd
Original Assignee
Eversec Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eversec Beijing Technology Co Ltd filed Critical Eversec Beijing Technology Co Ltd
Priority to CN202111129799.0A priority Critical patent/CN113890866B/en
Publication of CN113890866A publication Critical patent/CN113890866A/en
Application granted granted Critical
Publication of CN113890866B publication Critical patent/CN113890866B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44 Program or device authentication

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Stored Programmes (AREA)

Abstract

The embodiment of the application discloses a method, a device, a medium and electronic equipment for identifying illegal application software. The method comprises the following steps: extracting an interface access request in network flow data, and analyzing the interface access request to acquire domain name information of an interface; determining domain name attributes of the domain name according to the domain name information; if the domain name attribute is suspected illegal, determining the domain name as a suspected domain name, and extracting domain name characteristics of the suspected domain name as target characteristics; expanding the suspected domain name according to the target characteristics and the rule for generating the illegal domain name to obtain at least two derivative domain names; and verifying whether the derived domain name is an illegal domain name, and if so, identifying illegal application software by adopting the illegal domain name. By executing the technical scheme provided by the application, the detection rate and the identification rate of illegal application software can be improved.

Description

Illegal application software identification method, device, medium and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of computer application, in particular to a method, a device, a medium and electronic equipment for identifying illegal application software.
Background
At present, cases in which application software (APP) is used to infringe on people's legitimate rights and interests occur frequently and seriously disturb public security. Among the huge number of applications in circulation, accurately identifying illegal application software, such as fraud-related applications, is of great significance for protecting people's lives and property.
At present, the general application category field of application software is obtained mainly through DPI (Deep Packet Inspection) technology, which enables category-level identification of application software. However, illegal application software generally does not strictly follow the national specifications for application software registration and development, so in many cases the application type field cannot be obtained through DPI. Moreover, fraud-related applications often disguise themselves as compliant application types on the market, such as games or instant messaging, so the above method has a low recognition rate for illegal application software.
Disclosure of Invention
The embodiment of the application provides a method, a device, a medium and electronic equipment for identifying illegal application software, which can achieve the aim of improving the detection rate and accuracy of the illegal application software.
In a first aspect, an embodiment of the present application provides a method for identifying illegal application software, where the method includes:
extracting an interface access request in network flow data, and analyzing the interface access request to acquire domain name information of an interface;
determining domain name attributes of the domain name according to the domain name information;
if the domain name attribute is suspected illegal, determining the domain name as a suspected domain name, and extracting domain name characteristics of the suspected domain name as target characteristics;
expanding the suspected domain name according to the target characteristics and the rule for generating the illegal domain name to obtain at least two derivative domain names;
and verifying whether the derived domain name is an illegal domain name, and if so, identifying illegal application software by adopting the illegal domain name.
In a second aspect, an embodiment of the present application provides an apparatus for identifying illegal application software, where the apparatus includes:
the domain name information acquisition module is used for extracting an interface access request in the network flow data and analyzing the interface access request to acquire domain name information of an interface;
the domain name attribute determining module is used for determining the domain name attribute of the domain name according to the domain name information;
the target feature determining module is used for determining the domain name as a suspected domain name if the domain name attribute is suspected illegal, and extracting domain name features of the suspected domain name as target features;
the domain name expansion module is used for expanding the suspected domain name according to the target characteristics and the rule for generating the illegal domain name to obtain at least two derivative domain names;
and the illegal application software identification module is used for verifying whether the derivative domain name is an illegal domain name, and if yes, adopting the illegal domain name to identify the illegal application software.
In a third aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method for identifying offending application software as described in embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable by the processor, where the processor implements a method for identifying offending application software according to an embodiment of the present application when the processor executes the computer program.
According to the technical scheme provided by the embodiment of the application, when the domain name attribute is determined to be suspected illegal, the domain name features of the suspected domain name are extracted as target features, the suspected domain name is expanded according to the target features and the illegal domain name generation rule to obtain at least two derivative domain names, the compliance of the derivative domain names is verified, and the domain names verified as illegal are then used to identify the illegal application software. Expanding the suspected domain name according to the target features and the illegal domain name generation rule predicts the variants of the suspected domain name, so that multiple derivative domain names are obtained from a single suspected domain name. Executing this scheme therefore copes effectively with illegal application software that uses dynamic domain names for resource access, and improves the detection rate and accuracy of illegal application software identification.
Drawings
FIG. 1 is a flowchart of a method for identifying offending application software according to an embodiment of the present application;
FIG. 2 is a flowchart of another method for identifying offending application software provided in accordance with a second embodiment of the present application;
FIG. 3 is a schematic structural diagram of an illegal application software identification device according to a third embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts steps as a sequential process, many of the steps may be implemented in parallel, concurrently, or with other steps. Furthermore, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
In general, the client and the server of application software communicate by domain name or IP address. For application software that meets national laws and regulations (a compliant application), the client generally accesses server resources by directly accessing a main domain name. Because application software that does not meet the relevant national laws and regulations (an illegal application) can seriously disturb public security, illegal applications are a key enforcement target of the relevant departments: once a server providing resource access for an illegal application is detected, the relevant departments block it. Developers of illegal applications therefore often assign dynamic domain names to the application to evade blocking. Specifically, when a user accesses server resources through an illegal application, the application does not communicate with the server by directly accessing a main domain name as a compliant application does; instead, it first calls a designated interface to obtain a list of available server addresses. Once the domain name currently in use is blocked, a new domain name is randomly generated for the illegal application, so that the service remains continuously available. Inspired by this development approach of illegal application developers, the application provides a method for identifying illegal application software that copes effectively with illegal application software using dynamic domain names for resource access, and achieves a high detection rate and high accuracy.
Example 1
Fig. 1 is a flowchart of a method for identifying illegal application software according to an embodiment of the present application, where the embodiment is applicable to the situation of identifying illegal application software. The method can be executed by the illegal application software identification device provided by the embodiment of the application, and the device can be realized by software and/or hardware and can be integrated into an electronic device running the system.
As shown in fig. 1, the method for identifying the illegal application software includes:
S110, extracting an interface access request in the network flow data, and analyzing the interface access request to acquire domain name information of an interface.
The network traffic data is the data sent and received during network transmission, as intercepted by a packet capture tool. The network traffic data may include an interface access request, which is a request for obtaining a dynamic domain name. In general, when an illegal application requests a resource from a server, it first generates an interface access request and accesses a transit website based on that request. When the transit website is called, it returns a domain name pool, and the application can then communicate with its background server using a domain name from the pool and access the background server resources.
To extract the interface access request from the network traffic data and parse it to obtain the domain name information of the interface, a packet capture tool such as Wireshark may be used to capture the interface access request and obtain its response content, from which the domain name information of the interface is extracted. Illustratively, the captured interface access request is:
GET /line/apUrls/getUrlplatformId=A03&domain0rl=https://47.75.215.58:808 HTTP/1.1
Host: 179.33.19.174:3389
Connection: KeepAlive
Accept-Encoding: gzip
User-Agent: okhttp/3.14.2
The response content corresponding to the interface access request is as follows:
HTTP/1.1 200
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Date: Wed, 03 Jun 2020 08:54:10 GMT
{"code":200,"message":"operation successful","data":"https://m.5005559.com"}
Here, https://m.5005559.com in the response content is the domain name information.
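As a minimal sketch of this extraction step, the following Python snippet parses a JSON response body shaped like the example response and pulls out the host name. The function name `extract_domain` and the assumption that the `data` field always holds a single URL are illustrative and are not specified by the application.

```python
import json
from urllib.parse import urlsplit


def extract_domain(response_body: str) -> str:
    """Return the host name found in the "data" field of a JSON response body."""
    payload = json.loads(response_body)
    # Assumption: "data" holds exactly one URL, as in the example response.
    return urlsplit(payload["data"]).hostname or ""


body = '{"code": 200, "message": "operation successful", "data": "https://m.5005559.com"}'
domain = extract_domain(body)
print(domain)
```

A real transit interface may return a pool of several domain names; in that case the same parsing would be applied to each entry.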
Some interface access requests are encrypted, and the response content of an encrypted interface access request cannot be obtained directly. In an optional embodiment, extracting the interface access request in the network traffic data and parsing the interface access request to obtain the domain name information of the interface includes: if the interface access request is encrypted, performing interface access according to the interface access request using a simulated interface access tool to obtain an interface response; and extracting the interface access link included in the interface response content, and determining the domain name information of the interface according to the interface access link.
The simulated interface access tool is a tool that imitates the application software in sending an interface access request in order to obtain the server response. For example, the simulated interface access tool may be Postman: Postman is used to imitate the application software and send a POST request according to the interface access request, thereby obtaining the server response. The domain name information of the interface is then extracted from the interface access link in the server response.
Optionally, whether the interface access request is encrypted is judged from the traffic packet: if the application software uses the HTTPS protocol, the interface access request is encrypted; if it uses the HTTP protocol, the interface access request is unencrypted. This way of obtaining domain name information covers the case where the interface access request is encrypted, so that domain name information can still be extracted for encrypted interface access requests, which improves the detection rate of illegal applications.
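The HTTPS/HTTP decision described here can be sketched as a small dispatch function. The names `is_encrypted` and `acquisition_mode` and the two mode labels are hypothetical; the snippet only chooses a path and does not itself replay any request.

```python
from urllib.parse import urlsplit


def is_encrypted(request_url: str) -> bool:
    """Treat HTTPS requests as encrypted and everything else as unencrypted."""
    return urlsplit(request_url).scheme.lower() == "https"


def acquisition_mode(request_url: str) -> str:
    """Pick how the response is obtained for a given interface access request."""
    # "simulate": re-send the request with a simulated interface access tool.
    # "capture": read the response directly from the captured traffic.
    return "simulate" if is_encrypted(request_url) else "capture"


print(acquisition_mode("https://198.51.100.7/api"))
print(acquisition_mode("http://198.51.100.7/api"))
```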
S120, determining the domain name attribute of the domain name according to the domain name information.
The domain name attribute refers to the compliance of the domain name. In practice, the domain name attribute is determined by the compliance of the website that the domain name points to. If the domain name points to an illegal website, such as a fraud website, the domain name attribute is explicitly illegal. If the domain name points to a compliant website, such as a bank website, the domain name attribute is explicitly compliant. If the attribute of the website that the domain name points to has not been determined, the domain name attribute is suspected illegal. Determining the domain name attribute according to the domain name information specifically means determining it according to the domain name content.
In an optional embodiment, the domain name attribute is one of explicitly compliant, explicitly illegal, and suspected illegal. Correspondingly, determining the domain name attribute of the domain name according to the domain name information includes: if the domain name information matches the domain name whitelist, the domain name attribute is explicitly compliant; if the domain name information matches the domain name blacklist, the domain name attribute is explicitly illegal; if the domain name information matches neither the domain name whitelist nor the domain name blacklist, the domain name attribute is suspected illegal.
The domain name whitelist is added to the illegal application software identification system in advance by the relevant technicians. It consists of domain names pointing to compliant websites, and the domain names in it have the attribute explicitly compliant. Correspondingly, the domain name blacklist is also added to the illegal application software identification system in advance by the relevant technicians. It consists of domain names pointing to illegal websites, and the domain names in it have the attribute explicitly illegal. As illegal software identification work proceeds, the domain name whitelist and the domain name blacklist are continuously updated and expanded.
The domain name attribute is determined according to the domain name information. Specifically, the domain name content is matched against the domain name whitelist and the domain name blacklist respectively. If the domain name information matches the domain name whitelist, the domain name is already recorded in the whitelist and its attribute is explicitly compliant; if the domain name information matches the domain name blacklist, the domain name is already recorded in the blacklist and its attribute is explicitly illegal; if it matches neither list, the domain name is recorded in neither list, its attribute has not been determined, and it is classified as suspected illegal. In this way the domain names are classified and screened, and only domain names whose attribute is suspected illegal undergo subsequent processing, which reduces the amount of computation and improves the efficiency of illegal software identification.
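The three-way classification can be illustrated with a short Python sketch. It assumes exact, case-insensitive matching against in-memory sets; the application does not specify the matching strategy, and the list contents below are placeholders.

```python
from enum import Enum


class DomainAttribute(Enum):
    COMPLIANT = "explicitly compliant"
    ILLEGAL = "explicitly illegal"
    SUSPECTED = "suspected illegal"


def classify_domain(domain: str, whitelist: set[str], blacklist: set[str]) -> DomainAttribute:
    """Classify a domain by whitelist/blacklist membership (exact match assumed)."""
    name = domain.lower().rstrip(".")
    if name in whitelist:
        return DomainAttribute.COMPLIANT
    if name in blacklist:
        return DomainAttribute.ILLEGAL
    # Recorded in neither list: only these domains go on to later steps.
    return DomainAttribute.SUSPECTED


# Placeholder lists for illustration only.
whitelist = {"bank.example"}
blacklist = {"fraud.example"}
print(classify_domain("m.5005559.com", whitelist, blacklist))
```

Checking the whitelist first means a domain that somehow appears in both lists is treated as compliant; that ordering is a design choice of this sketch.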
S130, if the domain name attribute is suspected illegal, determining the domain name as a suspected domain name, and extracting domain name characteristics of the suspected domain name as target characteristics.
When the domain name attribute is suspected illegal, the compliance of the website that the domain name points to is uncertain, so the compliance of the domain name is also uncertain, and the domain name is determined to be a suspected domain name. The domain name features of the suspected domain name are extracted as target features for further processing.
In an optional embodiment, the domain name features include at least one of domain name pointing, domain name content, and domain name form. The domain name pointing reflects the target IP address and target port number of resource access based on the domain name; generally, resource access by an illegal application involves multiple redirects. The domain name content refers to the composition of the domain name and includes characters and symbols. The domain name form includes at least one of the way characters are combined, the number of characters, and the character type. By way of example, the character type may be English letters, digits, and so on.
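A minimal sketch of extracting the content and form features might look as follows. The `DomainFeatures` structure, the choice of the second-to-last label as the main content, and the `d`/`a`/`s` pattern encoding are all assumptions made for illustration; domain name pointing is omitted because it requires live resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainFeatures:
    label: str          # domain name content: the main label, e.g. "55591d"
    length: int         # domain name form: number of characters
    digit_ratio: float  # domain name form: share of digits in the label
    pattern: str        # domain name form: "d" digit, "a" letter, "s" symbol


def extract_features(domain: str) -> DomainFeatures:
    labels = domain.lower().strip(".").split(".")
    # Assumption: the second-to-last label carries the main content
    # (this ignores multi-part suffixes such as "co.uk").
    label = labels[-2] if len(labels) >= 2 else labels[0]
    pattern = "".join(
        "d" if c.isdigit() else "a" if c.isalpha() else "s" for c in label
    )
    digit_ratio = pattern.count("d") / len(label) if label else 0.0
    return DomainFeatures(label, len(label), digit_ratio, pattern)


features = extract_features("m.55591d.com")
print(features)
```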
S140, expanding the suspected domain name according to the target characteristics and the rule for generating the illegal domain name to obtain at least two derivative domain names.
The illegal domain name generation rule is a rule for predicting illegal domain names and is determined according to the domain name features of illegal domain names. There may be multiple illegal domain name generation rules; their specific content is not limited here and may be determined according to the actual situation. Optionally, the illegal domain name generation rule is a rule for combining characters and symbols or a rule for modifying character content.
Expanding the suspected domain name according to the target features and the illegal domain name generation rule means, in practice, taking the target features as a reference and applying the illegal domain name generation rule to the suspected domain name to obtain derivative domain names. A derivative domain name is a domain name obtained by expanding the suspected domain name according to the illegal domain name generation rule; it is a variant of the suspected domain name that has the target features. The specific number of derivative domain names is not limited here and is determined according to actual service requirements and computing resource usage. It will be appreciated that the more derivative domain names there are, the higher the probability of detecting an illegal domain name, but the more computing resources are needed to verify them.
For example, if the suspected domain name is https://m.55591d.com/, its main content is digits, and the suspected domain name is expanded using the character content modification rule among the illegal domain name generation rules to obtain derivative domain names, which may be https://m.5006659.com, https://m.266503.com, https://m.22633g.com, https://m.5155039.com or https://m.53998b.com.
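One possible reading of a character content modification rule is sketched below: a few characters of the main label are replaced with random characters of the same type, so the derived names keep the digit/letter layout of the suspected domain. The function `derive_domains`, the fixed label length, and the `max_changes` parameter are assumptions of this sketch, not the rule used in the application.

```python
import random
import string


def derive_domains(domain: str, count: int, max_changes: int = 3, seed=None) -> list[str]:
    """Derive up to `count` variants by same-type character replacement."""
    rng = random.Random(seed)
    labels = domain.lower().strip(".").split(".")
    idx = len(labels) - 2 if len(labels) >= 2 else 0
    original = labels[idx]
    mutable = [i for i, c in enumerate(original) if c.isalnum()]
    derived: set[str] = set()
    attempts = 0
    # Bounded retry loop so that short labels cannot cause an endless search.
    while mutable and len(derived) < count and attempts < count * 50:
        attempts += 1
        chars = list(original)
        for pos in rng.sample(mutable, k=min(max_changes, len(mutable))):
            pool = string.digits if chars[pos].isdigit() else string.ascii_lowercase
            chars[pos] = rng.choice(pool)
        candidate = "".join(chars)
        if candidate != original:
            derived.add(".".join(labels[:idx] + [candidate] + labels[idx + 1:]))
    return sorted(derived)


derived = derive_domains("m.55591d.com", count=5, seed=0)
print(derived)
```

Raising `count` widens coverage at the cost of more verification work, which mirrors the trade-off noted in the description.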
S150, verifying whether the derived domain name is an illegal domain name, and if so, identifying illegal application software by adopting the illegal domain name.
To verify whether the derivative domain names are illegal domain names, DPI parameters are configured according to the derivative domain names and simulated interface access is performed on each derivative domain name. After jumping to the corresponding page, the compliance of the displayed page content is verified with an existing page content verification algorithm, and whether the derivative domain name is an illegal domain name is determined according to that compliance. The page content verification algorithm may be a content recognition algorithm, a keyword matching algorithm, or an image recognition algorithm; it is not the focus of this application and is not described further here.
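Of the verification options named here, keyword matching is the simplest to illustrate. The sketch below flags a page when enough keywords occur in its text; the keyword list, the `min_hits` threshold, and the function names are hypothetical, and fetching the page itself is left out.

```python
from collections.abc import Iterable


def matched_keywords(page_text: str, keywords: Iterable[str]) -> set[str]:
    """Return the keywords that occur in the page text (case-insensitive)."""
    text = page_text.casefold()
    return {kw for kw in keywords if kw.casefold() in text}


def page_looks_illegal(page_text: str, keywords: Iterable[str], min_hits: int = 2) -> bool:
    """Flag the page when at least `min_hits` distinct keywords are present."""
    return len(matched_keywords(page_text, keywords)) >= min_hits


# Hypothetical keyword list for illustration only.
SAMPLE_KEYWORDS = {"guaranteed return", "recharge now", "invite code"}
print(page_looks_illegal("Recharge now for a GUARANTEED return!", SAMPLE_KEYWORDS))
```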
After an illegal domain name is determined, the application software is traced according to the illegal domain name, and the illegal application software is identified. Illegal application software refers to software that violates the relevant national laws and regulations, and includes fraud-related software, pornographic software, phishing software, and the like.
According to the technical scheme provided by the embodiment of the application, when the domain name attribute is determined to be suspected illegal, the domain name features of the suspected domain name are extracted as target features, the suspected domain name is expanded according to the target features and the illegal domain name generation rule to obtain at least two derivative domain names, the compliance of the derivative domain names is verified, and the domain names verified as illegal are then used to identify the illegal application software. Expanding the suspected domain name according to the target features and the illegal domain name generation rule predicts the variants of the suspected domain name, so that multiple derivative domain names are obtained from a single suspected domain name. Executing this scheme therefore copes effectively with illegal application software that uses dynamic domain names for resource access and improves the detection rate of illegal application software, while tracing the application software from the illegal domain name ensures the accuracy of the identification.
Example two
Fig. 2 is a flowchart of another method for identifying offending application software according to the second embodiment of the present application. The present embodiment is further optimized on the basis of the above embodiment. Specifically, according to the target feature and the rule for generating the offending domain name, the suspected domain name is expanded to obtain at least two derivative domain names, including: determining the rule for generating the illegal domain name according to the domain name blacklist and the domain name whitelist; generating a set number of candidate domain names based on the rule for generating the illegal domain names, and extracting domain name characteristics of the candidate domain names; determining the feature similarity between the target feature and the domain name feature of the candidate domain name by using a fuzzy matching principle; and selecting at least two derivative domain names from the candidate domain names according to the characteristic similarity.
As shown in fig. 2, the method for identifying the illegal application software includes:
S210, extracting an interface access request in the network flow data, and analyzing the interface access request to acquire domain name information of an interface.
S220, determining the domain name attribute of the domain name according to the domain name information.
S230, if the domain name attribute is suspected illegal, determining the domain name as a suspected domain name, and extracting domain name characteristics of the suspected domain name as target characteristics.
S240, determining the rule for generating the illegal domain name according to the domain name blacklist and the domain name whitelist.
The domain names in the domain name blacklist are illegal domain names and provide positive guidance for predicting illegal domain names. The illegal domain name generation rule can be determined by statistically analyzing the content, form, pointing and other features of the domain names in the domain name blacklist. From the domain name blacklist, the features that distinguish illegal domain names from the domain names of compliant software can be determined. For example, illegal domain names whose content consists mainly of digits are related to one another in form: the same digits recur with high probability, and variants generally replace digits with similar ones. Alternatively, an illegal domain name is formed by randomly inserting characters or symbols into a complete English word with a definite meaning, for example turning sex into saewwx. For such illegal domain names, the dynamic domain names used by the same illegal application software most likely differ only in the content or position of the inserted characters or symbols.
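The insertion pattern described here can be sketched in two parts: generating a variant by inserting random letters into a base word, and checking whether a given label could have been produced that way (the base word must survive as a subsequence). Both function names and the lowercase-letter alphabet are assumptions of this sketch.

```python
import random
import string


def insert_noise(word: str, n_insert: int, rng: random.Random) -> str:
    """Insert `n_insert` random lowercase letters at random positions in `word`."""
    chars = list(word)
    for _ in range(n_insert):
        pos = rng.randint(0, len(chars))  # inclusive bounds: either end is allowed
        chars.insert(pos, rng.choice(string.ascii_lowercase))
    return "".join(chars)


def is_subsequence(word: str, candidate: str) -> bool:
    """True if the letters of `word` appear in `candidate` in the same order."""
    remaining = iter(candidate)
    return all(ch in remaining for ch in word)


variant = insert_noise("lucky", 3, random.Random(1))
print(variant, is_subsequence("lucky", variant))
```

The subsequence check is deliberately loose: many unrelated labels also contain a short word as a subsequence, so in practice it would only be one signal among several.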
It will be appreciated that an illegal domain name may also be a variant of a compliant domain name; phishing software, for example, often replaces characters in a compliant domain name with similar ones to imitate the compliant software in form. The illegal domain name generation rule can therefore also be determined by statistically analyzing the content, form, pointing and other features of the compliant domain names in the domain name whitelist. Specifically, the illegal domain name generation rule may take the domain name whitelist as a reference and replace the characters in a whitelisted domain name with similar characters.
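A similar-character substitution rule over a whitelisted label can be sketched as follows. The `LOOKALIKES` table is a small hypothetical example, and the function generates only single-character substitutions; a fuller rule would also combine several substitutions.

```python
# Hypothetical table of visually similar replacements, for illustration only.
LOOKALIKES = {"o": "0", "l": "1", "i": "1", "e": "3", "s": "5"}


def lookalike_variants(label: str) -> list[str]:
    """Return variants of `label` with exactly one character replaced."""
    variants = []
    for i, ch in enumerate(label):
        if ch in LOOKALIKES:
            variants.append(label[:i] + LOOKALIKES[ch] + label[i + 1:])
    return variants


variants = lookalike_variants("examplebank")
print(variants)
```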
S250, generating a set number of candidate domain names based on the rule for generating the illegal domain names, and extracting domain name characteristics of the candidate domain names.
The candidate domain name is generated according to rule of generating rule of violating domain name, and domain name with feature of violating domain name is provided. The number of candidate domain names is not limited herein, and is specifically determined according to actual service requirements. Extracting domain name characteristics of the candidate domain name, and specifically comprises extracting domain name content, domain name form and domain name pointing characteristics of the candidate domain name.
S260, determining the feature similarity between the target feature and the domain name features of the candidate domain names by using a fuzzy matching principle.
Fuzzy matching, unlike exact matching, yields an approximate degree of match under given conditions or requirements. The feature similarity reflects how similar a candidate domain name is to the suspected domain name in the domain name feature dimension, and can be used to screen the candidate domain names.
Determining the feature similarity between the target feature and the domain name features of the candidate domain names by fuzzy matching allows the candidate domain names to be screened by similarity. The screened domain names share common features with the suspected domain name while retaining differing features, so that the domain name blacklist and the naming rule for illegal domain names can be expanded.
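A minimal sketch of S260 is given below, assuming that the domain name content is the label string itself and that the domain name form is a coarse character-class pattern. The application does not name a specific fuzzy matching algorithm; Python's standard `difflib.SequenceMatcher` is used here only as a stand-in, and `form_pattern` and `feature_similarity` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def form_pattern(label: str) -> str:
    """Map a label to a coarse shape: 'd' for digits, 'a' for letters, '-' otherwise."""
    return "".join(
        "d" if c.isdigit() else "a" if c.isalpha() else "-" for c in label
    )


def feature_similarity(target: str, candidate: str) -> float:
    """Average the fuzzy similarity of the content and of the form pattern.

    The result lies in [0, 1]; 1.0 means identical content and form.
    """
    content = SequenceMatcher(None, target, candidate).ratio()
    form = SequenceMatcher(
        None, form_pattern(target), form_pattern(candidate)
    ).ratio()
    return (content + form) / 2


# Score a candidate against the suspected label from the example above.
score = feature_similarity("sex", "saewwx")
```

Equal weighting of content and form is an arbitrary choice for the sketch; the domain name pointing feature is omitted because it would require DNS resolution data.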
S270, selecting at least two derivative domain names from the candidate domain names according to the feature similarity.
Derivative domain names are drawn from the candidate domain names; a derivative domain name is a candidate domain name that has the target feature. To select derivative domain names according to the feature similarity, the candidate domain names may be ranked in descending order of feature similarity, and the candidates ranked within a set range, for example the top 30% or the top 50%, are selected as derivative domain names. The set range may be determined according to the actual proportion of illegal domain names among the derivative domain names. This proportion can be obtained by counting the verification results after the process of verifying whether each derivative domain name is an illegal domain name has been executed.
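The ranking and selection in S270 could look like the following sketch. The function name `select_derivative` and its parameters are assumptions; the `minimum` of two reflects the requirement that at least two derivative domain names be selected.

```python
def select_derivative(
    scored: dict[str, float], percent: int = 30, minimum: int = 2
) -> list[str]:
    """Rank candidates by similarity (descending) and keep the top `percent`.

    At least `minimum` candidates are kept when that many are available.
    """
    ranked = sorted(scored, key=scored.get, reverse=True)
    # Integer ceiling of len * percent / 100 avoids floating-point rounding.
    keep = max(minimum, (len(ranked) * percent + 99) // 100)
    return ranked[:keep]


# Example with four scored candidates and a 50% set range.
top = select_derivative({"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}, percent=50)
```

The `percent` value corresponds to the set range (for example 30 or 50) and could be tuned from the observed proportion of verified illegal domain names.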
S280, verifying whether each derivative domain name is an illegal domain name, and if so, identifying illegal application software by using the illegal domain name.
In an optional embodiment, identifying illegal application software by using the illegal domain name includes: adding the illegal domain name to the domain name blacklist, and monitoring network traffic data according to the domain name blacklist; and if any illegal domain name included in the domain name blacklist appears in the network traffic data, tracing the application software according to the network traffic data and marking the application software as illegal software.
Adding the illegal domain name to the domain name blacklist expands the blacklist. A monitoring list is issued according to the domain name blacklist, designating the illegal domain names in the blacklist to be monitored. If an access record for an illegal domain name is detected in the network traffic data, the application software is traced according to the network traffic data and marked as illegal software, thereby identifying the illegal application software. Specifically, the embodiment of the application monitors a target website and collects the new domain names pushed by it. The newly pushed domain names are then compared with the domain name blacklist to identify illegal application software.
The target website is a website that returns a domain name pool when called. The domain names in the domain name pool that the target website points to carry the page display content of the application software, and the domain name pool solves the problem that the application software becomes unusable once its communication domain name fails. In this scheme, network traffic data is monitored according to the domain name blacklist; when an illegal domain name appears in the network traffic data, the application software is traced, so that illegal application software is identified and the accuracy of the identification is ensured.
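The blacklist expansion and traffic monitoring described above can be sketched as follows. The record layout (dictionaries with hypothetical `app` and `domain` keys) is an assumption; the application does not define how network traffic data attributes a request to an application.

```python
def normalize(domain: str) -> str:
    """Lower-case a domain name and drop surrounding spaces and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def extend_blacklist(blacklist: set[str], verified: list[str]) -> set[str]:
    """Return a new blacklist that also contains the verified illegal domain names."""
    return blacklist | {normalize(d) for d in verified}


def flag_illegal_apps(records: list[dict], blacklist: set[str]) -> set[str]:
    """Trace each blacklisted access back to its application and collect the apps."""
    return {r["app"] for r in records if normalize(r["domain"]) in blacklist}


# Example: one verified derivative domain name is added, then traffic is checked.
blacklist = extend_blacklist({"bad.example"}, ["Saewwx.Example."])
traffic = [
    {"app": "app-a", "domain": "saewwx.example"},
    {"app": "app-b", "domain": "good.example"},
]
flagged = flag_illegal_apps(traffic, blacklist)
```

Exact set membership is used because the blacklist holds concrete domain names; matching of subdomains would need an additional suffix check.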
According to the technical scheme provided by this embodiment of the application, the rule for generating illegal domain names is determined according to the domain name blacklist and the domain name whitelist; a set number of candidate domain names are generated based on that rule, and the domain name features of the candidate domain names are extracted; the feature similarity between the target feature and the domain name features of the candidate domain names is determined by using a fuzzy matching principle; at least two derivative domain names are selected from the candidate domain names according to the feature similarity; and whether each derivative domain name is an illegal domain name is verified, and if so, illegal application software is identified by using the illegal domain name. Because the derivative domain names are generated based on the rule for generating illegal domain names, they share common features with the suspected domain name while retaining differing features. Verifying whether the derivative domain names are illegal domain names mines more potential illegal domain names and, at the same time, expands the domain name blacklist and the naming rule for illegal domain names. Many cases are thus inferred from a single instance, which improves the detection rate and identification efficiency for illegal software.
Example III
Fig. 3 is a schematic diagram of an apparatus for identifying illegal application software according to a third embodiment of the present application, where the present embodiment is applicable to the case of identifying illegal application software. The apparatus may be implemented in software and/or hardware and may be integrated in an electronic device such as a smart terminal.
As shown in fig. 3, the apparatus may include: a domain name information acquisition module 310, a domain name attribute determining module 320, a target feature determining module 330, a domain name expansion module 340, and an illegal application software identification module 350.
The domain name information acquisition module 310 is configured to extract an interface access request in the network traffic data, and parse the interface access request to obtain domain name information of an interface;
the domain name attribute determining module 320 is configured to determine a domain name attribute of the domain name according to the domain name information;
the target feature determining module 330 is configured to determine that the domain name is a suspected domain name if the domain name attribute is a suspected violation, and extract a domain name feature of the suspected domain name as a target feature;
the domain name expansion module 340 is configured to expand the suspected domain name according to the target feature and the rule for generating illegal domain names to obtain at least two derivative domain names;
and the illegal application software identification module 350 is configured to verify whether each derivative domain name is an illegal domain name, and if so, identify illegal application software by using the illegal domain name.
According to the technical scheme provided by this embodiment of the application, when the domain name attribute is determined to be a suspected violation, the domain name features of the suspected domain name are extracted as the target feature, the suspected domain name is expanded according to the target feature and the rule for generating illegal domain names to obtain at least two derivative domain names, the compliance of the derivative domain names is verified, and the illegal domain names are then used to identify illegal application software. Expanding the suspected domain name according to the target feature and the rule for generating illegal domain names predicts the variants of the suspected domain name, so that multiple derivative domain names are obtained from a single suspected domain name. The scheme can therefore effectively cope with illegal application software that uses dynamic domain names for resource access, and improves the detection rate and accuracy for illegal application software.
Optionally, the domain name expansion module 340 includes: a rule determining sub-module, configured to determine the rule for generating illegal domain names according to the domain name blacklist and the domain name whitelist; a domain name feature extraction sub-module, configured to generate a set number of candidate domain names based on the rule for generating illegal domain names and extract domain name features of the candidate domain names; a feature similarity determining sub-module, configured to determine the feature similarity between the target feature and the domain name features of the candidate domain names by using a fuzzy matching principle; and a derivative domain name determining sub-module, configured to select at least two derivative domain names from the candidate domain names according to the feature similarity.
Optionally, the domain name attribute includes explicit compliance, explicit violation, and suspected violation; accordingly, the domain name attribute determining module 320 includes: a first sub-module, configured to determine the domain name attribute of the domain name as explicit compliance if the domain name information is successfully matched with the domain name whitelist; a second sub-module, configured to determine the domain name attribute of the domain name as explicit violation if the domain name information is successfully matched with the domain name blacklist; and a third sub-module, configured to determine the domain name attribute of the domain name as suspected violation if the domain name information fails to match the domain name whitelist and also fails to match the domain name blacklist.
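The three-way decision made by the first, second and third sub-modules can be sketched as follows. The enumeration and function names are hypothetical, and exact set membership is assumed as the matching criterion, which the application does not prescribe.

```python
from enum import Enum


class DomainAttribute(Enum):
    EXPLICIT_COMPLIANCE = "explicit compliance"
    EXPLICIT_VIOLATION = "explicit violation"
    SUSPECTED_VIOLATION = "suspected violation"


def classify_domain(
    domain: str, whitelist: set[str], blacklist: set[str]
) -> DomainAttribute:
    """Whitelist hit: compliant. Blacklist hit: violation. Neither: suspected."""
    if domain in whitelist:
        return DomainAttribute.EXPLICIT_COMPLIANCE
    if domain in blacklist:
        return DomainAttribute.EXPLICIT_VIOLATION
    return DomainAttribute.SUSPECTED_VIOLATION


# A domain name found in neither list is treated as a suspected domain name.
attribute = classify_domain("unknown.example", {"good.example"}, {"bad.example"})
```

Only domain names classified as suspected violation proceed to feature extraction and expansion.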
Optionally, the illegal application software identification module 350 includes a domain name compliance verification sub-module and an illegal application software identification sub-module; the domain name compliance verification sub-module is specifically configured to verify whether a derivative domain name is an illegal domain name; and the illegal application software identification sub-module is specifically configured to identify illegal application software by using the illegal domain name.
The illegal application software identification sub-module includes: a traffic data monitoring unit, configured to add the illegal domain name to the domain name blacklist and monitor network traffic data according to the domain name blacklist; and an illegal application software identification unit, configured to trace the application software according to the network traffic data and mark the application software as illegal software if any illegal domain name included in the domain name blacklist appears in the network traffic data.
Optionally, the domain name information acquisition module 310 includes: an interface response acquisition sub-module, configured to, if the interface access request is encrypted, perform interface access with a simulated interface access tool according to the interface access request so as to obtain an interface response; and a domain name information determining sub-module, configured to extract the interface access links included in the interface response content and determine the domain name information of the interface according to the interface access links.
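The work of the domain name information determining sub-module can be illustrated with the sketch below, which assumes that the interface response body has already been obtained as text by the simulated interface access tool. The regular expression `URL_PATTERN` and the function name `extract_domains` are assumptions made for illustration; the host is parsed with the standard `urllib.parse.urlsplit`.

```python
import re
from urllib.parse import urlsplit

# Simplified pattern for http(s) links embedded in a response body (assumption).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_domains(response_body: str) -> list[str]:
    """Return the distinct host names of all links found, in order of appearance."""
    domains: list[str] = []
    for link in URL_PATTERN.findall(response_body):
        host = urlsplit(link).hostname  # lower-cased, without port
        if host and host not in domains:
            domains.append(host)
    return domains


# Example response content containing two interface access links.
body = '{"img": "https://cdn.example.com:8443/a.png", "api": "http://API.example.org/v1"}'
hosts = extract_domains(body)
```

A real response format (for example JSON with escaped slashes) may require decoding before the links are searched.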
Optionally, the domain name feature includes at least one of domain name pointing, domain name content, and domain name form.
The device for identifying illegal application software provided by the embodiment of the invention can execute the method for identifying illegal application software provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to executing that method.
Example IV
A fourth embodiment of the present application also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method for identifying illegal application software, the method comprising:
extracting an interface access request in network traffic data, and parsing the interface access request to obtain domain name information of an interface;
determining a domain name attribute of the domain name according to the domain name information;
if the domain name attribute is a suspected violation, determining the domain name as a suspected domain name, and extracting domain name features of the suspected domain name as the target feature;
expanding the suspected domain name according to the target feature and the rule for generating illegal domain names to obtain at least two derivative domain names;
and verifying whether each derivative domain name is an illegal domain name, and if so, identifying illegal application software by using the illegal domain name.
A storage medium refers to any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk) or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the computer system in which the program is executed, or may be located in a different, second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations (e.g., in different computer systems connected by a network). The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, in the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the operations for identifying illegal application software described above, and may also perform related operations in the method for identifying illegal application software provided in any embodiment of the present application.
Example V
The fifth embodiment of the present application provides an electronic device in which the device for identifying illegal application software provided in the embodiments of the present application may be integrated. The electronic device may be configured in a system, or may be a device that performs part or all of the functions of the system. Fig. 4 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application. As shown in fig. 4, the present embodiment provides an electronic device 400, which includes: one or more processors 420; and a storage device 410, configured to store one or more programs that, when executed by the one or more processors 420, cause the one or more processors 420 to implement the method for identifying illegal application software provided in an embodiment of the present application, the method comprising:
extracting an interface access request in network traffic data, and parsing the interface access request to obtain domain name information of an interface;
determining a domain name attribute of the domain name according to the domain name information;
if the domain name attribute is a suspected violation, determining the domain name as a suspected domain name, and extracting domain name features of the suspected domain name as the target feature;
expanding the suspected domain name according to the target feature and the rule for generating illegal domain names to obtain at least two derivative domain names;
and verifying whether each derivative domain name is an illegal domain name, and if so, identifying illegal application software by using the illegal domain name.
Of course, those skilled in the art will appreciate that the processor 420 may also implement the technical solution of the method for identifying illegal application software provided in any embodiment of the present application.
The electronic device 400 shown in fig. 4 is merely an example and should not be construed as limiting the functions and scope of use of embodiments of the present application.
As shown in fig. 4, the electronic device 400 includes a processor 420, a storage device 410, an input device 430, and an output device 440; the number of processors 420 in the electronic device may be one or more, one processor 420 being taken as an example in fig. 4; the processor 420, the storage device 410, the input device 430, and the output device 440 in the electronic device may be connected by a bus or other means, as exemplified by connection via a bus 450 in fig. 4.
The storage device 410, as a computer-readable storage medium, is used to store software programs, computer-executable programs, and modules, such as the program instructions corresponding to the method for identifying illegal application software in the embodiments of the present application.
The storage device 410 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the terminal, etc. In addition, the storage device 410 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the storage device 410 may further include memory located remotely from the processor 420, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric, character, or voice information, and to generate key signal inputs related to user settings and function control of the electronic device. The output device 440 may include devices such as a display screen and a speaker.
The device, the medium and the electronic equipment for identifying illegal application software provided in the above embodiments can execute the method for identifying illegal application software provided in any embodiment of the application, and have the functional modules and beneficial effects corresponding to executing that method. Technical details not described in detail in the above embodiments may be found in the method for identifying illegal application software provided in any embodiment of the present application.
Note that the above describes only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will appreciate that the present application is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the present application. Therefore, while the present application has been described in connection with the above embodiments, it is not limited to those embodiments and may include many other equivalent embodiments without departing from the spirit of the present application, the scope of which is defined by the appended claims.

Claims (8)

1. A method for identifying offending application software, the method comprising:
extracting an interface access request in network traffic data, and parsing the interface access request to obtain domain name information of an interface;
determining a domain name attribute of the domain name according to the domain name information;
if the domain name attribute is suspected violation, determining the domain name as a suspected domain name, and extracting domain name features of the suspected domain name as a target feature;
expanding the suspected domain name according to the target feature and the rule for generating illegal domain names to obtain at least two derivative domain names;
verifying whether each derivative domain name is an illegal domain name, and if so, identifying illegal application software by using the illegal domain name;
wherein expanding the suspected domain name according to the target feature and the rule for generating illegal domain names to obtain at least two derivative domain names comprises:
determining the rule for generating the illegal domain name according to the domain name blacklist and the domain name whitelist;
generating a set number of candidate domain names based on the rule for generating the illegal domain names, and extracting domain name characteristics of the candidate domain names;
determining the feature similarity between the target feature and the domain name feature of the candidate domain name by using a fuzzy matching principle;
and selecting at least two derivative domain names from the candidate domain names according to the characteristic similarity.
2. The method of claim 1, wherein the domain name attributes include explicit compliance, explicit violation, and suspected violation; correspondingly, determining the domain name attribute of the domain name according to the domain name information comprises:
if the domain name information is successfully matched with the domain name whitelist, the domain name attribute of the domain name is explicit compliance;
if the domain name information is successfully matched with the domain name blacklist, the domain name attribute of the domain name is explicit violation;
if the domain name information fails to match the domain name whitelist and fails to match the domain name blacklist, the domain name attribute of the domain name is suspected violation.
3. The method of claim 1, wherein identifying the offending application software using the offending domain name comprises:
adding the illegal domain name into a domain name blacklist, and monitoring network traffic data according to the domain name blacklist;
if any illegal domain name included in the domain name blacklist appears in the network traffic data, tracing the source of the application software according to the network traffic data, and marking the application software as illegal software.
4. The method of claim 1, wherein extracting the interface access request from the network traffic data, parsing the interface access request to obtain domain name information of the interface, comprises:
if the interface access request is encrypted, performing interface access by using a simulated interface access tool according to the interface access request to obtain an interface response;
and extracting an interface access link included in the interface response content, and determining domain name information of an interface according to the interface access link.
5. The method of claim 1, wherein the domain name features include at least one of domain name pointing, domain name content, and domain name form.
6. An apparatus for identifying offending application software, said apparatus comprising:
the domain name information acquisition module is used for extracting an interface access request in network traffic data and parsing the interface access request to obtain domain name information of an interface;
the domain name attribute determining module is used for determining the domain name attribute of the domain name according to the domain name information;
the target feature determining module is used for determining the domain name as a suspected domain name if the domain name attribute is suspected violation, and extracting domain name features of the suspected domain name as a target feature;
the domain name expansion module is used for expanding the suspected domain name according to the target feature and the rule for generating illegal domain names to obtain at least two derivative domain names;
the illegal application software identification module is used for verifying whether each derivative domain name is an illegal domain name, and if so, identifying illegal application software by using the illegal domain name;
wherein the domain name expansion module includes:
the rule determining sub-module is used for determining the rule for generating illegal domain names according to the domain name blacklist and the domain name whitelist;
the domain name feature extraction sub-module is used for generating a set number of candidate domain names based on the rule for generating the illegal domain names and extracting domain name features of the candidate domain names;
the feature similarity determining submodule is used for determining feature similarity between the target feature and the domain name feature of the candidate domain name by utilizing a fuzzy matching principle;
and the derivative domain name determining submodule is used for selecting at least two derivative domain names from the candidate domain names according to the characteristic similarity.
7. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the method of identifying offending application software as claimed in any of claims 1-5.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of identifying offending application software as claimed in any of claims 1-5 when the computer program is executed by the processor.
CN202111129799.0A 2021-09-26 2021-09-26 Illegal application software identification method, device, medium and electronic equipment Active CN113890866B (en)

Publications (2)

Publication Number Publication Date
CN113890866A CN113890866A (en) 2022-01-04
CN113890866B true CN113890866B (en) 2024-03-12

Family

ID=79006732

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105530251A (en) * 2015-12-14 2016-04-27 深圳市深信服电子科技有限公司 Method and device for identifying phishing website
US10089661B1 (en) * 2016-12-15 2018-10-02 Amazon Technologies, Inc. Identifying software products to test
CN111224948A (en) * 2019-11-29 2020-06-02 云深互联(北京)科技有限公司 Method, device, equipment and storage medium for discovering application
CN112000518A (en) * 2020-08-13 2020-11-27 深圳本地宝新媒体技术有限公司 Application program fault risk processing method, device and system, terminal and equipment
CN112131507A (en) * 2020-09-25 2020-12-25 成都知道创宇信息技术有限公司 Website content processing method, device, server and computer-readable storage medium
CN112667875A (en) * 2020-12-24 2021-04-16 恒安嘉新(北京)科技股份公司 Data acquisition method, data analysis method, data acquisition device, data analysis device, equipment and storage medium
CN112685072A (en) * 2020-12-31 2021-04-20 恒安嘉新(北京)科技股份公司 Method, device, equipment and storage medium for generating communication address knowledge base
CN112685255A (en) * 2020-12-30 2021-04-20 恒安嘉新(北京)科技股份公司 Interface monitoring method and device, electronic equipment and storage medium
CN113067820A (en) * 2021-03-19 2021-07-02 深圳市安络科技有限公司 Method, device and equipment for early warning abnormal webpage and/or APP
CN113411322A (en) * 2021-06-16 2021-09-17 中国银行股份有限公司 Network traffic monitoring method and device for preventing financial fraud based on block chain
CN113407886A (en) * 2021-07-10 2021-09-17 广州数智网络科技有限公司 Network crime platform identification method, system, device and computer storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10178107B2 (en) * 2016-04-06 2019-01-08 Cisco Technology, Inc. Detection of malicious domains using recurring patterns in domain names
US10972507B2 (en) * 2018-09-16 2021-04-06 Microsoft Technology Licensing, Llc Content policy based notification of application users about malicious browser plugins
US20210256089A1 (en) * 2020-02-18 2021-08-19 BluBracket, Inc. Identifying and monitoring relevant enterprise data stored in software development repositories

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
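Smart Fraud Detection Framework for Job Recruitments; Asad Mehboob et al.; Springer; full text *
Supervision Platform for Illegal Behavior on E-Commerce Websites; 高学勤; 王涛; Computer Systems & Applications (No. 08); full text *
Research and Application of an Anti-Fraud Domain Name Security Management System; 吴; 时镇军; Telecom Engineering Technics and Standardization (No. 12); full text *
Efficient Malicious URL Detection Method Based on Segment Patterns; 林海伦; 李焱; 王伟平; 岳银亮; 林政; Journal on Communications (No. S1); full text *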

Also Published As

Publication number Publication date
CN113890866A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN110324311B (en) Vulnerability detection method and device, computer equipment and storage medium
CN113098870B (en) Phishing detection method and device, electronic equipment and storage medium
KR102071160B1 (en) Application Information Methods and Devices for Risk Management
CN108932426B (en) Unauthorized vulnerability detection method and device
US10958657B2 (en) Utilizing transport layer security (TLS) fingerprints to determine agents and operating systems
CN109768992B (en) Webpage malicious scanning processing method and device, terminal device and readable storage medium
US10313322B2 (en) Distinguishing human-generated input from programmatically-generated input
CN110324416B (en) Download path tracking method, device, server, terminal and medium
CN112511459B (en) Traffic identification method and device, electronic equipment and storage medium
US20220141252A1 (en) System and method for data filtering in machine learning model to detect impersonation attacks
CN113190838A (en) Web attack behavior detection method and system based on expression
CN112565226A (en) Request processing method, device, equipment and system and user portrait generation method
CN114157568B (en) Browser secure access method, device, equipment and storage medium
CN113709513B (en) Equipment fingerprint processing method, user side, server, system and storage medium
CN114826946A (en) Unauthorized access interface detection method, device, equipment and storage medium
CN110955890B (en) Method and device for detecting malicious batch access behaviors and computer storage medium
CN113704772A (en) Safety protection processing method and system based on user behavior big data mining
US9904662B2 (en) Real-time agreement analysis
CN112887289A (en) Network data processing method and device, computer equipment and storage medium
CN113890866B (en) Illegal application software identification method, device, medium and electronic equipment
CN107995167B (en) Equipment identification method and server
CN112738068B (en) Network vulnerability scanning method and device
KR20190093984A (en) Method and system for evaluating security effectiveness between device
CN114417198A (en) Phishing early warning method, phishing early warning device, phishing early warning system
CN112637171A (en) Data traffic processing method, device, equipment, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant