CN113612639A - Method and device for analyzing and predicting file downloading behavior based on website access record - Google Patents

Info

Publication number
CN113612639A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110871515.9A
Other languages
Chinese (zh)
Other versions
CN113612639B (en)
Inventor
翟欣虎
秦益飞
杨正权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Yianlian Network Technology Co ltd
Original Assignee
Jiangsu Yianlian Network Technology Co ltd
Application filed by Jiangsu Yianlian Network Technology Co ltd filed Critical Jiangsu Yianlian Network Technology Co ltd
Priority to CN202110871515.9A priority Critical patent/CN113612639B/en
Publication of CN113612639A publication Critical patent/CN113612639A/en
Publication of CN113612639B publication Critical patent/CN113612639B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/147Network analysis or design for predicting network behaviour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/06Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a method for analyzing and predicting file downloading behaviors based on access records, which comprises the following steps: acquiring a website access record of at least one user accessing a target website; grouping the target website access records according to users to obtain personal access records corresponding to each user, and extracting a characteristic sequence before file downloading; grouping the personal access records according to time periods to obtain time period access records corresponding to each time period, and extracting a non-file downloading characteristic sequence; inputting the characteristic sequence before file downloading and the non-file downloading characteristic sequence into a trained first neural network model, and predicting the occurrence probability of the file downloading behavior of the target website user. The method comprises the steps of analyzing website access records of users in a target website, extracting a characteristic sequence before file downloading and a non-file downloading characteristic sequence from the website access records, enabling a neural network model to learn a file downloading behavior mode of the users, and training the neural network model to predict the occurrence probability of the file downloading behavior of the users in the target website.

Description

Method and device for analyzing and predicting file downloading behavior based on website access record
Technical Field
The application relates to the technical field of network security audit, in particular to a method and a device for analyzing and predicting file downloading behaviors based on website access records.
Background
With the growing popularity of networks, new kinds of illegal and criminal behavior carried out over networks are increasing day by day. Network security auditing serves to strengthen and standardize internet security protection, ensure network and information security, promote the healthy and orderly development of the internet, and safeguard national security, social order, and public interests.
Detecting, analyzing, and controlling users' downloading behavior and the files they download is an important part of network security auditing. Generally, the most accurate record of downloading behavior is the download record on the terminal device used by the user, but an operator cannot obtain data from that terminal device by simple means. The most practical approach is therefore to analyze the website access records generated by the operator's server after the user accesses it, and to derive the user's downloading behavior data from them.
However, the existing TCP/IP protocol does not clearly define a download operation, and application websites do not record downloading behavior in a unified way, so it is difficult for an operator to judge whether a user has downloaded a file when auditing user downloads.
In addition, file downloading behavior is currently identified from website access records according to the name of the requested resource. For example, when the file suffix in the requested resource name is a keyword such as doc, pdf, zip, rar, or jpg, the request is regarded as a file downloading behavior. The false alarm rate of this statistical method is very high, and the number of detected file downloads is much larger than the number of files the user actually downloaded.
To address this, a further screening rule is sometimes superimposed, for example treating a request as a file downloading behavior only when the size of the requested resource exceeds a certain threshold, but the false alarm rate remains high. There is no criterion for choosing the size threshold: a request for a small resource may still be a file download, and a request for a resource above the threshold may still not be one.
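As a non-limiting illustration of the prior-art rule discussed above, the suffix-plus-size check may be sketched as follows. The suffix set, the threshold value, and the example URLs are assumptions made for the example; the background does not fix any of them.

```python
from urllib.parse import urlparse

# Assumed suffix keywords and size threshold; no fixed values are given in the text.
DOWNLOAD_SUFFIXES = {"doc", "pdf", "zip", "rar", "jpg"}
SIZE_THRESHOLD_BYTES = 100_000  # hypothetical threshold


def looks_like_download(url: str, size_bytes: int) -> bool:
    """Rule-based check: suffix keyword plus a minimum resource size."""
    path = urlparse(url).path
    suffix = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    return suffix in DOWNLOAD_SUFFIXES and size_bytes >= SIZE_THRESHOLD_BYTES


# A large inline page image is flagged although the user downloaded nothing.
false_alarm = looks_like_download("https://example.com/img/banner.jpg", 250_000)
# A small PDF that the user really downloaded is missed.
missed = looks_like_download("https://example.com/files/note.pdf", 20_000)
print(false_alarm, missed)
```

The two calls show both failure modes described above: a resource above the threshold that is not a download, and a download below the threshold.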
Disclosure of Invention
In a first aspect, the embodiment of the application provides a method for analyzing and predicting file downloading behaviors based on website access records, which is characterized in that a neural network model learns file downloading behavior patterns of a user by analyzing website access records of the user in a website and extracting a feature sequence before file downloading and a non-file downloading feature sequence from the website access records, and the neural network is trained to predict the occurrence probability of the file downloading behaviors of the user in the website.
Specifically, the method comprises the following steps:
acquiring a website access record of at least one user accessing a target website, wherein the website access record records the access behavior of the user, the website access record comprises a URL (uniform resource locator) address, and the access behavior comprises a file downloading behavior and a non-file downloading behavior;
grouping the website access records of the target website according to users to obtain personal access records corresponding to each user, arranging the personal access records in chronological order, and extracting a plurality of consecutive website access records before the file downloading behavior as a pre-file-download feature sequence;
grouping the personal access records according to time periods to obtain time period access records corresponding to each time period, and extracting continuous multiple access records from the time period access records which do not contain the file downloading behavior as non-file downloading characteristic sequences;
inputting the characteristic sequence before file downloading and the non-file downloading characteristic sequence into a trained first neural network model, and predicting the occurrence probability of the file downloading behavior of the target website user.
The first neural network model comprises a recurrent neural network. The pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network, which records the feature information and order information of each node in the two kinds of sequence and converts them into an information matrix.
A user may perform a series of associated access behaviors before downloading a file, so it is necessary to analyze the plurality of consecutive website access records preceding the website access record of each file downloading behavior. However, predicting file downloading behavior only from the pre-file-download feature sequence and the non-file-download feature sequence relies on a single kind of feature, so features of more dimensions need to be added to improve the accuracy of the final prediction.
Thus, the method further comprises:
extracting additional characteristic vectors corresponding to the access behaviors according to the personal access records, wherein the additional characteristic vectors comprise all-day distribution characteristic vectors, cycle characteristic vectors, type distribution characteristic vectors and adjacent characteristic vectors, and generating file downloading additional characteristic vectors and non-file downloading additional characteristic vectors;
inputting the pre-file-download feature sequence, the non-file-download feature sequence, the file-download additional feature vector and the non-file-download additional feature vector into a trained second neural network model, and predicting the probability of occurrence of file download behavior of the target website user.
The all-day distribution feature vector is the proportion of the access behaviors in each time period of the day to the total number of behaviors; the period feature vector is the maximum time interval at which an access behavior recurs periodically; the type distribution feature vector is the proportion of each access behavior among all behaviors; the adjacent feature vector records the transition probabilities between adjacent access behaviors.
In addition, in order to mark the access behavior of the website access record quickly, the method further comprises the following steps: carrying out access behavior marking on the website access record, wherein the access behavior marking at least comprises a file downloading behavior; and establishing a corresponding relation between the access behaviors and URL addresses, wherein one access behavior corresponds to one or more URL addresses. Specifically, according to the access behavior with the maximum similarity between the URL address in the website access record and the character string of the URL address in the corresponding relationship, the access behavior mark is performed on the website access record.
The second neural network model comprises the recurrent neural network, a convolutional neural network, and a dense layer connected to the recurrent neural network and the convolutional neural network;
the file-download additional feature vector and the non-file-download additional feature vector are input into the convolutional neural network for feature extraction, and the dense layer fuses the outputs of the convolutional neural network and the recurrent neural network and predicts the probability of the next file downloading behavior of the target website user.
In a second aspect, an embodiment of the present application is based on the same concept, and further provides a device for analyzing and predicting a file downloading behavior based on a website access record, where the device implements the method for analyzing and predicting a file downloading behavior based on a website access record, and the device includes:
an acquisition module, configured to acquire a website access record of at least one user accessing a target website, wherein the website access record records the access behavior of the user, the website access record comprises a URL (uniform resource locator) address, and the access behavior comprises a file downloading behavior and a non-file downloading behavior;
a first extraction module, configured to group the website access records of the target website according to users to obtain personal access records corresponding to each user, arrange the personal access records in chronological order, and extract a plurality of consecutive website access records before the file downloading behavior as a pre-file-download feature sequence;
a second extraction module, configured to group the personal access records according to time periods to obtain time period access records corresponding to each time period, and extract a plurality of consecutive access records from the time period access records that do not contain the file downloading behavior as a non-file-download feature sequence;
a prediction module, configured to input the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model and predict the probability of occurrence of the file downloading behavior of the target website user.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for predicting file downloading behavior based on website access record analysis as described above.
In a fourth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes: a program or instructions which, when run on a computer, causes the computer to perform a method of predicting file download behavior based on website visitation record analysis as described above.
In a fifth aspect, embodiments of the present application provide a readable storage medium in which a computer program is stored, the computer program including program code for controlling a processor to execute a process, the process including the method for predicting file download behavior based on website access record analysis as described in any of the above embodiments.
According to the method for analyzing and predicting file downloading behavior based on website access records, the website access records generated by users in the target website are analyzed, and the pre-file-download feature sequence and the non-file-download feature sequence are extracted from them so that the neural network learns the users' file downloading behavior patterns; the first neural network model is thereby trained to predict the probability of a user's next file downloading behavior in the target website. In particular, since a user may perform a series of associated access behaviors before downloading a file, the embodiments of the present application use a recurrent neural network to learn the pre-file-download feature sequence and the non-file-download feature sequence; the memory and parameter sharing of a recurrent neural network make it well suited to learning the nonlinear characteristics of such feature sequences.
It is worth mentioning that the method does not simply predict whether a downloading behavior occurs from the resource name and resource size, nor does it predict only from the website access records preceding a file download as a feature sequence. Instead, it extracts an all-day distribution feature vector, a period feature vector, a type distribution feature vector, and an adjacent feature vector as additional feature vectors from the website access records generated by users in the target website, and combines the behavior feature sequences with the additional feature vectors to train the second neural network model, thereby improving the accuracy of predicting a user's next file downloading behavior in the target website.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of a method for predicting file download behavior based on website visitation record analysis according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a second neural network model in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of an apparatus for analyzing and predicting file downloading behavior based on website access records according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
Method Embodiment
The embodiment provides a method for analyzing and predicting file downloading behaviors based on website access records, which comprises the steps of analyzing website access records of users in a website, extracting a characteristic sequence before file downloading and a non-file downloading characteristic sequence from the website access records, enabling a neural network model to learn a file downloading behavior mode of the users, and training the neural network model to predict the occurrence probability of the next file downloading behavior of the users in the website.
Referring to FIG. 1, FIG. 1 is a flowchart of a method for predicting file downloading behavior based on website access record analysis according to an embodiment of the present application.
As shown in FIG. 1, the method includes steps S1-S4:
Step S1: acquiring a website access record of at least one user accessing a target website, wherein the website access record records the access behavior of the user, the website access record comprises a URL (uniform resource locator) address, and the access behavior comprises a file downloading behavior and a non-file downloading behavior.
Usually, according to the needs of network security auditors, a certain website or a certain class of websites is taken as a target website, and website access records generated by all users in the target website for a long time are collected.
After the website access records are collected, they may be filtered to remove the additional records generated when a user loads a page of the target website, leaving the user's actual website access records in the target website. Specifically, request entries for resources such as jpeg, png, ico, js, and css are filtered out of the website access records, along with other website access records that are well known to be useless.
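As a non-limiting illustration, the filtering described above may be sketched as follows. The record layout (a dictionary with `user` and `url` fields) and the example URLs are assumptions made for the example; only the suffix list comes from the description.

```python
from urllib.parse import urlparse

# Suffixes named in the description; a deployment may extend this tuple.
STATIC_SUFFIXES = (".jpeg", ".png", ".ico", ".js", ".css")


def filter_records(records):
    """Drop records whose URL path ends in a static-resource suffix."""
    return [
        r for r in records
        if not urlparse(r["url"]).path.lower().endswith(STATIC_SUFFIXES)
    ]


records = [
    {"user": "u1", "url": "https://example.com/login"},
    {"user": "u1", "url": "https://example.com/static/app.js?v=3"},
    {"user": "u1", "url": "https://example.com/favicon.ico"},
    {"user": "u1", "url": "https://example.com/report/42"},
]
kept = filter_records(records)
print([r["url"] for r in kept])
```

Matching on the parsed path rather than the raw URL keeps a query string such as `?v=3` from hiding the suffix.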
In this step, in order to mark the access behavior of each website access record quickly, the website access records may be preprocessed by establishing a correspondence between URLs and behavior types. The method therefore further comprises: marking the website access records with access behaviors, wherein the access behavior marks include at least a file downloading behavior; and establishing a correspondence between access behaviors and URL addresses, wherein one access behavior corresponds to one or more URL addresses. For example, the filtered website access records are respectively marked as a system login behavior, a summary page access behavior, a details page access behavior, a search behavior, and a file downloading behavior, wherein the system login behavior corresponds to two different URL addresses, the summary page access behavior corresponds to three other different URL addresses, and so on, each behavior corresponding to one or more associated URL addresses. The granularity of the behavior marks determines the accuracy and generalization ability of the recognition: when the granularity is coarser, the recognition accuracy is relatively lower but the generalization ability is higher; conversely, when the granularity is finer, the recognition accuracy is higher but the generalization ability is lower. Generalization ability here refers to the ability of the method to recognize behaviors in access records that have not been marked in advance. The granularity is chosen according to actual needs, and there is no unified standard.
Specifically, each website access record is marked with the access behavior whose URL addresses in the correspondence have the greatest character string similarity to the URL address in the record. That is, the string similarity between the URL string of the website access record and each URL address corresponding to each access behavior type is calculated, the results within each access behavior type are averaged, the averages are sorted, and the behavior type with the highest average is selected as the behavior mark of the website access record.
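As a non-limiting illustration, the similarity-based marking may be sketched as follows. The description does not name a particular string similarity measure; the example assumes the ratio computed by Python's `difflib.SequenceMatcher`, and the behavior names and reference URLs in `BEHAVIOR_URLS` are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical correspondence between access behaviors and reference URLs.
BEHAVIOR_URLS = {
    "login": ["/account/login", "/sso/login"],
    "search": ["/search?q=report", "/search?q=budget"],
    "file_download": ["/files/download?id=1", "/files/download?id=27"],
}


def label_record(url, behavior_urls=BEHAVIOR_URLS):
    """Return the behavior with the highest mean similarity, plus all scores."""
    scores = {
        behavior: mean(SequenceMatcher(None, url, ref).ratio() for ref in refs)
        for behavior, refs in behavior_urls.items()
    }
    return max(scores, key=scores.get), scores


label, scores = label_record("/files/download?id=305")
print(label)
```

Averaging over all reference URLs of a behavior type, rather than taking the single best match, follows the description; another similarity measure such as edit distance could be substituted without changing the structure.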
In addition, if the number of URL addresses corresponding to an access behavior type is too large, proportional sampling may be used to extract a certain proportion of them, for example 10% of the URL addresses, as the samples that participate in the calculation, and the website access records are then marked with behaviors according to the above method.
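As a non-limiting illustration, the proportional sampling may be sketched as follows. The function name, the fixed random seed, and the rule of keeping at least one URL per behavior type are assumptions made for the example.

```python
import random


def sample_reference_urls(behavior_urls, percent=10, seed=0):
    """Keep roughly `percent`% of each behavior's URLs, at least one each."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sampled = {}
    for behavior, urls in behavior_urls.items():
        k = max(1, len(urls) * percent // 100)
        sampled[behavior] = rng.sample(urls, k)
    return sampled


behavior_urls = {
    "access_details_page": [f"/detail/{i}" for i in range(200)],
    "file_download": [f"/files/download?id={i}" for i in range(50)],
}
sampled = sample_reference_urls(behavior_urls)
print({b: len(u) for b, u in sampled.items()})
```

Sampling per behavior type, rather than over the pooled URLs, keeps every behavior represented in the similarity calculation.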
Step S2: grouping the website access records of the target website according to users to obtain personal access records corresponding to each user, arranging the personal access records in chronological order, and extracting a plurality of consecutive website access records before the file downloading behavior as a pre-file-download feature sequence.
In this step, the website access records of the website are grouped by a unique identifier of the user, for example by user ID, to obtain the personal access records corresponding to each user. The personal access records of each user are then arranged in chronological order, the website access records marked as file downloading behaviors are identified, and a plurality of consecutive website access records preceding each website access record marked as a file downloading behavior are extracted as a pre-file-download feature sequence. The number of pre-file-download feature sequences extracted corresponds to the number of file downloading behaviors of the user.
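As a non-limiting illustration, step S2 may be sketched as follows. The record fields (`user`, `ts`, `behavior`), the window length of three records, and the choice to skip downloads with fewer than three preceding records are assumptions made for the example.

```python
from collections import defaultdict


def pre_download_sequences(records, window=3):
    """Return the `window` behaviors preceding each file download, per user."""
    by_user = defaultdict(list)
    for record in records:
        by_user[record["user"]].append(record)

    sequences = []
    for user_records in by_user.values():
        user_records.sort(key=lambda r: r["ts"])  # chronological order
        for i, record in enumerate(user_records):
            # Skip downloads that have fewer than `window` preceding records.
            if record["behavior"] == "file_download" and i >= window:
                sequences.append(
                    [r["behavior"] for r in user_records[i - window:i]]
                )
    return sequences


records = [
    {"user": "u1", "ts": 5, "behavior": "file_download"},
    {"user": "u1", "ts": 1, "behavior": "login"},
    {"user": "u2", "ts": 2, "behavior": "search"},
    {"user": "u1", "ts": 3, "behavior": "search"},
    {"user": "u1", "ts": 2, "behavior": "summary_page"},
    {"user": "u2", "ts": 1, "behavior": "login"},
    {"user": "u1", "ts": 4, "behavior": "details_page"},
]
sequences = pre_download_sequences(records)
print(sequences)
```

User `u2` never downloads a file and therefore contributes no pre-file-download feature sequence.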
Step S3: grouping the personal access records according to time periods to obtain time period access records corresponding to each time period, and extracting continuous multiple access records from the time period access records which do not contain the file downloading behavior as non-file downloading characteristic sequences.
The website access records in the personal access records obtained in step S2 are grouped by time period to obtain time period access records, for example by day, and a plurality of consecutive website access records are extracted from the time period access records that do not contain a file downloading behavior as a non-file-download feature sequence.
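As a non-limiting illustration, step S3 may be sketched as follows for one user. The record fields, the example dates, the window length, and the use of a sliding window to pick the consecutive records are assumptions made for the example; the description only requires that the records be consecutive and come from a time period without a file download.

```python
from collections import defaultdict
from datetime import datetime


def non_download_sequences(personal_records, window=3):
    """Sliding windows of behaviors taken from days without any file download."""
    by_day = defaultdict(list)
    for r in sorted(personal_records, key=lambda r: r["ts"]):
        by_day[r["ts"].date()].append(r["behavior"])

    sequences = []
    for behaviors in by_day.values():
        if "file_download" in behaviors:
            continue  # this time period contains a download, so skip it
        for i in range(len(behaviors) - window + 1):
            sequences.append(behaviors[i:i + window])
    return sequences


personal_records = [
    {"ts": datetime(2021, 7, 1, 9, 0), "behavior": "login"},
    {"ts": datetime(2021, 7, 1, 9, 5), "behavior": "search"},
    {"ts": datetime(2021, 7, 1, 9, 7), "behavior": "file_download"},
    {"ts": datetime(2021, 7, 2, 10, 0), "behavior": "login"},
    {"ts": datetime(2021, 7, 2, 10, 2), "behavior": "summary_page"},
    {"ts": datetime(2021, 7, 2, 10, 4), "behavior": "details_page"},
    {"ts": datetime(2021, 7, 2, 10, 9), "behavior": "search"},
]
negatives = non_download_sequences(personal_records)
print(negatives)
```

Only the second day yields sequences, because the first day contains a file downloading behavior.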
Step S4: inputting the characteristic sequence before file downloading and the non-file downloading characteristic sequence into a trained first neural network model, and predicting the occurrence probability of the file downloading behavior of the target website user.
The first neural network model comprises a recurrent neural network, that is, a neural network that takes sequence data as input, recurses in the direction in which the sequence evolves, and connects all of its nodes (recurrent units) in a chain. A recurrent neural network has memory, shares parameters across time steps, and is Turing complete, which gives it an advantage in learning the nonlinear characteristics of sequences. The pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network, which records the feature information and order information of each node in the two kinds of sequence and converts them into an information matrix for further calculation downstream in the first neural network model, finally yielding the predicted probability that a file downloading behavior of a user of the target website will occur. The internal calculation of the neural network cannot be described in generally intuitive terms, and it is therefore not described in detail here.
A user may make a series of associated website visits before downloading a file, so a recurrent neural network is used to learn from and analyze the plurality of consecutive website access records preceding the website access record of each file downloading behavior. However, predicting file downloading behavior only from the pre-file-download feature sequence and the non-file-download feature sequence relies on a single kind of feature, so features of more dimensions need to be added to improve the accuracy of the final prediction.
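As a non-limiting illustration of how a recurrent network turns a behavior sequence into a probability, a forward pass of a simple recurrent cell may be sketched as follows. The description does not specify the cell type, the layer sizes, or the output layer; the Elman-style cell, the sigmoid output, the sizes, and the random untrained weights are all assumptions made for the example, and training is omitted.

```python
import math
import random

random.seed(0)
NUM_BEHAVIORS, HIDDEN = 5, 4  # hypothetical sizes


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_xh = rand_matrix(NUM_BEHAVIORS, HIDDEN)  # input-to-hidden weights
W_hh = rand_matrix(HIDDEN, HIDDEN)         # hidden-to-hidden weights, shared by all steps
w_out = [random.uniform(-0.5, 0.5) for _ in range(HIDDEN)]


def download_probability(sequence):
    """Run behavior ids through the recurrent cell and squash to (0, 1)."""
    h = [0.0] * HIDDEN
    for behavior_id in sequence:
        # A one-hot input simply selects row `behavior_id` of W_xh.
        h = [
            math.tanh(W_xh[behavior_id][j]
                      + sum(h[i] * W_hh[i][j] for i in range(HIDDEN)))
            for j in range(HIDDEN)
        ]
    logit = sum(h[j] * w_out[j] for j in range(HIDDEN))
    return 1.0 / (1.0 + math.exp(-logit))


p = download_probability([0, 1, 3])
print(p)
```

The hidden state `h` carries information from earlier records forward (memory), and the same `W_xh` and `W_hh` are reused at every step (parameter sharing). With untrained weights the output value itself carries no meaning.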
In other embodiments, additional feature vectors corresponding to the access behaviors may be extracted from the personal access records, where the additional feature vectors include all-day distribution feature vectors, period feature vectors, type distribution feature vectors, and adjacent feature vectors, and file-download additional feature vectors and non-file-download additional feature vectors are generated;
inputting the pre-file-download feature sequence, the non-file-download feature sequence, the file-download additional feature vector and the non-file-download additional feature vector into a trained second neural network model, and predicting the probability of occurrence of file download behavior of the target website user.
The all-day distribution feature vector is the proportion of the access behaviors in each time period of the day to the total number of behaviors; the period feature vector is the maximum time interval at which an access behavior recurs periodically; the type distribution feature vector is the proportion of each access behavior among all behaviors; the adjacent feature vector records the transition probabilities between adjacent access behaviors. Specifically, the all-day distribution feature vector is generated by discretizing each kind of access behavior by time period of the day and counting, for a given behavior, the proportion of its occurrences in each time period to the total number of behaviors over the whole day; the period feature vector is the maximum time interval at which a given behavior recurs periodically; the type distribution feature vector counts the proportion of each access behavior among all behaviors, and its length is the number of distinct access behavior types; the adjacent feature vector counts the transition probabilities between adjacent access behaviors, and its length is the number of full arrangements of the access behavior types.
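As a non-limiting illustration, the four additional feature vectors may be sketched as follows. Several details are assumptions made for the example: the day is split into four six-hour slots, the all-day distribution is normalized by the number of occurrences of the one behavior passed in, the adjacent feature vector holds conditional transition probabilities over all ordered pairs of behavior types, and timestamps are plain numbers in hours.

```python
from collections import Counter


def all_day_distribution(hours, slots=4):
    """Share of one behavior's occurrences falling in each slot of the day."""
    counts = [0] * slots
    for hour in hours:
        counts[hour * slots // 24] += 1
    return [c / len(hours) for c in counts]


def max_interval(timestamps):
    """Largest gap between consecutive occurrences of one behavior."""
    ts = sorted(timestamps)
    return max(b - a for a, b in zip(ts, ts[1:]))


def type_distribution(behaviors, types):
    """Proportion of each behavior type among all behaviors."""
    counts = Counter(behaviors)
    return [counts[t] / len(behaviors) for t in types]


def adjacent_transitions(behaviors, types):
    """P(next = b | current = a) for every ordered pair (a, b) of types."""
    pair_counts = Counter(zip(behaviors, behaviors[1:]))
    source_counts = Counter(behaviors[:-1])
    return [
        pair_counts[(a, b)] / source_counts[a] if source_counts[a] else 0.0
        for a in types for b in types
    ]


types = ["login", "search", "details", "file_download"]
behaviors = ["login", "search", "details", "search", "file_download"]

day_vec = all_day_distribution([9, 10, 14, 22])   # hours at which one behavior occurred
period = max_interval([0, 24, 72, 96])            # hours since the first occurrence
type_vec = type_distribution(behaviors, types)
adj_vec = adjacent_transitions(behaviors, types)
print(day_vec, period, type_vec, len(adj_vec))
```

Concatenating these vectors, once for file-download samples and once for non-file-download samples, would give the file-download and non-file-download additional feature vectors referred to above.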
For the specific structure of the second neural network model, refer to fig. 2, which is a schematic structural diagram of the second neural network model according to the embodiment of the present application. As shown in fig. 2, the model includes a recurrent neural network, a convolutional neural network, and a dense (fully connected) layer connected after the recurrent neural network and the convolutional neural network. The pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network for feature extraction, and the recurrent neural network extracts abstract features of the order within the feature sequences; the file-download additional feature vector and the non-file-download additional feature vector are input into the convolutional neural network for feature extraction, since the convolutional neural network can better extract features without recording the order of the input data; the dense layer then fuses the output results of the recurrent neural network and the convolutional neural network and predicts the probability that the website user will perform a file downloading behavior.
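The two-branch structure can be sketched as a forward pass in plain Python. This is an illustrative sketch only: the layer sizes, the Elman-style recurrence, the single 1-D convolution with global max pooling, the sigmoid output, the one-hot input encoding, and the 0.5 threshold are assumptions, and the weights are random and untrained, so the resulting probability carries no meaning.

```python
import math
import random


def rnn_branch(seq, w_in, w_rec):
    """Elman-style recurrence; the final hidden state summarizes the ordered sequence."""
    h = [0.0] * len(w_rec)
    for x in seq:
        h = [
            math.tanh(
                sum(wi * xi for wi, xi in zip(w_in[j], x))
                + sum(wr * hk for wr, hk in zip(w_rec[j], h))
            )
            for j in range(len(h))
        ]
    return h


def cnn_branch(vec, kernels):
    """1-D convolution + ReLU + global max pooling over an order-free feature vector."""
    pooled = []
    for k in kernels:
        acts = [
            max(0.0, sum(kw * v for kw, v in zip(k, vec[i:i + len(k)])))
            for i in range(len(vec) - len(k) + 1)
        ]
        pooled.append(max(acts))
    return pooled


def dense_head(features, weights, bias):
    """Dense layer with sigmoid activation: fuses both branches into one probability."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def predict_download_probability(seq, extra_vec, params):
    fused = rnn_branch(seq, params["w_in"], params["w_rec"]) + cnn_branch(
        extra_vec, params["kernels"]
    )
    return dense_head(fused, params["w_out"], params["b_out"])


def random_params(input_dim, hidden, n_kernels, kernel_size, seed=0):
    """Untrained placeholder weights; a real model would learn these from labeled data."""
    rng = random.Random(seed)

    def u(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "w_in": [u(input_dim) for _ in range(hidden)],
        "w_rec": [u(hidden) for _ in range(hidden)],
        "kernels": [u(kernel_size) for _ in range(n_kernels)],
        "w_out": u(hidden + n_kernels),
        "b_out": 0.0,
    }


params = random_params(input_dim=3, hidden=4, n_kernels=2, kernel_size=3)
seq = [[1, 0, 0], [0, 1, 0], [1, 0, 0]]        # one-hot encoded access behaviors
extra_vec = [0.25, 0.25, 0.5, 0.0, 0.5, 0.5]   # hypothetical additional feature vector

p = predict_download_probability(seq, extra_vec, params)
OCCURRENCE_THRESHOLD = 0.5                     # assumed value; the text only says "set"
about_to_download = p > OCCURRENCE_THRESHOLD
```

In practice such a model would more likely be built and trained with a deep learning framework; the hand-written version here only makes the data flow between the two branches and the fusing dense layer explicit.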
Finally, the predicted probability of the file downloading behavior is compared with a set occurrence threshold; when the probability is greater than the occurrence threshold, this indicates that the user is about to perform a file downloading behavior on the target website.
The method for analyzing and predicting file downloading behavior based on website access records can be used to construct the first neural network model and the second neural network model for a particular website or for websites of the same type. Following the idea of the method, the feature sequences and feature vectors may be replaced, and the required behavior feature sequences and behavior feature vectors extracted, to train the corresponding neural network model.
Example two
Based on the same concept, and referring to fig. 3, this embodiment further provides an apparatus for analyzing and predicting file downloading behavior based on website access records. The apparatus implements the method for analyzing and predicting file downloading behavior based on website access records, and includes:
an acquisition module, configured to acquire website access records of at least one user accessing a target website, wherein the website access records record the access behaviors of the user and include URL (uniform resource locator) addresses, and the access behaviors include file downloading behaviors and non-file downloading behaviors;
a first extraction module, configured to group the target website access records by user to obtain the personal access records corresponding to each user, arrange the personal access records in chronological order, and extract a plurality of consecutive website access records preceding the file downloading behavior as a pre-file-download feature sequence;
a second extraction module, configured to group the personal access records by time period to obtain the time-period access records corresponding to each time period, and extract a plurality of consecutive access records from the time-period access records that contain no file downloading behavior as a non-file-download feature sequence;
a prediction module, configured to input the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model and predict the probability that the target website user will perform a file downloading behavior.
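The work of the two extraction modules can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the apparatus itself: the record layout (`user`, `time`, `behavior` keys), the `"download"` label, the window length `k`, and the use of clock hours as time periods are all hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime


def group_by_user(records):
    """Group access records by user and sort each user's records chronologically."""
    per_user = defaultdict(list)
    for r in records:
        per_user[r["user"]].append(r)
    for recs in per_user.values():
        recs.sort(key=lambda r: r["time"])
    return dict(per_user)


def pre_download_sequences(personal, k):
    """For each download, take the k consecutive records immediately before it."""
    seqs = []
    for i, r in enumerate(personal):
        if r["behavior"] == "download" and i >= k:
            seqs.append([w["behavior"] for w in personal[i - k:i]])
    return seqs


def non_download_sequences(personal, k):
    """From each hourly period that contains no download, take k consecutive records."""
    periods = defaultdict(list)
    for r in personal:  # `personal` is already sorted, so each period stays ordered
        periods[r["time"].strftime("%Y-%m-%d %H")].append(r["behavior"])
    return [
        behaviors[:k]
        for behaviors in periods.values()
        if "download" not in behaviors and len(behaviors) >= k
    ]


# Hypothetical records, deliberately out of order.
records = [
    {"user": "u1", "time": datetime(2021, 7, 30, 9, 5), "behavior": "view"},
    {"user": "u1", "time": datetime(2021, 7, 30, 9, 0), "behavior": "login"},
    {"user": "u1", "time": datetime(2021, 7, 30, 9, 7), "behavior": "download"},
    {"user": "u1", "time": datetime(2021, 7, 30, 9, 2), "behavior": "search"},
    {"user": "u1", "time": datetime(2021, 7, 30, 10, 1), "behavior": "search"},
    {"user": "u1", "time": datetime(2021, 7, 30, 10, 3), "behavior": "view"},
    {"user": "u1", "time": datetime(2021, 7, 30, 10, 10), "behavior": "view"},
]

personal = group_by_user(records)["u1"]
positive = pre_download_sequences(personal, k=3)
negative = non_download_sequences(personal, k=3)
```

Here the sequences preceding a download serve as positive samples and those from download-free periods as negative samples for training the first neural network model.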
Example three
The present embodiment further provides an electronic apparatus, specifically referring to fig. 4, including a memory 304 and a processor 302, where the memory 304 stores a computer program, and the processor 302 is configured to execute the computer program to perform the steps of any one of the methods for analyzing and predicting file downloading behavior based on website access records in the foregoing embodiments.
Specifically, the processor 302 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 304 may include mass storage for data or instructions. By way of example, and not limitation, the memory 304 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. The memory 304 may include removable or non-removable (or fixed) media, where appropriate. The memory 304 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 304 is a non-volatile memory. In particular embodiments, the memory 304 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPMDRAM), an Extended Data Out Dynamic Random-Access Memory (EDODRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory 304 may be used to store or cache various initialization data files that need to be processed and/or used for communication, as well as computer program instructions that may be executed by the processor 302.
The processor 302 may be configured to read and execute the computer program instructions stored in the memory 304 to implement the method for predicting file downloading behavior based on website access record analysis in the above embodiments.
Optionally, the electronic apparatus may further include a transmission device 306 and an input/output device 308, where the transmission device 306 is connected to the processor 302, and the input/output device 308 is connected to the processor 302.
The transmission device 306 may be used to receive or transmit data via a network. Specific examples of the network may include a wired or wireless network provided by a communication provider of the electronic apparatus. In one example, the transmission device 306 includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In another example, the transmission device 306 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
The input/output device 308 is used to input or output information. For example, the input/output device may be a display screen, a mouse, a keyboard, or another device. In this embodiment, the input device is used to input the acquired information; the input information may be data, tables, images, or real-time video, and the output information may be text, charts, alarm information, and the like displayed by the service system.
Optionally, in this embodiment, the processor 302 may be configured to execute the following steps by means of a computer program:
acquiring website access records of at least one user accessing a target website, wherein the website access records record the access behaviors of the user and include URL (uniform resource locator) addresses, and the access behaviors include file downloading behaviors and non-file downloading behaviors;
grouping the target website access records by user to obtain the personal access records corresponding to each user, arranging the personal access records in chronological order, and extracting a plurality of consecutive website access records preceding the file downloading behavior as a pre-file-download feature sequence;
grouping the personal access records by time period to obtain the time-period access records corresponding to each time period, and extracting a plurality of consecutive access records from the time-period access records that contain no file downloading behavior as a non-file-download feature sequence;
inputting the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model, and predicting the probability that the target website user will perform a file downloading behavior.
In addition, in combination with the method for analyzing and predicting file downloading behavior based on website access records in the foregoing embodiments, the embodiments of the present application may be implemented as a computer program product. The computer program product comprises a program or instructions which, when run on a computer, cause the computer to perform the method for analyzing and predicting file downloading behavior based on website access records of any one of the above embodiments.
In addition, in combination with the method for analyzing and predicting file downloading behavior based on website access records in the foregoing embodiments, the embodiments of the present application may be implemented by providing a readable storage medium. The readable storage medium has a computer program stored thereon; the computer program comprises program code for controlling a process to perform a procedure comprising the method for analyzing and predicting file downloading behavior based on website access records of any one of the above embodiments.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products), including software routines, applets and/or macros, can be stored in any device-readable data storage medium, and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard, it should be noted that any block of the logic flow in the figures may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as DVDs and the data variants thereof, and CDs. The physical medium is a non-transitory medium.
It should be understood by those skilled in the art that various features of the above embodiments can be combined arbitrarily, and for the sake of brevity, all possible combinations of the features in the above embodiments are not described, but should be considered as within the scope of the present disclosure as long as there is no contradiction between the combinations of the features.
The above examples merely illustrate several embodiments of the present application. Their description is relatively specific and detailed, but it is not to be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method for analyzing and predicting file downloading behavior based on website access records, comprising the following steps:
acquiring website access records of at least one user accessing a target website, wherein the website access records record the access behaviors of the user and include URL (uniform resource locator) addresses, and the access behaviors include file downloading behaviors and non-file downloading behaviors;
grouping the target website access records by user to obtain the personal access records corresponding to each user, arranging the personal access records in chronological order, and extracting a plurality of consecutive website access records preceding the file downloading behavior as a pre-file-download feature sequence;
grouping the personal access records by time period to obtain the time-period access records corresponding to each time period, and extracting a plurality of consecutive access records from the time-period access records that contain no file downloading behavior as a non-file-download feature sequence;
inputting the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model, and predicting the probability that the target website user will perform a file downloading behavior.
2. The method of claim 1, wherein the first neural network model comprises a recurrent neural network, the pre-file-download feature sequence and the non-file-download feature sequence are input into the recurrent neural network, the feature information and order information of each node in both sequences are recorded, and the feature information and the order information are converted into an information matrix.
3. The method for analyzing and predicting file download behavior based on website visitation record as claimed in claim 1, further comprising:
extracting additional feature vectors corresponding to the access behaviors from the personal access records, wherein the additional feature vectors comprise all-day distribution feature vectors, periodic feature vectors, type distribution feature vectors and adjacent feature vectors, and generating file-download additional feature vectors and non-file-download additional feature vectors;
inputting the pre-file-download feature sequence, the non-file-download feature sequence, the file-download additional feature vector and the non-file-download additional feature vector into a trained second neural network model, and predicting the probability that the target website user will perform a file downloading behavior.
4. The method for analyzing and predicting file downloading behavior based on website access records according to claim 2, wherein the all-day distribution feature vector is the proportion of the access behaviors in each time period of the day to the total number of behaviors; the periodic feature vector is the maximum time interval at which an access behavior occurs periodically; the type distribution feature vector is the proportion of each access behavior in the total number of behaviors; the adjacent feature vector is derived from counts of adjacent access behaviors.
5. The method for analyzing and predicting file download behavior based on website visitation record as claimed in claim 1, further comprising: carrying out access behavior marking on the website access record, wherein the access behavior marking at least comprises a file downloading behavior; and establishing a corresponding relation between the access behaviors and URL addresses, wherein one access behavior corresponds to one or more URL addresses.
6. The method for analyzing and predicting file downloading behavior based on website access records according to claim 4, wherein the website access record is marked with the access behavior whose URL address in the correspondence has the greatest string similarity to the URL address in the website access record.
7. The method for analyzing and predicting file downloading behavior based on website access records according to claim 2, wherein the second neural network model comprises the recurrent neural network, a convolutional neural network, and a dense layer connected to the recurrent neural network and the convolutional neural network;
the file-download additional features and the non-file-download additional features are input into the convolutional neural network for feature extraction, and the dense layer fuses the output results of the convolutional neural network and predicts the probability that the target website user will next perform a file downloading behavior.
8. An apparatus for analyzing and predicting file downloading behavior based on website access records, characterized by comprising:
an acquisition module, configured to acquire website access records of at least one user accessing a target website, wherein the website access records record the access behaviors of the user and include URL (uniform resource locator) addresses, and the access behaviors include file downloading behaviors and non-file downloading behaviors;
a first extraction module, configured to group the target website access records by user to obtain the personal access records corresponding to each user, arrange the personal access records in chronological order, and extract a plurality of consecutive website access records preceding the file downloading behavior as a pre-file-download feature sequence;
a second extraction module, configured to group the personal access records by time period to obtain the time-period access records corresponding to each time period, and extract a plurality of consecutive access records from the time-period access records that contain no file downloading behavior as a non-file-download feature sequence;
a prediction module, configured to input the pre-file-download feature sequence and the non-file-download feature sequence into a trained first neural network model and predict the probability that the target website user will perform a file downloading behavior.
9. An electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to execute the computer program to perform the method for analyzing and predicting file downloading behavior based on website access records according to any one of claims 1 or 6.
10. A computer program product, characterized in that it comprises software code portions for performing the method for predicting file download behavior based on website visitation record analysis according to any of claims 1-6, when said computer program product is run on a computer.
CN202110871515.9A 2021-07-30 2021-07-30 Method and device for analyzing and predicting file downloading behavior based on website access record Active CN113612639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110871515.9A CN113612639B (en) 2021-07-30 2021-07-30 Method and device for analyzing and predicting file downloading behavior based on website access record

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110871515.9A CN113612639B (en) 2021-07-30 2021-07-30 Method and device for analyzing and predicting file downloading behavior based on website access record

Publications (2)

Publication Number Publication Date
CN113612639A true CN113612639A (en) 2021-11-05
CN113612639B CN113612639B (en) 2022-11-11

Family

ID=78306247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110871515.9A Active CN113612639B (en) 2021-07-30 2021-07-30 Method and device for analyzing and predicting file downloading behavior based on website access record

Country Status (1)

Country Link
CN (1) CN113612639B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423442A (en) * 2017-08-07 2017-12-01 火烈鸟网络(广州)股份有限公司 Method and system, storage medium and computer equipment are recommended in application based on user's portrait behavioural analysis
CN109902849A (en) * 2018-06-20 2019-06-18 华为技术有限公司 User's behavior prediction method and device, behavior prediction model training method and device
CN111797978A (en) * 2020-07-08 2020-10-20 北京天融信网络安全技术有限公司 Internal threat detection method and device, electronic equipment and storage medium
CN111798259A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Application recommendation method and device, storage medium and electronic equipment
CN112801719A (en) * 2021-03-01 2021-05-14 深圳市欢太科技有限公司 User behavior prediction method, user behavior prediction device, storage medium, and apparatus

Also Published As

Publication number Publication date
CN113612639B (en) 2022-11-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant