CN113592522A - Method and apparatus for processing traffic data, and computer-readable storage medium - Google Patents
- Publication number
- CN113592522A (application number CN202110204546.9A)
- Authority
- CN
- China
- Prior art keywords
- access behavior
- object access
- vector
- parameter
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0269—Targeted advertisements based on user profile or attribute
- G06Q30/0271—Personalized advertisement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0277—Online advertisement
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Business, Economics & Management (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Accounting & Taxation (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Biomedical Technology (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Marketing (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Game Theory and Decision Science (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Transfer Between Computers (AREA)
Abstract
Disclosed are a method and apparatus for processing traffic data, and a computer-readable storage medium. The method includes: acquiring traffic data for accessing network services; extracting, based on the traffic data, an object identification parameter from the value of a field related to object identification and an object access behavior parameter from the value of a field related to object access behavior; determining, based on object login data and the object identification parameter, the object identifier corresponding to the object access behavior parameter; constructing an object access behavior sequence, which comprises the object identifier and the object access behavior parameters that correspond to it, arranged in time order; and encoding the object access behavior sequence into an object access behavior feature vector. The method and apparatus reduce the workload of manually extracting and analyzing traffic data, automatically encode object access behavior into vectors, and generalize well.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence services, and more particularly, to a method and apparatus for processing traffic data, and a computer-readable storage medium.
Background
With the rapid development of network technology, Web services are widely used to deliver network services. For some network services, the traffic data collected across the entire network can reach the trillion-record level.
It is very difficult to identify or track objects (e.g., the users or terminals originating traffic) in such a huge amount of traffic data. For example, because a user's traffic data is collected at the network layer, the collected data often identifies the user by IP address. However, the same user may use different IP addresses in different time periods, and the same IP address may be shared by multiple users, so identifying users by IP address is often unreliable. As another example, a user may hold several accounts (e.g., a QQ number, a WeChat ID, a video-site membership account, etc.), so identifying the user by user account alone is also unreliable.
It is likewise very difficult to identify an object's behavior in accessing network services (e.g., a user's preferences and interests) from massive traffic data. For example, a user may access many network-service addresses, and the host (service address) and cgi (common gateway interface) fields of different network services differ in format and length. Moreover, the cgi paths a user accesses often carry serial numbers and resource IDs. As a result the traffic data, particularly the cgi field, has too large a feature space and too much noise for the user's behavior to be identified directly.
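The ID-induced blow-up of the cgi feature space described above can be tamed by masking variable path segments before any further processing. The sketch below is an illustrative heuristic, not taken from the patent; the regexes and the placeholder tokens `<num>` and `<id>` are assumptions.

```python
import re

def normalize_cgi(path: str) -> str:
    """Collapse numeric serial numbers and hex-like resource IDs in a cgi
    path so that requests differing only by ID map to the same token."""
    # Drop the query string first so trailing IDs sit at end-of-path.
    path = path.split("?", 1)[0]
    # Purely numeric segments, e.g. /video/12345 -> /video/<num>
    path = re.sub(r"/\d+(?=/|$)", "/<num>", path)
    # Long hex-like resource IDs, e.g. /res/a1b2c3d4e5f6 -> /res/<id>
    path = re.sub(r"/[0-9a-f]{8,}(?=/|$)", "/<id>", path, flags=re.IGNORECASE)
    return path
```

Paths that carry no variable parts, such as /cgi-bin/hello.py, pass through unchanged, so stable cgi values remain distinguishable.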
Therefore, the prior art needs improvement: a method is needed that automatically and efficiently identifies users and user behaviors from large amounts of traffic data, and classifies and displays those behaviors. Note that the information disclosed in this Background section is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
Embodiments of the present disclosure provide a method and apparatus for processing traffic data, and a computer-readable storage medium.
An embodiment of the present disclosure provides a method for processing traffic data, including: acquiring traffic data for accessing network services; extracting, based on the traffic data, an object identification parameter from the value of a field related to object identification and an object access behavior parameter from the value of a field related to object access behavior; determining, based on object login data and the object identification parameter, the object identifier corresponding to the object access behavior parameter; constructing an object access behavior sequence, which comprises the object identifier and the object access behavior parameters that correspond to it, arranged in time order; and encoding the object access behavior sequence into an object access behavior feature vector.
An embodiment of the present disclosure further provides an apparatus for processing object traffic data, including: a processor; and a memory, wherein the memory has stored therein a computer-executable program that, when executed by the processor, performs the method described above.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the above-described method.
According to another aspect of the present disclosure, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the above aspects or various alternative implementations of the above aspects.
An embodiment of the present disclosure also provides an apparatus for processing traffic data. The apparatus comprises: a data acquisition module configured to acquire traffic data for accessing network services; and an object identifier mapping module configured to extract, based on the traffic data, an object identification parameter from the value of a field related to object identification and an object access behavior parameter from the value of a field related to object access behavior, and to determine, based on object login data and the object identification parameter, the object identifier corresponding to the object access behavior parameter.
For example, the apparatus for processing traffic data further comprises an embedding generation module configured to construct an object access behavior sequence, which comprises an object identifier and the object access behavior parameters that correspond to it, arranged in time order; and to encode the object access behavior sequence into an object access behavior feature vector.
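The sequence-construction step (gather each object's access behavior parameters and order them by time) can be sketched as follows; the record layout and the function name are illustrative assumptions, not from the patent.

```python
from collections import defaultdict

def build_behavior_sequences(records):
    """records: iterable of (object_id, timestamp, behavior_param) tuples
    extracted from traffic data. Returns {object_id: [param, ...]} with
    the params in time order, i.e. the object access behavior sequence."""
    by_object = defaultdict(list)
    for obj_id, ts, param in records:
        by_object[obj_id].append((ts, param))
    return {
        obj_id: [p for _, p in sorted(events)]  # sort by timestamp
        for obj_id, events in by_object.items()
    }
```

Each resulting sequence can then be treated like a sentence of "words" and fed to a word2vec-style model to obtain the object access behavior feature vector described in the text.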
For example, the apparatus for processing traffic data further comprises a data mart module configured to cluster the object access behavior feature vectors to generate cluster labels of the object access behaviors, and to generate recommendation or classification information associated with the object identifier based on the cluster labels.
For example, the apparatus for processing traffic data further includes a data presentation module configured to reduce the object access behavior feature vectors to two-dimensional vectors and to draw an object behavior presentation interface based on them, in which similar object access behavior feature vectors are placed close to each other.
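One way to realize the two-dimensional reduction is principal component analysis; the patent does not fix a particular reduction method, so the choice of PCA here (and the use of NumPy) is an assumption for illustration.

```python
import numpy as np

def reduce_to_2d(vectors):
    """Project behavior feature vectors to 2-D via PCA so that similar
    vectors land close together in the presentation interface."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                       # center the data
    cov = np.cov(X, rowvar=False)                # feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigendecomposition
    top2 = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # top-2 components
    return X @ top2
```

Neighborhood-preserving methods such as t-SNE could equally serve here; PCA is simply the shortest self-contained sketch.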
Embodiments of the present disclosure provide a method of processing full-network traffic that encodes object access behavior into an object access behavior feature vector (e.g., an embedding vector) using a neural network model (e.g., a word2vec model). Embodiments of the disclosure reduce the workload of manually extracting and analyzing traffic data, automatically encode object access behavior into vectors, and generalize well. They can be applied to services such as advertisement recommendation to realize object-level ad targeting.
Embodiments of the disclosure can cluster the object access behavior feature vectors into class labels by unsupervised clustering; compared with directly using the raw values of each traffic-data field, this generalizes better and yields more robust predictions.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings used in the description of the embodiments will be briefly introduced below. The drawings in the following description are merely exemplary embodiments of the disclosure.
Fig. 1 is an example schematic diagram illustrating a scenario in which multiple objects access a web service in accordance with an embodiment of the present disclosure.
Fig. 2A is a flow chart illustrating a method of processing traffic data according to an embodiment of the present disclosure.
Fig. 2B is a diagram illustrating a piece of traffic data according to an embodiment of the present disclosure.
Fig. 2C is a schematic diagram illustrating object login data and object identification parameters according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating obtaining object identification parameters and object access behavior parameters according to an embodiment of the disclosure.
Fig. 4A is a schematic diagram illustrating a word vector conversion model according to an embodiment of the present disclosure.
Fig. 4B is a schematic diagram illustrating an objective function according to an embodiment of the present disclosure.
Fig. 4C illustrates a hierarchical normalization process according to an embodiment of the disclosure.
Fig. 5 illustrates an apparatus for processing traffic data according to an embodiment of the present disclosure.
FIG. 6 shows a schematic diagram of an electronic device according to an embodiment of the disclosure.
Fig. 7 shows a schematic diagram of an architecture of an exemplary computing device, according to an embodiment of the present disclosure.
FIG. 8 shows a schematic diagram of a storage medium according to an embodiment of the disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, example embodiments according to the present disclosure will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.
In the present specification and the drawings, steps and elements having substantially the same or similar characteristics are denoted by the same or similar reference numerals, and repeated description of the steps and elements will be omitted. Meanwhile, in the description of the present disclosure, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance or order.
For the purpose of describing the present disclosure, concepts related to the present disclosure are introduced below.
Embodiments of the present disclosure relate to word vector transformation models (e.g., word2vec models) that may be Artificial Intelligence (AI) based. Artificial intelligence is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. For example, for the word vector conversion model of the present disclosure, it is possible to read and understand traffic data in a manner similar to a human, and discover object (e.g., user) specific access behavior therefrom. Artificial intelligence enables the word vector transformation model disclosed by the invention to have a function of identifying objects and object access behaviors from flow data by researching the design principle and implementation method of various intelligent machines.
Artificial intelligence technology spans a wide range of fields and involves both hardware-level and software-level techniques. AI software technology mainly includes computer vision, natural language processing, and machine learning/deep learning.
Optionally, the word vector conversion model in the present disclosure employs Natural Language Processing (NLP) technology. Natural language processing technology is an important direction in the fields of computer science and artificial intelligence, and can implement various theories and methods for effectively communicating between human and computer by using natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Thus, based on natural language processing techniques, the word vector conversion model of the present disclosure may parse massive traffic data to obtain characteristics of object access behavior (e.g., user access behavior).
Optionally, the natural language processing techniques employed by embodiments of the present disclosure may also be based on Machine Learning (ML) and deep learning. Machine learning is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and more. Natural language processing uses machine learning to study how computers simulate or realize human language-learning behavior, acquire new knowledge or skills by analyzing existing classified text data, and reorganize existing knowledge structures to continuously improve performance. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
Alternatively, the word vector transformation models used hereinafter in embodiments of the present disclosure may all be artificial intelligence models, in particular AI-based neural network models. Typically, such a model is implemented as an acyclic graph with neurons arranged in layers: an input layer and an output layer, separated by at least one hidden layer. The hidden layers transform the input received at the input layer into a representation useful for generating output at the output layer. Nodes are connected via edges to nodes in adjacent layers, with no edges between nodes within a layer. Data received at the input layer propagates to the output layer via hidden, activation, pooling, convolutional, and similar layers. The inputs and outputs of the neural network model may take various forms, which the present disclosure does not limit.
The scheme provided by the embodiment of the disclosure relates to technologies such as artificial intelligence, natural language processing and machine learning, and is specifically described by the following embodiment.
The word vector conversion model of the embodiments of the present disclosure may be specifically integrated in an electronic device, which may be a terminal or a server or the like. For example, the word vector conversion model may be integrated in the terminal. The terminal may be, but is not limited to, a mobile phone, a tablet Computer, a notebook Computer, a desktop Computer, a Personal Computer (PC), a smart speaker, a smart watch, or the like. At this time, the terminal serves as an analysis node of traffic data to hierarchically process and analyze traffic data flowing through the terminal.
For another example, the word vector conversion model of embodiments of the present disclosure may be integrated at a server. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, a cloud database, cloud computing, a cloud function, cloud storage, Network service, cloud communication, middleware service, domain name service, security service, Content Delivery Network (CDN), big data and an artificial intelligence platform. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the disclosure is not limited thereto.
It can be understood that the device for reasoning by applying the word vector conversion model of the embodiment of the disclosure may be a terminal, a server, or a system composed of a terminal and a server.
It is understood that the training method of the word vector conversion model of the embodiments of the present disclosure may be executed on a terminal, may be executed on a server, or may be executed by both the terminal and the server.
The word vector conversion model provided by embodiments of the present disclosure may also involve artificial intelligence cloud services in the field of cloud technology. Cloud technology is a hosting technology that unifies hardware, software, network, and other resources in a wide-area or local-area network to realize computation, storage, processing, and sharing of data. It is a general term for the network, information, integration, management-platform, and application technologies applied on the cloud-computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing will become an important support: the background services of technical network systems, such as video websites, image websites, and web portals, require large amounts of computing and storage resources. As the internet industry develops, each item may carry its own identification mark that must be transmitted to a background system for logical processing; data at different levels are processed separately, and industry data of all kinds need strong system-background support, which can only be realized through cloud computing.
Among these, artificial intelligence cloud service is generally called AIaaS (AI as a Service). It is a mainstream service mode of artificial intelligence platforms: the AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model resembles an AI-themed app store: any developer can access one or more of the platform's artificial intelligence services through an Application Programming Interface (API), and experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own cloud AI services.
Fig. 1 is an example schematic diagram illustrating a scenario 100 in which multiple users access a network service in accordance with an embodiment of the disclosure.
At present, many network cloud services are carried on the same cloud network service operation and analysis platform. A large network service provider group may provide email, video, music, shopping, photo storage and processing, chat, and other services on a cloud network service operation and analysis platform as shown in fig. 1.
A high-traffic network platform collects massive traffic data, and manual extraction and analysis of that data is practically impossible. Traffic data (e.g., HTTP packets) contains many fields, some of which are unrelated to object identification (e.g., user identification) or to the analysis of object access behavior (e.g., user access behavior). Meanwhile, even a single piece of traffic data may have too large a feature space, making analysis very difficult.
The following is described with an example of a user as an object, however, it should be understood by those skilled in the art that the object that can be identified based on traffic data includes, but is not limited to, a user, and any other object that may be determined based on traffic data (e.g., a user terminal device, an application that initiates traffic, etc.).
For example, the same user may use different devices to access network services, as shown in fig. 1, and the user may use a mobile phone terminal, a portable computer, a tablet computer, or the like to access different network services. These devices often have different device identifiers and different IP addresses. The cloud network service operation and analysis platform can only know the IP address of the device initiating the access from the traffic data (e.g., HTTP data packets), and therefore, the cloud network service operation and analysis platform often recognizes the traffic data as different users accessing the network service, thereby causing difficulty in subsequent analysis of user access behavior.
Also for example, multiple users may use the same device to access network services. Such a device may be a public device, such as a computer in an internet cafe, etc. At this time, the same IP address is shared by a plurality of users. Therefore, the cloud network service operation and analysis platform often identifies traffic data initiated by different users as that the same user is accessing the network service, thereby causing difficulty in subsequent user access behavior analysis.
As another example, the same user may hold several accounts (e.g., a QQ number, a WeChat ID, a video-site membership account, etc.) and may use different accounts to access different network services provided on the same cloud network service operation and analysis platform. Therefore, identifying a user based only on a user account is also unreliable.
Therefore, there is a need for improvement of the prior art to provide a method for automatically and efficiently identifying objects and object behaviors from a large amount of traffic data, and classifying and displaying the object behaviors.
The present disclosure is based on the insight that a method of processing traffic data can encode object access behavior into an object access behavior feature vector (e.g., an embedding vector) using a neural network model (e.g., a word2vec model). The amount of data needed to identify objects from traffic data and analyze their behavior is thereby greatly reduced. Embodiments of the disclosure reduce the workload of manually extracting and analyzing traffic data, automatically encode object access behavior into vectors, and generalize well. They can be applied to services such as advertisement recommendation to realize object-level ad targeting.
Embodiments of the disclosure can cluster the object access behavior feature vectors into class labels by unsupervised clustering; compared with directly using the raw values of each traffic-data field, this generalizes better and yields more robust predictions.
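The unsupervised clustering of feature vectors could, for instance, use k-means (consistent with the G06F18/23213 classification above). A minimal pure-Python sketch, assuming squared Euclidean distance and a fixed iteration budget; a production system would likely use a library implementation instead.

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means over behavior feature vectors; returns a cluster
    label per vector. Unsupervised, so no manual annotation is needed."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(list(vectors), k)]
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest center.
        labels = [min(range(k), key=lambda c: dist(v, centers[c]))
                  for v in vectors]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return labels
```

The returned labels play the role of the cluster labels from which recommendation or classification information is generated.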
Fig. 2A is a flow chart illustrating a method 200 of processing traffic data according to an embodiment of the present disclosure. Fig. 2B is a diagram illustrating a piece of traffic data according to an embodiment of the present disclosure. Fig. 2C is a schematic diagram illustrating object login data and object identification parameters according to an embodiment of the present disclosure.
Referring to fig. 2A, in step S201, traffic data of an access network service is acquired.
As shown in fig. 2B, a user sends an access request (e.g., a network request based on the Hypertext Transfer Protocol (HTTP) or on HTTPS, i.e., HTTP over a secure socket layer) to the cloud network service operation and analysis platform to access a network service. The platform thereby acquires a piece of traffic data. A high-traffic network platform often collects massive amounts of such data. Because these network data come from the entire network, they are also referred to as full-network traffic data.
In step S202, based on the traffic data, an object identification parameter is extracted from the value of the field related to object identification, and an object access behavior parameter is extracted from the value of the field related to object access behavior.
Taking the HTTP access request shown in fig. 2B as an example, the request includes a plurality of fields: a request header field, a CGI (Common Gateway Interface) field, a GET parameter field, a POST parameter field, a request method field, a UA (User Agent) field, a referrer field, a cookie field, a DATE field, a Client-IP (client IP address) field, and the like. Those skilled in the art will appreciate that the fields shown in fig. 2B are merely examples; an HTTP access request may include more or fewer fields, which the present disclosure does not limit.
Fields related to object identification help establish the identity of an object (e.g., a user). Among the above fields, those related to object identification include, for example, the Client-IP field, the UA field, and the DATE field. Assume the value of the Client-IP field is 1.0.1, the value of the UA field is Mozilla/4.0, and the value of the DATE field is Wed, 03 Feb 2021 02:39:19 GMT. The cloud network service operation and analysis platform may then determine that an object at IP address 1.0.1 requested access to the network service at 02:39:19 (GMT) on 3 Feb 2021, using a Mozilla/4.0 browser. It will be understood by those skilled in the art that the Client-IP, UA, and DATE fields are merely examples; fields related to object identification may include more or fewer fields, which the present disclosure does not limit.
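The extraction of object identification parameters amounts to projecting a parsed request onto the identification-related fields. A minimal sketch, assuming the request has already been parsed into a field-name-to-value dict; the field names follow the example above and may differ in real traffic.

```python
def extract_identification_params(fields: dict) -> dict:
    """Keep only the object-identification fields named in the text
    (Client-IP, UA, DATE); all other request fields are dropped."""
    id_fields = ("Client-IP", "UA", "DATE")
    return {f: fields[f] for f in id_fields if f in fields}
```

An analogous helper over a behavior-field tuple (Host, CGI, referrer) would yield the object access behavior parameters.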
Fields related to object access behavior help identify the behavior of an object. Among the above fields, the fields related to the object access behavior are, for example: a Host field, a CGI field, a referrer field, etc.
For example, the request URL (Uniform Resource Locator) in an HTTP access request is composed of HOST + CGI. For the URL shown in fig. 2B, http://www.test.com/cgi-bin/hello.py, the value of the Common Gateway Interface (i.e., CGI) field is /cgi-bin/hello.py, and the value of the HOST field (hereinafter also referred to as the site field) is http://www.test.com. The referrer field includes a URL indicating the page from which the object reached the currently requested page.
Assume the value of the Host field is http://www.test.com, the value of the CGI field is /cgi-bin/hello.py, and the value of the referrer field is http://localhost:8088/user/register.html. The cloud network service operation and analysis platform may determine that an object attempted to access http://www.test.com/cgi-bin/hello.py starting from the page http://localhost:8088/user/register.html. It will be understood by those skilled in the art that the Host, CGI, and referrer fields are merely examples, and that fields related to object access behavior may include more or fewer fields, which the present disclosure does not limit.
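The field handling described above can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the field names and the grouping into identification vs. behavior fields are assumptions taken from the examples above.

```python
from urllib.parse import urlsplit

# Hypothetical field groupings; the real platform may use different field names.
ID_FIELDS = {"Client-IP", "UA", "DATE"}
BEHAVIOR_FIELDS = {"Host", "CGI", "referrer"}

def split_request_url(url):
    """Split a request URL into its HOST part and CGI part (HOST + CGI = URL)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}", parts.path

def extract_parameters(fields):
    """Separate a request's fields into identification and behavior parameters."""
    ident = {k: v for k, v in fields.items() if k in ID_FIELDS}
    behavior = {k: v for k, v in fields.items() if k in BEHAVIOR_FIELDS}
    return ident, behavior
```

For the URL of fig. 2B, `split_request_url` returns `"http://www.test.com"` as the site value and `"/cgi-bin/hello.py"` as the CGI value.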
Thus, the cloud network service operation and analysis platform extracts from the traffic data the value of a field related to object identification as an object identification parameter and the value of a field related to object access behavior as an object access behavior parameter, thereby reducing the amount of traffic data to be analyzed.
Optionally, in order to further reduce the amount of traffic data to be analyzed, the traffic data may also be filtered. For example, the cloud network service operation and analysis platform may screen the traffic data based on the importance of the value of its site field, sample a portion of the screened traffic data according to a predetermined rule, and extract the value of a field related to object access behavior from the sampled portion as an object access behavior parameter. The predetermined rule is, for example, random sampling, which increases the robustness of the selected traffic data.
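A minimal sketch of this screen-then-sample step, under stated assumptions: the importance scores, the 0.5 threshold, and the record layout are illustrative inventions, and the 1% rate follows the example given later with reference to fig. 3.

```python
import random

# Assumed importance scores for site-field values; in practice these would
# come from business priority (user traffic volume, monetary transactions, ...).
SITE_IMPORTANCE = {"http://www.test.com": 0.9, "http://minor.example.com": 0.1}

def screen_and_sample(records, threshold=0.5, rate=0.01, seed=42):
    """Keep records whose site is important enough, then randomly sample a fraction."""
    kept = [r for r in records if SITE_IMPORTANCE.get(r["Host"], 0.0) >= threshold]
    n = max(1, int(len(kept) * rate)) if kept else 0
    return random.Random(seed).sample(kept, n)
```

With 200 records from the important site and a 1% rate, two records survive sampling; records from the low-importance site are screened out entirely.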
Optionally, in order to further reduce the amount of traffic data to be analyzed, the object access behavior parameters may be further processed, e.g., by normalizing the value of the CGI field. Methods for reducing the data volume involved in the values of the fields related to object identification and object access behavior are further described later with reference to fig. 3 and are not repeated here.
In step S203, an object identifier corresponding to the object access behavior parameter is determined based on the object login data and the object identification parameter.
Taking fig. 2C as an example, the object login data includes at least one of the following items: an object login log, the internet protocol address of the login, the object account used for the login, the device identifier of the login device, and the time of the login. It will be understood by those skilled in the art that the above data are merely examples; the object login data may include more or fewer items, which the present disclosure does not limit.
For example, an object identifier may be used to uniquely identify an object. Determining the object identifier corresponding to the object access behavior parameter based on the object login data and the object identification parameter further includes: determining the object identifier corresponding to the object access behavior parameter based on the constraints that the object login data imposes on the object identification parameter.
An example of determining an object identifier corresponding to an object access behavior parameter by using object login data and an object identification parameter is briefly described below with reference to fig. 2C.
As shown in fig. 2C, assume that there are two relevant object logs (e.g., user logs) on the server. The first object log is associated with user X and indicates: the login time is Wed, 03 Feb 2021 01:39:19 GMT; the login duration is 2 hours; the login IP address is 1.0.1; the login account is "brook"; the login device is handset 230. Thus, it may be determined that the constraints of the first object log on the object identification parameters (e.g., assuming the object identification parameters include only Client-IP, UA, and DATE) include: the time at which the traffic data was initiated must be between Wed, 03 Feb 2021 01:39:19 GMT and Wed, 03 Feb 2021 03:39:19 GMT, and the login IP address must be 1.0.1.
The second object log is associated with user Y and indicates: the login time is Mon, 01 Feb 2021 02:39:19 GMT; the login duration is 2 hours; the login IP address is 1.0.1; the login account is "sea"; the login device is handset 230. Thus, it may be determined that the constraints of the second object log on the object identification parameters (e.g., assuming the object identification parameters include only Client-IP, UA, and DATE) include: the time at which the traffic data was initiated must be between Mon, 01 Feb 2021 02:39:19 GMT and Mon, 01 Feb 2021 04:39:19 GMT, and the login IP address must be 1.0.1.
It may then be determined, based on the object identification parameters (e.g., Client-IP of 1.0.1, UA of Mozilla/4.0, DATE of Wed, 03 Feb 2021 02:39:19 GMT), that this piece of traffic data satisfies the constraints of the first object log. That is, the piece of traffic data corresponds to the object identifier of object X.
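The constraint matching of fig. 2C can be sketched as follows, using the two login logs above. The date format and the log field names are assumptions for illustration; a real platform may store login data differently.

```python
from datetime import datetime, timedelta

FMT = "%a, %d %b %Y %H:%M:%S GMT"  # assumed format of the DATE field

def satisfies(log, client_ip, date_value):
    """Check whether a traffic record satisfies the constraints of one login log."""
    start = datetime.strptime(log["login_time"], FMT)
    end = start + timedelta(hours=log["duration_h"])
    t = datetime.strptime(date_value, FMT)
    return log["ip"] == client_ip and start <= t <= end

def resolve_object_identifier(logs, client_ip, date_value):
    """Return the identifier of the first login log whose constraints are met."""
    for log in logs:
        if satisfies(log, client_ip, date_value):
            return log["object_id"]
    return None
```

For the traffic data with Client-IP 1.0.1 and DATE Wed, 03 Feb 2021 02:39:19 GMT, only user X's log window (01:39:19 to 03:39:19) matches.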
Furthermore, various other ways may be used to determine the object identifier corresponding to the object access behavior parameter based on the object login data and the object identification parameter. For example, the object identifier may be determined from the object login data and the object identification parameters by a neural network model. In some embodiments, the object ID associated with the user account may also be used directly as the object identifier, to reduce computational cost.
In step S204, an object access behavior sequence is constructed, where the object access behavior sequence includes an object identifier and object access behavior parameters corresponding to the object identifier and arranged in time order.
The object access behavior sequence has, for example, the format [object identifier, object access behavior parameter-1, object access behavior parameter-2, object access behavior parameter-3, ...], in which the object access behavior parameters are arranged in chronological order.
See table 1 below, which shows some examples of constructed sequences of object access behaviors.
In the example in table 1, the object access behavior parameter is the normalized value of the CGI field. Those skilled in the art will understand that the object access behavior parameter may take other forms or values, which the present disclosure does not limit.
Although table 1 shows only the object access behavior parameters extracted from 4 pieces of traffic data, it should be understood that the present disclosure does not limit the number of pieces of traffic data. For example, the traffic data of user X for the last month may be spliced into an object access behavior sequence as described above.
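Step S204 reduces to sorting the parameters by time and prepending the identifier. A minimal sketch, with the (timestamp, parameter) pair layout assumed for illustration:

```python
def build_behavior_sequence(object_identifier, timed_parameters):
    """timed_parameters: iterable of (timestamp, behavior_parameter) pairs.

    Returns [object identifier, parameter-1, parameter-2, ...] with the
    parameters arranged in chronological order.
    """
    ordered = sorted(timed_parameters, key=lambda pair: pair[0])
    return [object_identifier] + [param for _, param in ordered]
```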
In step S205, the object access behavior sequence is encoded into an object access behavior feature vector.
For example, step S205 further includes encoding each object access behavior parameter in the object access behavior sequence as a word vector using a word vector conversion model (e.g., a word2vec model), and obtaining the object access behavior feature vector at least partially based on the word vector corresponding to each object access behavior parameter. The word2vec model is a neural-network text vectorization algorithm from the field of natural language processing; compared with the earlier bag-of-words model, it has advantages in dimensionality, in retaining contextual word order, and in preserving semantic context.
Optionally, the object access behavior feature vector may be a concatenation of the word vectors corresponding to the individual object access behavior parameters. Alternatively, it may be a sentence vector computed from the word vectors using a neural network model such as a CNN (Convolutional Neural Network), Sentence2Vec, or an encoder-decoder. The present disclosure is not limited in this regard.
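Both options can be sketched as below, assuming a pretrained embedding table (as a word2vec model would produce); the toy two-dimensional vectors are illustrative only, and the "mean" mode is a simple stand-in for the more elaborate sentence-vector models mentioned above.

```python
def encode_sequence(sequence, embeddings, mode="concat"):
    """Encode a behavior sequence using per-token word vectors.

    embeddings: dict mapping each token to its word vector (assumed pretrained,
    e.g., by a word2vec model). Mode "concat" concatenates the word vectors;
    mode "mean" returns an element-wise average as a simple sentence vector.
    """
    vectors = [embeddings[token] for token in sequence]
    if mode == "concat":
        return [component for vec in vectors for component in vec]
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]
```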
Optionally, the cloud network service operation and analysis platform may cluster the object access behavior feature vectors to generate a cluster label of the object access behavior; and generating recommendation information or classification information associated with the object identifier based on the cluster label. For example, the cloud network service operation and analysis platform may cluster the object access behavior feature vectors into cluster labels using k-means and other clustering methods, so that similar features may be aggregated together and provided to downstream services (e.g., targeted advertisements, personalized recommendations, etc.), or provided to other services (e.g., user behavior anomaly detection) in the cloud network service operation and analysis platform.
For example, assume that user X uses multiple services in the cloud network service operation and analysis platform, e.g., watching online video with one user account (account "brook") and listening to music, shopping, and chatting with another user account (account "grass"). Using the cluster labels of the user's access behavior, the platform may determine that user X likes trend-related content, and may therefore try pushing trend-related videos to the "brook" account and new trend products to the "grass" account.
For another example, if user X has long initiated traffic data at a low-to-medium frequency, but the cloud network service operation and analysis platform suddenly finds that multiple accounts owned by user X have recently initiated access requests at a much higher frequency, the platform may determine that the behavior of user X is abnormal (i.e., generate classification information indicating abnormal user behavior), that there is a risk of account theft, and so on.
The present disclosure does not limit the manner of using the object access behavior feature vector and the manner of clustering the object access behavior feature vector as long as it can satisfy the requirements of a specific service.
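The clustering step can be illustrated with a minimal pure-Python k-means; this is a teaching sketch, and a production system would instead use a library implementation such as scikit-learn's KMeans, as the disclosure leaves the clustering method open.

```python
import random

def kmeans(vectors, k, iterations=20, seed=0):
    """Cluster feature vectors into k groups; returns one cluster label per vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[min(range(k), key=lambda c: dist2(v, centers[c]))].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [min(range(k), key=lambda c: dist2(v, centers[c])) for v in vectors]
```

Feature vectors that are close together (similar access behavior) receive the same cluster label, which can then drive targeted advertising, personalized recommendation, or anomaly detection.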
For another example, the cloud network service operation and analysis platform may further reduce the object access behavior feature vectors to two-dimensional vectors and draw an object behavior display interface based on them, where similar object access behavior feature vectors appear close to each other in the interface.
For example, t-SNE (t-Distributed Stochastic Neighbor Embedding) may be used to reduce the dimensionality of the object access behavior feature vectors and visualize them. t-SNE is a nonlinear dimensionality reduction technique for visualizing high-dimensional data; compared with conventional linear techniques such as PCA (Principal Component Analysis), t-SNE preserves nonlinear structure well and better retains the semantic characteristics of the original object access behavior feature vectors after mapping to the low-dimensional space. The present disclosure does not limit the dimensionality reduction method, as long as the requirements of visualization can be met.
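As a self-contained stand-in for the reduction-to-2D step, the sketch below projects feature vectors onto their top two principal components via power iteration. This is plain PCA, named as such, not t-SNE; in practice one would call a library t-SNE implementation (e.g., scikit-learn's TSNE), which the disclosure prefers for preserving nonlinear structure.

```python
def pca_2d(vectors, iterations=100):
    """Project vectors onto their top two principal components (power iteration)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    X = [[v[i] - mean[i] for i in range(d)] for v in vectors]

    def cov_times(w):
        # Compute (X^T X) w without forming the covariance matrix explicitly.
        Xw = [sum(row[i] * w[i] for i in range(d)) for row in X]
        return [sum(X[r][i] * Xw[r] for r in range(n)) for i in range(d)]

    def top_component(deflate=None):
        w = [1.0] * d
        for _ in range(iterations):
            w = cov_times(w)
            if deflate is not None:  # remove the first component's direction
                proj = sum(w[i] * deflate[i] for i in range(d))
                w = [w[i] - proj * deflate[i] for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5 or 1.0
            w = [x / norm for x in w]
        return w

    w1 = top_component()
    w2 = top_component(deflate=w1)
    return [[sum(row[i] * w[i] for i in range(d)) for w in (w1, w2)] for row in X]
```

Each high-dimensional feature vector becomes a 2-D point that can be plotted in the object behavior display interface.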
Thus, the method for processing full-network traffic provided by the embodiments of the disclosure can encode object access behavior into an object access behavior feature vector (e.g., an embedding vector) using a neural network model (e.g., a word2vec model). The amount of data needed to identify objects from traffic data and analyze their behavior is thereby greatly reduced. The embodiments of the disclosure reduce the workload of manually extracting and analyzing traffic data, automatically encode object access behavior into vectors, and have better generalization capability. They can be applied to services such as advertisement recommendation, to realize advertisement recommendation at the user level.
The embodiments of the disclosure can cluster the object access behavior feature vectors into class labels by unsupervised clustering; compared with directly using the values of each field of the traffic data, this has better generalization capability and can generate more robust prediction results.
Fig. 3 is a schematic diagram illustrating obtaining object identification parameters and object access behavior parameters according to an embodiment of the disclosure.
For a cloud platform whose access volume is on the order of trillions, collecting (especially in real time) the full traffic data of the entire network would place enormous pressure on cloud storage equipment. The cloud network service operation and analysis platform therefore needs to sample the traffic accessed by objects.
For example, the step S202 further includes: screening the traffic data based on the importance of the value of the site field of the traffic data; sampling a part of flow data from the screened flow data according to a preset rule; and extracting an object access behavior parameter from a value of a field related to the object access behavior based on the sampled portion of the traffic data.
For example, in actual use, the importance of the value of the site field may be set according to the priority of the traffic of interest. Services with heavy user traffic tend to be more important than services with light user traffic, and businesses that involve monetary transactions tend to be more important than those that do not. Embodiments of the present disclosure may determine the importance of the value of the site field based on various factors, and the present disclosure is not limited in this regard.
For example, the predetermined rule may be to randomly sample 1% of the traffic data. The random sampling rate may be determined according to the data magnitude that the storage platform can withstand, which the present disclosure does not limit.
Although only the values of the site field and the CGI field are collected in fig. 3 for extracting the object access behavior parameters, those skilled in the art should also understand that the values of other fields (e.g., access time, client IP, etc.) may also be collected, which is not limited by the present disclosure.
With continued reference to fig. 3, in the case where the fields related to object access behavior include a Common Gateway Interface field, extracting the object access behavior parameter from the value of the field related to object access behavior further includes: extracting the value corresponding to the Common Gateway Interface field from the traffic data and performing at least one of the following replacement operations on it: replacing each integer string in the value with a corresponding constant string, wherein different integer strings correspond to different constant strings; and/or replacing any string in the value that is longer than a preset length with a constant string indicating that the string length exceeds the preset length; and taking the value of the Common Gateway Interface field after the replacement operations as the object access behavior parameter.
Referring to the example in fig. 3, the information in the Common Gateway Interface field is generally highly relevant to the content the object is interested in. However, as shown in fig. 3, the value of the field often carries a numeric or resource identifier, which makes the feature space of the field's values excessively large. The cloud platform can therefore preprocess the value of the Common Gateway Interface field in a simple way.
The preprocessing shown in fig. 3 includes two ways. The first way: an integer string in the value of the Common Gateway Interface field (e.g., /index/123/456) is replaced with a constant string (e.g., /index/NUM/NUM). The second way: a string segment longer than a preset length (e.g., 16 letters/digits) in the value of the field (e.g., 5d41402abc4b2a76b9719d911 in /index/5d41402abc4b2a76b9719d911) is replaced with a constant string indicating that the segment exceeds the preset length (e.g., /index/STR, where STR indicates that the segment at that location is longer than the preset length). This processing yields fewer distinct object access behavior parameters and produces a more robust training effect in subsequent model training. The manner of training the word vector conversion model is described later with reference to figs. 4A to 4C and is not repeated here. The two ways described above are merely examples, which the present disclosure does not limit.
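The two replacement operations can be sketched segment by segment as below. The preset length of 16 follows the example above; using a single NUM token for all integer strings follows fig. 3's example (the claims also allow distinct constant strings per integer string).

```python
PRESET_LENGTH = 16  # assumed, per the example of 16 letters/digits

def normalize_cgi(cgi_value):
    """Apply the two replacement operations to a CGI field value, per path segment."""
    out = []
    for seg in cgi_value.split("/"):
        if seg.isdigit():
            out.append("NUM")   # first way: integer string -> constant string
        elif len(seg) > PRESET_LENGTH:
            out.append("STR")   # second way: overlong string -> constant string
        else:
            out.append(seg)
    return "/".join(out)
```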
By processing the object access behavior parameters as in fig. 3, the amount of traffic data to be analyzed is further reduced and the robustness of the selected traffic data is increased. Compared with directly using the values of the fields of the traffic data to identify object behavior, the embodiments of the disclosure have better generalization capability and generate more robust prediction results.
Fig. 4A is a schematic diagram illustrating a word vector conversion model according to an embodiment of the present disclosure. Fig. 4B is a schematic diagram illustrating an objective function according to an embodiment of the present disclosure. Fig. 4C illustrates a hierarchical normalization process according to an embodiment of the disclosure.
As shown in fig. 4A, the word vector conversion model includes an input layer, a hidden layer, and an output layer.
Wherein the input layer is configured to: and sequentially converting each object access behavior parameter in the object access behavior sequence into an input vector of a first dimension, and sequentially inputting the input vector to a hidden layer. The hidden layer is configured to: the input vector is converted into a hidden layer vector of a second dimension, and the hidden layer vector is input to the output layer, wherein the second dimension is smaller than the first dimension. The output layer is configured to: and normalizing the hidden layer vector to obtain an output vector, and taking the output vector as a word vector corresponding to the object access behavior parameter.
As shown in fig. 4A, optionally, in the process from the input layer to the hidden layer, the input layer converts each object access behavior parameter into a one-hot encoded vector, in which only one bit is 1. In such a case, the first dimension is the sum of the number of object identifiers and the number of object access behavior parameters.
Hereinafter, the input vector is denoted x_t, representing the input vector corresponding to the object access behavior parameter with index t in the object access behavior sequence.
The weight matrix from the input layer to the hidden layer is denoted W, with dimensions N × d, where d is the second dimension and is smaller than the first dimension N. The second dimension is configurable, depending on the precision of the object behavior features the word vector conversion model is intended to extract. The hidden layer vector h is then computed as h = W^T x_t, where W^T is the transpose of the weight matrix W.
The output layer is configured to normalize the hidden layer vector with softmax to obtain the output vector. Hereinafter, the output vector is denoted y, with y = softmax(U^T h), where softmax is the normalization function and U^T is the transpose of the output layer weight matrix U.
Optionally, the object access behavior feature p(s) derived from the output vectors is composed of the output vectors combined in order, where N is the value of the first dimension, w_1 represents the object identifier, and w_2, ..., w_T represent the individual object access behavior parameters in the object access behavior sequence.
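A numeric sketch of the forward pass just described (one-hot input, h = W^T x_t, y = softmax(U^T h)); the tiny matrices are illustrative assumptions, not trained weights.

```python
import math

def one_hot(index, n):
    vec = [0.0] * n
    vec[index] = 1.0
    return vec

def mat_t_vec(M, v):
    """Compute M^T v for a matrix M stored as a list of rows."""
    rows, cols = len(M), len(M[0])
    return [sum(M[r][c] * v[r] for r in range(rows)) for c in range(cols)]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(W, U, token_index):
    """Word vector model forward pass: one-hot input -> hidden -> softmax output."""
    x = one_hot(token_index, len(W))
    h = mat_t_vec(W, x)  # with a one-hot x, h is simply row token_index of W
    return softmax(mat_t_vec(U, h))
```

Note the comment in `forward`: because x_t is one-hot, h = W^T x_t is just an embedding lookup of one row of W, which is why W's rows serve as word vectors after training.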
The training process of the word vector transformation model is described next with reference to fig. 4B.
For example, the method 200 for processing traffic data according to the embodiment of the present disclosure further includes: training a word vector conversion model before encoding each object access behavior parameter in the sequence of object access behaviors as a word vector using the word vector conversion model.
The training of the word vector conversion model comprises: the input layer converts each object access behavior parameter in the object access behavior sequence into a one-hot coded vector as an input vector, and inputs the input vector to the hidden layer in sequence; acquiring an output vector by utilizing the hidden layer and the output layer; calculating a value corresponding to the objective function based on the output vector; and adjusting parameters of the neurons in the hidden layer based on the value corresponding to the objective function so as to maximize the value corresponding to the objective function.
Fig. 4B shows the first, second, and third objective functions. Wherein the objective function of the word vector conversion model is one of a first objective function, a second objective function, and a third objective function. The objective function of the word vector conversion model may also be a weighted sum of the first objective function, the second objective function, and the third objective function.
Wherein the first objective function indicates: a similarity between the predicted value of the object access behavior predicted based on the object identifier and each parameter of the object access behavior in the sequence of object access behaviors. That is, the training of the word vector conversion model is a process of maximizing a similarity between an object access behavior prediction value predicted based on an object identifier and each object access behavior parameter in the object access behavior sequence.
For example, the first objective function is L1 = Σ_{j=2}^{T} log p(w_j | w_1), where N is the value of the first dimension (the number of classes over which the softmax is computed), w_1 represents the object identifier, w_2, ..., w_T represent the individual object access behavior parameters in the object access behavior sequence, and p(w_j | w_1) is the probability of the (j-1)-th object access behavior parameter occurring given the object identifier w_1.
The second objective function indicates a similarity between an object access behavior prediction value predicted based on a current object access behavior parameter and a plurality of object access behavior parameters before and after the current object access behavior. That is, the training of the word vector conversion model is a process of maximizing the similarity between the predicted value of the object access behavior predicted based on the current parameter of the object access behavior and a plurality of parameters of the object access behavior before and after the current object access behavior.
For example, the second objective function is L2 = Σ_{t=2}^{T} Σ_{-c ≤ p ≤ c, p ≠ 0} log p(w_{t+p} | w_t), where N is the value of the first dimension, c is the context window size, w_t is the current object access behavior parameter, w_{t+p} is the p-th object access behavior parameter before or after it, and p(w_{t+p} | w_t) is the similarity between w_{t+p} and the object access behavior value predicted from the current parameter w_t.
The third objective function indicates a similarity between a predicted value of the current object access behavior predicted based on a plurality of object access behavior parameters before and after the current object access behavior and the current object access behavior parameter. That is, the training of the word vector conversion model is a process of maximizing a similarity between a predicted value of the current object access behavior predicted based on a plurality of object access behavior parameters before and after the current object access behavior and the current object access behavior parameter.
For example, the third objective function is L3 = Σ_{t=2}^{T} log p(w_t | w_{t-c}, ..., w_{t-1}, w_{t+1}, ..., w_{t+c}), where N is the value of the first dimension, c is the context window size, w_t is the current object access behavior parameter, and the conditional probability is the similarity between w_t and the current object access behavior value predicted from the object access behavior parameters before and after it.
Solving the objective function requires computing the softmax function. Since the softmax function is very expensive to evaluate, during training of the word vector conversion model the computation of the output vector (i.e., the object access behavior prediction value mentioned above) is handled using hierarchical normalization and negative sampling to reduce the amount of computation.
Hierarchical normalization (e.g., hierarchical softmax) uses a Huffman tree to compute the output vector. The leaf nodes of the tree are all the possible object access behavior parameters, and the non-leaf nodes are binary logistic-regression classifiers (e.g., θ1 to θ4 in fig. 4C), each with different parameters. Object access behavior parameters can thus be classified quickly, reducing the amount of softmax computation.
Similarly, the softmax function must also be computed at inference time, and hierarchical normalization may likewise be used there. That is, in the process of inferring the output vector with the word vector conversion model, the computation of the output vector is handled using hierarchical normalization.
In addition, negative sampling may be used to aid the training of the word vector conversion model. The larger the probability the word vector conversion model predicts for a positive example, the better (i.e., the closer the predicted value is to the true value, the better). For example, for the sequence [X, /index/Cgi_download.py, /index_5/NUM/NUM, /index_6/STR], the closer the first object behavior parameter predicted from the object identifier X is to /index/Cgi_download.py (i.e., /index/Cgi_download.py is a positive example), the better. Suppose the object is unlikely to access /STR/Cgi_baby (i.e., /STR/Cgi_baby is a negative example); then the larger the difference between the word vector of the first object behavior parameter predicted from the object identifier X and the word vector of /STR/Cgi_baby, the better.
Because the number of positive examples is small, it is easy to ensure that the prediction probability of each positive example is as large as possible. The negative examples, however, are extremely numerous, so the idea of negative sampling is to randomly select some negative examples according to a negative sampling strategy and then ensure that the prediction probabilities of the selected negative examples are as small as possible.
There are various strategies for negative sampling, e.g., uniform negative sampling, sampling by word frequency, etc. Optionally, embodiments of the present disclosure employ uniform negative sampling. The present disclosure does not limit the negative sampling strategy, as long as it satisfies the requirements.
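The negative-sampling objective for one training pair can be sketched as below, with uniform sampling of negatives as the disclosure prefers. The toy vocabulary and vectors are illustrative assumptions.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def negative_sampling_loss(center, positive, vocabulary, num_negatives=2, seed=0):
    """Loss for one (center, positive) pair with uniformly sampled negatives.

    Instead of a full softmax over the vocabulary, the loss rewards similarity
    to the positive example and dissimilarity to a few sampled negatives.
    """
    rng = random.Random(seed)
    negatives = rng.sample([v for v in vocabulary if v is not positive], num_negatives)
    loss = -math.log(sigmoid(dot(center, positive)))        # positive term
    for neg in negatives:
        loss += -math.log(sigmoid(-dot(center, neg)))       # negative terms
    return loss
```

A center vector aligned with its positive example yields a lower loss than one pointing away from it, which is exactly the gradient signal training exploits.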
Therefore, the neural network model (e.g., word2vec model) provided by the embodiments of the disclosure converges quickly during training and produces good prediction results.
Fig. 5 illustrates an apparatus 500 for processing traffic data according to an embodiment of the disclosure. The apparatus 500 for processing traffic data comprises: a data collection module 501, an object identifier mapping module 502, an embedding generation module 503, a data mart module 504, and a data presentation module 505.
A data acquisition module 501 configured to: traffic data for accessing a network service is obtained.
An object identifier mapping module 502 configured to extract an object identification parameter from a value of a field related to object identification and an object access behavior parameter from a value of a field related to object access behavior based on the traffic data; and determining an object identifier corresponding to the object access behavior parameter based on the object login data and the object identification parameter.
An embedding generation module 503 configured to construct an object access behavior sequence including an object identifier and object access behavior parameters corresponding to the object identifier and arranged in time order; and encoding the object access behavior sequence into an object access behavior feature vector.
A data mart module 504 configured to cluster the object access behavior feature vectors to generate cluster labels for object access behaviors; and generating recommendation information or classification information associated with the object identifier based on the cluster label.
A data presentation module 505 configured to reduce the object access behavior feature vectors to two-dimensional vectors and draw an object behavior presentation interface based on them, where similar object access behavior feature vectors appear close to each other in the interface.
Thus, the apparatus for processing full-network traffic provided by the embodiments of the disclosure can encode object access behavior into an object access behavior feature vector (e.g., an embedding vector) using a neural network model (e.g., a word2vec model). The amount of data needed to identify objects from traffic data and analyze their behavior is thereby greatly reduced. The embodiments of the disclosure reduce the workload of manually extracting and analyzing traffic data, automatically encode object access behavior into vectors, and have better generalization capability. They can be applied to services such as advertisement recommendation, to realize advertisement recommendation at the object level.
As shown in fig. 6, the terminal and the server may each be implemented as an electronic device 2000 (also referred to as the device 2000). The electronic device 2000 may include one or more processors 2010 and one or more memories 2020. The memory 2020 stores, among other things, computer readable code which, when executed by the one or more processors 2010, may perform the various methods described above.
The processor in the disclosed embodiments may be an integrated circuit chip having signal processing capabilities. The processor may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor, and may be of the x86 or ARM architecture.
In general, the various example embodiments of this disclosure may be implemented in hardware or special purpose circuits, software, firmware, logic or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While aspects of embodiments of the disclosure have been illustrated or described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
For example, a method or apparatus in accordance with embodiments of the present disclosure may also be implemented by way of the architecture of computing device 3000 shown in fig. 7. As shown in fig. 7, computing device 3000 may include a bus 3010, one or more CPUs 3020, a read only memory (ROM) 3030, a random access memory (RAM) 3040, a communication port 3050 to connect to a network, input/output components 3060, a hard disk 3070, and the like. A storage device in the computing device 3000, such as the ROM 3030 or the hard disk 3070, may store various data or files used in the processing and/or communication of the method for processing traffic data provided by the present disclosure, as well as program instructions executed by the CPU. Computing device 3000 may also include a user interface 3080. Of course, the architecture shown in fig. 7 is merely exemplary, and one or more components of the computing device shown in fig. 7 may be omitted as needed when implementing different devices.
According to yet another aspect of the present disclosure, there is also provided a computer-readable storage medium. Fig. 8 shows a schematic diagram 4000 of a storage medium according to the present disclosure.
As shown in fig. 8, the computer storage medium 4020 has computer readable instructions 4010 stored thereon. The computer readable instructions 4010, when executed by a processor, can perform the various methods according to the embodiments of the disclosure described with reference to the above figures. The computer readable storage medium in embodiments of the present disclosure may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). It should be noted that the memories of the methods described herein are intended to comprise, without being limited to, these and any other suitable types of memory.
Embodiments of the present disclosure also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform various methods according to embodiments of the present disclosure.
It is to be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The exemplary embodiments of the present disclosure described in detail above are merely illustrative, and not restrictive. It will be appreciated by those skilled in the art that various modifications and combinations of these embodiments or features thereof may be made without departing from the principles and spirit of the disclosure, and that such modifications are intended to be within the scope of the disclosure.
Claims (15)
1. A method of processing traffic data, comprising:
acquiring traffic data of access to a network service;
extracting an object identification parameter from a value of a field related to object identification and an object access behavior parameter from a value of a field related to object access behavior based on the traffic data;
determining an object identifier corresponding to the object access behavior parameter based on the object login data and the object identification parameter;
constructing an object access behavior sequence, wherein the object access behavior sequence comprises the object identifier and object access behavior parameters which correspond to the object identifier and are arranged in a time sequence; and
encoding the object access behavior sequence into an object access behavior feature vector.
2. The method of claim 1, wherein the field related to object access behavior comprises a site field, and wherein extracting object identification parameters from values of the field related to object identification and object access behavior parameters from values of the field related to object access behavior based on the traffic data further comprises:
screening the traffic data based on the importance of the value of the site field of the traffic data;
sampling a portion of the traffic data from the screened traffic data according to a preset rule; and
extracting the object access behavior parameter from a value of a field related to the object access behavior based on the sampled portion of the traffic data.
3. The method of claim 1 or 2, wherein the field related to object access behavior comprises a common gateway interface field, and wherein extracting the object access behavior parameter from the value of the field related to object access behavior further comprises:
extracting a value corresponding to the common gateway interface field from the traffic data, and performing at least one of the following replacement operations on that value:
for each integer string in the value of the common gateway interface field, replacing the integer string with a constant string corresponding to the integer string, wherein different integer strings correspond to different constant strings; and/or
for each character string in the value of the common gateway interface field whose length is greater than a preset length, replacing the character string with a constant character string indicating that the length of the character string is greater than the preset length; and
taking the value corresponding to the common gateway interface field after the replacement operations as the object access behavior parameter.
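A minimal sketch of the replacement operations of claim 3, using Python's re module. The preset length, the constant-string naming scheme (INT0, INT1, ..., LONGSTR), and the token separators are assumptions, since the claim does not fix them.

```python
import re

MAX_LEN = 12  # assumed "preset length"; the disclosure leaves it unspecified

def normalize_cgi_value(value: str) -> str:
    """Apply the two replacement operations of claim 3 to a CGI field value.

    Distinct integer strings map to distinct constant strings, and any token
    longer than MAX_LEN is replaced by a fixed marker, so that volatile IDs
    and session tokens do not explode the vocabulary.
    """
    seen = {}

    def repl_int(m):
        s = m.group(0)
        if s not in seen:
            seen[s] = f"INT{len(seen)}"  # different integers -> different constants
        return seen[s]

    value = re.sub(r"\d+", repl_int, value)
    # Replace overlong tokens with a constant marker string.
    tokens = re.split(r"([/?&=])", value)
    tokens = ["LONGSTR" if len(t) > MAX_LEN else t for t in tokens]
    return "".join(tokens)

print(normalize_cgi_value("/item?id=123&sess=deadbeefcafe1234567890&id2=123"))
# -> /item?id=INT0&sess=LONGSTR&idINT2=INT0
```

Note that the same integer string ("123" above) maps to the same constant wherever it recurs, preserving repetition structure in the behavior parameter.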
4. The method of claim 1, wherein the object login data comprises at least one of: a log of object logins, an Internet protocol address of an object login, an account of an object login, a device identifier of an object login, and a time of an object login;
wherein determining the object identifier corresponding to the object access behavior parameter based on the object login data and the object identification parameter further comprises: determining the object identifier corresponding to the object access behavior parameter based on a constraint imposed by the object login data on the object identification parameter.
5. The method of claim 1, wherein encoding the sequence of object access behaviors into the object access behavior feature vector further comprises:
encoding each object access behavior parameter in the object access behavior sequence into a word vector by using a word vector conversion model; and
obtaining the object access behavior feature vector based at least in part on the word vector corresponding to each object access behavior parameter.
6. The method of claim 5, wherein the word vector conversion model comprises an input layer, a hidden layer, and an output layer, wherein,
the input layer is configured to: sequentially converting each object access behavior parameter in the object access behavior sequence into an input vector of a first dimension, and sequentially inputting the input vector to the hidden layer;
the hidden layer is configured to: converting the input vector into a hidden layer vector of a second dimension, wherein the second dimension is smaller than the first dimension, and inputting the hidden layer vector into the output layer;
the output layer is configured to: normalize the hidden layer vector to obtain an output vector, and take the output vector as the word vector corresponding to the object access behavior parameter.
7. The method of claim 6, further comprising:
before encoding each object access behavior parameter in the object access behavior sequence into a word vector using a word vector conversion model, training the word vector conversion model;
wherein the training the word vector conversion model further comprises:
the input layer converts each object access behavior parameter in the object access behavior sequence into a one-hot coded vector as the input vector, and inputs the input vector to the hidden layer in sequence;
acquiring an output vector by utilizing the hidden layer and the output layer;
calculating a value corresponding to the objective function based on the output vector;
and adjusting parameters of the neurons in the hidden layer based on the value corresponding to the objective function so as to maximize the value corresponding to the objective function.
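As a minimal illustration of the training procedure of claim 7, the sketch below adjusts the two weight matrices of claims 6 and 10 by gradient ascent on the log-probability of observed (current parameter, context parameter) pairs, i.e., it maximizes a skip-gram-style objective. The vocabulary size, dimensions, pair list, learning rate, and iteration count are all hypothetical; a production system would use an optimized library.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy setup: N behavior parameters (first dimension), D hidden units (second dimension).
N, D = 6, 4
rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (N, D))   # input -> hidden weight matrix
U = rng.normal(0, 0.1, (D, N))   # hidden -> output weight matrix

pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]  # (current, context) indices
lr = 0.3
for _ in range(300):
    for t, c in pairs:
        h = W[t]                  # hidden vector for the one-hot input at index t
        y = softmax(U.T @ h)      # predicted distribution over the vocabulary
        # Gradient ascent on log p(c | t), maximizing the objective value.
        err = -y
        err[c] += 1.0             # d log p(c|t) / d (U^T h)
        U += lr * np.outer(h, err)
        W[t] += lr * U @ err

# After training, each current parameter should predict its observed context.
probs = [softmax(U.T @ W[t])[c] for t, c in pairs]
```

The rows of W, once trained, serve as the word vectors from which the object access behavior feature vector is assembled.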
8. The method of claim 7, wherein the objective function indicates a weighted sum of one or more of a first objective function, a second objective function, and a third objective function, wherein:
the first objective function indicates a similarity between an object access behavior prediction value predicted based on the object identifier and each object access behavior parameter in the object access behavior sequence;
the second objective function indicates a similarity between an object access behavior prediction value predicted based on a current object access behavior parameter and a plurality of object access behavior parameters before and after the current object access behavior parameter; and
the third objective function indicates a similarity between a current object access behavior prediction value predicted based on a plurality of object access behavior parameters before and after the current object access behavior parameter and the current object access behavior parameter.
9. The method of claim 8, wherein:
the first objective function is L1 = (1/N) Σ_{j=2..T} log p(w_j | w_1), where N is the value of the first dimension, w_1 represents the object identifier, w_2, ..., w_T represent the individual object access behavior parameters in the object access behavior sequence, and p(w_j | w_1) represents the probability that the (j-1)-th object access behavior parameter occurs given the object identifier w_1;
the second objective function is L2 = (1/N) Σ_{t=1..T} Σ_{p} log p(w_{t+p} | w_t), where N is the value of the first dimension, w_t is the current object access behavior parameter, p ranges over a context window so that w_{t+p} is the p-th object access behavior parameter before or after the current object access behavior parameter, and p(w_{t+p} | w_t) is the similarity between the object access behavior prediction value predicted based on the current object access behavior parameter w_t and w_{t+p};
the third objective function is L3 = (1/N) Σ_{t=1..T} log p(w_t | w_{t-c}, ..., w_{t+c}), where N is the value of the first dimension, w_t is the current object access behavior parameter, w_{t-c}, ..., w_{t+c} are the object access behavior parameters in a context window before and after the current object access behavior parameter, and p(w_t | w_{t-c}, ..., w_{t+c}) is the similarity between the current object access behavior prediction value predicted based on those parameters and w_t.
10. The method of any one of claims 6-9, wherein:
the hidden layer vector h is h = W^T x_t, where x_t represents the input vector corresponding to the object access behavior parameter indexed t in the object access behavior sequence, and W^T is the transpose of the weight matrix of the hidden layer; and
the output layer vector y is y = softmax(U^T h), where softmax is a normalization function and U^T is the transpose of the weight matrix of the output layer.
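The two formulas of claim 10 can be checked numerically. The sketch below (dimensions chosen arbitrarily for illustration) computes the hidden layer vector and the normalized output vector for a one-hot input; note that for a one-hot x, the hidden vector is simply the corresponding row of W.

```python
import numpy as np

N, D = 5, 3                        # first dimension N, second dimension D (D < N)
rng = np.random.default_rng(0)
W = rng.normal(size=(N, D))        # hidden-layer weight matrix
U = rng.normal(size=(D, N))        # output-layer weight matrix

x = np.zeros(N); x[2] = 1.0        # one-hot input for one behavior parameter
h = W.T @ x                        # hidden layer vector: h = W^T x_t
z = U.T @ h                        # pre-normalization output scores
y = np.exp(z - z.max()); y /= y.sum()   # y = softmax(U^T h)
```

Since x is one-hot at index 2, h equals W[2], which is why the trained rows of W can be read off directly as word vectors.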
11. The method of any one of claims 7-9, wherein said obtaining an output vector using said hidden layer and said output layer further comprises:
in the process of inferring the output vector using the word vector conversion model, processing the calculation of the output vector using hierarchical softmax; and
in the process of training the word vector conversion model, processing the calculation of the output vector using hierarchical softmax and negative sampling.
12. The method of any of claims 1 to 9, further comprising:
clustering the object access behavior feature vectors to generate cluster labels of the object access behaviors; and
generating recommendation information or classification information associated with the object identifier based on the cluster label.
13. The method of any of claims 1 to 9, further comprising:
reducing the object access behavior feature vector to a two-dimensional vector, and drawing an object behavior presentation interface based on the two-dimensional vector, wherein similar object access behavior feature vectors are displayed close to one another in the object behavior presentation interface.
14. An apparatus for processing traffic data, comprising:
a processor; and
memory, wherein the memory has stored therein a computer-executable program that, when executed by the processor, performs the method of any of claims 1-13.
15. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of any one of claims 1-13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110204546.9A CN113592522A (en) | 2021-02-23 | 2021-02-23 | Method and apparatus for processing traffic data, and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110204546.9A CN113592522A (en) | 2021-02-23 | 2021-02-23 | Method and apparatus for processing traffic data, and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113592522A true CN113592522A (en) | 2021-11-02 |
Family
ID=78238054
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110204546.9A Pending CN113592522A (en) | 2021-02-23 | 2021-02-23 | Method and apparatus for processing traffic data, and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113592522A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114567498A (en) * | 2022-03-04 | 2022-05-31 | 科来网络技术股份有限公司 | Metadata extraction and processing method and system for network behavior visualization |
CN114567498B (en) * | 2022-03-04 | 2024-02-02 | 科来网络技术股份有限公司 | Metadata extraction and processing method and system for network behavior visualization |
CN115603955A (en) * | 2022-09-26 | 2023-01-13 | 北京百度网讯科技有限公司(Cn) | Abnormal access object identification method, device, equipment and medium |
CN115603955B (en) * | 2022-09-26 | 2023-11-07 | 北京百度网讯科技有限公司 | Abnormal access object identification method, device, equipment and medium |
CN115603999A (en) * | 2022-10-12 | 2023-01-13 | 中国电信股份有限公司(Cn) | Container safety protection method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103678652B (en) | Information individualized recommendation method based on Web log data | |
Huang et al. | Identifying disaster related social media for rapid response: a visual-textual fused CNN architecture | |
US20190392258A1 (en) | Method and apparatus for generating information | |
CN113592522A (en) | Method and apparatus for processing traffic data, and computer-readable storage medium | |
CN111371806A (en) | Web attack detection method and device | |
CN113032525A (en) | False news detection method and device, electronic equipment and storage medium | |
CN114915468B (en) | Intelligent analysis and detection method for network crime based on knowledge graph | |
CN113128773B (en) | Training method of address prediction model, address prediction method and device | |
CN112231592A (en) | Network community discovery method, device, equipment and storage medium based on graph | |
CN111625715A (en) | Information extraction method and device, electronic equipment and storage medium | |
CN114330966A (en) | Risk prediction method, device, equipment and readable storage medium | |
CN113779429A (en) | Traffic congestion situation prediction method, device, equipment and storage medium | |
CN111403011B (en) | Registration department pushing method, device and system, electronic equipment and storage medium | |
CN116977701A (en) | Video classification model training method, video classification method and device | |
CN115687647A (en) | Notarization document generation method and device, electronic equipment and storage medium | |
CN115204436A (en) | Method, device, equipment and medium for detecting abnormal reasons of business indexes | |
Li et al. | [Retracted] Deep Unsupervised Hashing for Large‐Scale Cross‐Modal Retrieval Using Knowledge Distillation Model | |
CN116432648A (en) | Named entity recognition method and recognition device, electronic equipment and storage medium | |
Singh et al. | Advances in Computing and Data Sciences: Second International Conference, ICACDS 2018, Dehradun, India, April 20-21, 2018, Revised Selected Papers, Part II | |
JP2024530998A (en) | Machine learning assisted automatic taxonomy for web data | |
CN113919338B (en) | Method and device for processing text data | |
CN115051863A (en) | Abnormal flow detection method and device, electronic equipment and readable storage medium | |
CN114896412A (en) | Embedded vector generation method, and method and device for classifying same-name personnel based on enterprise pairs | |
CN113627514A (en) | Data processing method and device of knowledge graph, electronic equipment and storage medium | |
Prasad et al. | Face-Based Alumni Tracking on Social Media Using Deep Learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 40055736 Country of ref document: HK |
|
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |