CN116049818B - Big data anomaly analysis method and system for digital online service - Google Patents

Publication number
CN116049818B
Authority
CN
China
Prior art keywords
user
data
user operation
behavior
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310139661.1A
Other languages
Chinese (zh)
Other versions
CN116049818A (en)
Inventor
吕艳娜
韦涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Safety Technology Co Ltd
Original Assignee
Tianyi Safety Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Safety Technology Co Ltd filed Critical Tianyi Safety Technology Co Ltd
Priority to CN202310139661.1A priority Critical patent/CN116049818B/en
Publication of CN116049818A publication Critical patent/CN116049818A/en
Application granted granted Critical
Publication of CN116049818B publication Critical patent/CN116049818B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a big data anomaly analysis method and system for digital online services. The method comprises: first, determining a specified online service scenario from a security detection instruction; then, determining key user behaviors among the user behaviors to be analyzed based on the specified online service scenario; next, retrieving the anomaly analysis model corresponding to the specified online service scenario from a security model database; inputting the key user behaviors into the anomaly analysis model to obtain the anomaly recognition results corresponding to the key user behaviors; and finally, determining the abnormal behavior category indicated by the anomaly recognition result and executing the corresponding strategy. With this design, anomaly analysis is performed by first determining the key user behaviors among the user behaviors to be analyzed and then applying a pre-trained anomaly analysis model. Compared with the prior-art approach of directly analyzing the full set of user behavior data, this improves the efficiency with which computer equipment performs anomaly analysis on digital online services.

Description

Big data anomaly analysis method and system for digital online service
Technical Field
The invention relates to the field of big data analysis, in particular to a big data anomaly analysis method and a big data anomaly analysis system for digital online business.
Background
With the development of the Internet, digital online services such as everyday shopping, online entertainment, and online asset trading have become closely tied to people's lives. As services become increasingly digital and online, the corresponding service security problems have also emerged. The prior art includes schemes for performing security anomaly analysis on user behavior on the Internet; however, how to further improve security management and anomaly analysis for digital online services in the big data era, with its explosive growth in data volume, is a problem that those skilled in the art need to solve.
Disclosure of Invention
The invention aims to provide a big data anomaly analysis method and a big data anomaly analysis system for digital online business.
In a first aspect, an embodiment of the present invention provides a method for analyzing big data anomalies for a digitized online service, including:
receiving a security detection instruction for a target digital online service platform, wherein the target digital online service platform maintains a plurality of online service scenarios;
determining a specified online service scenario from the security detection instruction when the security detection instruction passes verification;
determining key user behaviors among the user behaviors to be analyzed based on the specified online service scenario;
retrieving the anomaly analysis model corresponding to the specified online service scenario from a security model database;
inputting the key user behaviors into the anomaly analysis model to obtain the anomaly recognition results corresponding to the key user behaviors;
determining the abnormal behavior category indicated by the anomaly recognition result, wherein the abnormal behavior categories comprise abnormal behavior, suspicious abnormal behavior, and no abnormal behavior;
when the abnormal behavior category indicated by the anomaly recognition result is abnormal behavior, generating alarm information and blocking the communication source of the user behavior to be analyzed; when the abnormal behavior category indicated by the anomaly recognition result is suspicious abnormal behavior, storing the key user behavior in a secondary detection list; and when the abnormal behavior category indicated by the anomaly recognition result is no abnormal behavior, training the anomaly analysis model with the key user behavior as a positive sample.
In a second aspect, an embodiment of the present invention provides a server system including a server, where the server is configured to perform the method of the first aspect.
Compared with the prior art, the invention has the following beneficial effects. With the big data anomaly analysis method and system for digital online services, a specified online service scenario is determined from the security detection instruction; key user behaviors among the user behaviors to be analyzed are then determined based on the specified online service scenario; the anomaly analysis model corresponding to the specified online service scenario is retrieved from a security model database; the key user behaviors are input into the anomaly analysis model to obtain the corresponding anomaly recognition results; and finally the abnormal behavior category indicated by the anomaly recognition result is determined and the corresponding strategy is executed. With this design, anomaly analysis is performed by first determining the key user behaviors among the user behaviors to be analyzed and then applying a pre-trained anomaly analysis model. Compared with the prior-art approach of directly analyzing the full set of user behavior data, this improves the efficiency with which computer equipment performs anomaly analysis on digital online services.
Drawings
In order to more clearly illustrate the technical solution of the embodiments of the present invention, the drawings that are required to be used in the embodiments will be briefly described. It is appreciated that the following drawings depict only certain embodiments of the invention and are therefore not to be considered limiting of its scope. Other relevant drawings may be made by those of ordinary skill in the art without undue burden from these drawings.
FIG. 1 is a schematic flow chart of steps of a big data anomaly analysis method for a digital online service according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
The following describes specific embodiments of the present invention in detail with reference to the drawings.
To address the technical problems described in the Background, FIG. 1 shows a schematic flow chart of a big data anomaly analysis method for digital online services according to an embodiment of the present disclosure. The method is described in detail below.
S101, receiving a security detection instruction for a target digital online service platform, wherein the target digital online service platform maintains a plurality of online service scenarios;
S102, determining a specified online service scenario from the security detection instruction when the security detection instruction passes verification;
S103, determining key user behaviors among the user behaviors to be analyzed based on the specified online service scenario;
S104, retrieving the anomaly analysis model corresponding to the specified online service scenario from the security model database;
S105, inputting the key user behaviors into the anomaly analysis model to obtain the anomaly recognition results corresponding to the key user behaviors;
S106, determining the abnormal behavior category indicated by the anomaly recognition result, wherein the abnormal behavior categories comprise abnormal behavior, suspicious abnormal behavior, and no abnormal behavior;
S107, when the abnormal behavior category indicated by the anomaly recognition result is abnormal behavior, generating alarm information and blocking the communication source of the user behavior to be analyzed; when the abnormal behavior category indicated by the anomaly recognition result is suspicious abnormal behavior, storing the key user behavior in a secondary detection list; and when the abnormal behavior category indicated by the anomaly recognition result is no abnormal behavior, training the anomaly analysis model with the key user behavior as a positive sample.
In this embodiment, the digital online service platform may be an online shopping platform, an online live streaming platform, a digital asset online trading platform, or the like. Different platforms maintain different service scenarios: for example, an online shopping platform maintains scenarios such as purchasing and returning goods; an online live streaming platform maintains scenarios such as sending bullet comments and giving gifts; and a digital asset online trading platform maintains scenarios such as buying, selling, and exchanging digital assets. The security detection instruction may include an online service scenario specified by the user. Once the user-specified online service scenario is determined, the user behaviors to be analyzed that occur in that scenario can be determined as the key user behaviors for the scenario, and the key user behaviors are recognized by calling the pre-trained anomaly analysis model corresponding to the specified online service scenario to obtain the corresponding recognition results. When the abnormal behavior category indicated by the anomaly recognition result is abnormal behavior, alarm information is generated and the communication source of the user behavior to be analyzed is blocked; when the category is suspicious abnormal behavior, the key user behavior is stored in a secondary detection list; and when the category is no abnormal behavior, the anomaly analysis model is trained with the key user behavior as a positive sample, so that the model updates itself.
With this design, only the determined key user behaviors are recognized rather than all user behaviors, and a pre-trained anomaly analysis model is applied, which improves the efficiency with which computer equipment performs anomaly analysis for different online service scenarios.
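The three-way handling in S107 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Category` enum, the `Outcome` container, and the `handle_result` function are hypothetical names, and the alarm, blocking, and retraining actions are reduced to in-memory lists.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    ABNORMAL = "abnormal behavior"
    SUSPICIOUS = "suspicious abnormal behavior"
    NORMAL = "no abnormal behavior"


@dataclass
class Outcome:
    """Hypothetical in-memory stand-ins for the side effects of S107."""
    alerts: list = field(default_factory=list)
    blocked_sources: set = field(default_factory=set)
    secondary_detection_list: list = field(default_factory=list)
    positive_samples: list = field(default_factory=list)


def handle_result(category, key_behavior, source, outcome):
    """Apply the S107 strategy for one recognized key user behavior."""
    if category is Category.ABNORMAL:
        # Generate alarm information and block the communication source.
        outcome.alerts.append(f"abnormal behavior from {source}")
        outcome.blocked_sources.add(source)
    elif category is Category.SUSPICIOUS:
        # Keep the behavior for a later, second round of detection.
        outcome.secondary_detection_list.append(key_behavior)
    else:
        # Behaviors without anomalies become positive training samples.
        outcome.positive_samples.append(key_behavior)
    return outcome
```

In a real deployment the positive samples would be fed back into training of the scenario-specific model; here they are only collected.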
In order to clearly describe the solution provided by the embodiment of the present invention, the foregoing step S103 may be implemented by the following detailed implementation manner.
S201, according to the specified online service scenario, performing user operation identification on a plurality of historical user behavior nodes contained in the historical user behaviors, and determining the user operation data corresponding to each historical user behavior node and the corresponding user operation vectors.
S202, according to the specified online service scenario, performing server response identification on each of the plurality of historical user behavior nodes, and determining the server response data corresponding to each historical user behavior node and the corresponding server response vectors.
S203, obtaining historical behavior data pairs corresponding to the historical user behaviors according to the determined user operation data and server response data, wherein each historical behavior data pair comprises a plurality of server response vectors and a plurality of user operation vectors of one online service.
S204, determining the key user behaviors from the user behaviors to be analyzed according to the obtained historical behavior data pairs and the pending behavior data pairs matched with the user behaviors to be analyzed.
It should be noted that, in the embodiments of the present application, each obtained historical behavior data pair may be used for services such as duplicate data identification and duplicate data search.
In the embodiments of the present application, the user operation data and the server response data in the user behaviors are mined. Compared with a conventional global characterization of user behavior, the user operation vectors and server response vectors capture the differences between behavior data more accurately, which improves the accuracy of behavior data search. In addition, compared with the related art, in which a single feature is extracted per node, obtaining historical behavior data pairs that contain both server response vectors and user operation vectors allows multiple pieces of coherent user behavior information to be extracted accurately, so the behavior data search performs better.
In some embodiments, in S201, when user operation identification is performed on the plurality of historical user behavior nodes contained in the historical user behaviors according to the specified online service scenario to determine the user operation data corresponding to each node, various detection models trained on open-source data sets may be used, without limitation.
In some embodiments, each piece of user operation data may be input into a pre-trained target user operation vector construction model to obtain the corresponding user operation vector. Specifically, the user operation data corresponding to each of the plurality of historical user behavior nodes is input into the pre-trained target user operation vector construction model to obtain the corresponding user operation vectors. The training process of the target user operation vector construction model is described below.
Taking a historical user behavior node as a user behavior node 1 as an example, the user behavior node 1 comprises user operation data M, user operation data N and user operation data C, the user operation data M is input into a target user operation vector construction model which is trained in advance to obtain a user operation vector M, the user operation data N is input into the target user operation vector construction model which is trained in advance to obtain a user operation vector N, and the user operation data C is input into the target user operation vector construction model which is trained in advance to obtain a user operation vector C.
In some embodiments, the process of performing server response identification on the plurality of historical user behavior nodes according to the online service scenario and determining the server response data corresponding to each node is similar to S201.
In some embodiments, each piece of server response data may be input into a pre-trained target server response vector extraction model to obtain the corresponding server response vector. Specifically, the server response data corresponding to each of the plurality of historical user behavior nodes is input into the pre-trained target server response vector extraction model to obtain the corresponding server response vectors. The training process of the target server response vector extraction model is similar to that of the target user operation vector construction model; see below for details.
Taking a historical user behavior node as the user behavior node 1 as an example, the user behavior node 1 comprises server response data M, server response data N and server response data C, the server response data M is input into a pre-trained target server response vector extraction model to obtain a server response vector M, the server response data N is input into the pre-trained target server response vector extraction model to obtain a server response vector N, and the server response data C is input into the pre-trained target server response vector extraction model to obtain a server response vector C.
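The per-node vector construction described above can be sketched as follows. This is only an illustration under stated assumptions: the two models are passed in as plain callables, the node layout (`"operations"` and `"responses"` dictionaries) is hypothetical, and `toy_encoder` is a stand-in for the pre-trained models, whose architecture the text does not specify.

```python
def vectorize_node(node, operation_model, response_model):
    """Map each piece of operation/response data in one node to its vector."""
    return {
        "operation_vectors": {
            name: operation_model(data) for name, data in node["operations"].items()
        },
        "response_vectors": {
            name: response_model(data) for name, data in node["responses"].items()
        },
    }


def toy_encoder(text):
    """Stand-in for a pre-trained model: [length, vowel count] of the text."""
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text))]


# Hypothetical content for user behavior node 1.
node_1 = {
    "operations": {"M": "click buy"},
    "responses": {"M": "order created"},
}
vectors = vectorize_node(node_1, toy_encoder, toy_encoder)
```

Keeping the two models as separate parameters mirrors the text, which uses one model for user operation data and another for server response data.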
In some embodiments, S203 may be performed using, but not limited to, the following steps:
S2031, determining the matching relationship between each piece of user operation data and each piece of server response data according to the determined user operation data and server response data.
Specifically, in the embodiments of the present application, the following steps are performed in turn for each of the plurality of historical user behavior nodes:
taking historical user behavior node x as an example, where historical user behavior node x is any one of the plurality of historical user behavior nodes: determining each piece of server response data contained in historical user behavior node x, and obtaining the association coefficients between that server response data and each piece of user operation data contained in historical user behavior node x; and performing a matching operation between the server response data and the user operation data whose association coefficient is not smaller than a preset association coefficient threshold.
Taking user behavior node 1 as an example, assume that the preset association coefficient threshold is 60%. For server response data M, the association coefficients between server response data M and user operation data M, user operation data N, and user operation data C are determined to be 100%, 0, and 0, respectively, so a matching operation is performed on server response data M and user operation data M. Similarly, a matching operation is performed on server response data N and user operation data N.
Through the implementation mode, the server response data and the user operation data are matched through the association coefficient, so that the association accuracy between the server response data and the user operation data is ensured to a certain extent, the accuracy of determining the coherent behavior of the user is improved, and the behavior data duplicate removal effect is further improved.
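The threshold-based matching in S2031 can be sketched as follows. The text does not say how the association coefficient is computed, so this sketch assumes the coefficients are already available as numbers in [0, 1]; the function name and the `resp_`/`op_` identifiers are hypothetical.

```python
def match_by_association(coefficients, threshold=0.6):
    """Map each server response id to the user operation ids that reach the threshold.

    coefficients: {(response_id, operation_id): association coefficient in [0, 1]}
    """
    matches = {}
    for (response_id, operation_id), coefficient in coefficients.items():
        if coefficient >= threshold:  # "not smaller than" the preset threshold
            matches.setdefault(response_id, []).append(operation_id)
    return matches


coefficients = {
    # Values for server response data M follow the example in the text.
    ("resp_M", "op_M"): 1.0, ("resp_M", "op_N"): 0.0, ("resp_M", "op_C"): 0.0,
    # Values for resp_N are illustrative; the text only states that N matches N.
    ("resp_N", "op_M"): 0.1, ("resp_N", "op_N"): 0.9, ("resp_N", "op_C"): 0.2,
}
matches = match_by_association(coefficients)
```

With the 60% threshold, server response data M is matched to user operation data M and server response data N to user operation data N.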
S2032, obtaining a historical behavior data pair corresponding to the historical user behavior according to the matching relation and the first vector distance between server response vectors matched by the server response data.
Specifically, in the embodiment of the present application, when S2032 is executed, the following steps may be adopted:
S20321, determining first vector distances among server response vectors matched with the server response data, and determining server-related response information corresponding to the historical user behaviors according to the determined first vector distances.
Specifically, when S20321 is executed, for a plurality of historical user behavior nodes, the following steps are sequentially executed according to the user operation flow:
acquiring the server response data sets, wherein each server response data set comprises first server response data in the current historical user behavior node and second server response data in the next historical user behavior node;
determining the first vector distance matched with each server response data set, wherein each first vector distance represents the vector distance between the corresponding first server response data and second server response data;
and determining, from the server response data sets, the target server response data sets whose first vector distance does not exceed a first vector distance threshold, and obtaining the server-related response information according to the determined target server response data sets.
In the embodiments of the present application, the first vector distance may be represented by, but is not limited to, the L2 distance, also known as the Euclidean distance. The smaller the L2 distance, the closer the two server response vectors are.
In this embodiment of the present application, as a possible implementation manner, each server response data set may be directly obtained, a first vector distance matched with each server response data set is determined, further, a target server response data set with a corresponding first vector distance not exceeding a first vector distance threshold is determined from each server response data set, and server related response information is obtained according to each determined target server response data set.
As another possible implementation, for the current historical user behavior node, the second server response data associated with each piece of first server response data in the current node may be determined in turn, and the server-related response information is then obtained according to the second server response data associated with each piece of first server response data, where each piece of first server response data together with its associated second server response data is determined as a target server response data set. If there are multiple pieces of second server response data whose distance from the first server response data is not smaller than the first vector distance threshold, the second server response data with the largest distance from the first server response data is selected as the second server response data associated with the first server response data. The second server response data associated with the first server response data may be regarded as the representation of the first server response data at the next node; the two are behavior data within the same coherent user behavior.
For example, assume that the historical user behavior includes user behavior node 1 and user behavior node 2, that user behavior node 1 includes server response data M, server response data N, and server response data C, and that user behavior node 2 includes server response data D, server response data E, and server response data F.
For user behavior node 1, the current historical user behavior node is user behavior node 1 and the next historical user behavior node is user behavior node 2. For server response data M, the server response data sets are obtained: server response data set 1 (server response data M, server response data D), server response data set 2 (server response data M, server response data E), and server response data set 3 (server response data M, server response data F). Then, the vector distance between server response data M and server response data D is determined as first vector distance 1 corresponding to server response data set 1, the vector distance between server response data M and server response data E is determined as first vector distance 2 corresponding to server response data set 2, and the vector distance between server response data M and server response data F is determined as first vector distance 3 corresponding to server response data set 3. Assuming that the first vector distance threshold is 90%, first vector distance 1, first vector distance 2, and first vector distance 3 are compared with the threshold, and server response data set 1 (server response data M, server response data D) is determined to be the target server response data set.
Similarly, the target server response data set is determined to be the server response data set 5 (server response data N, server response data E) from among the server response data set 4 (server response data N, server response data D), the server response data set 5 (server response data N, server response data E), and the server response data set 6 (server response data N, server response data F) for the server response data N, and the target server response data set is determined to be the server response data set 9 (server response data C, server response data F) from among the server response data set 7 (server response data C, server response data D), the server response data set 8 (server response data C, server response data E), and the server response data set 9 (server response data C, server response data F) for the server response data C.
According to the implementation mode, the relevant response information of the server is determined according to the first vector distance matched with the response data set of each server, so that the accuracy of the relevant response information of the server is improved, the accuracy of determining the coherent behavior of the user is further improved, and the behavior data duplicate removal effect is improved.
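The linking of server response data across consecutive nodes can be sketched as follows. The sketch assumes the first vector distance is a plain Euclidean distance and keeps, for each piece of first server response data, the closest candidate within the threshold; if the score is instead a similarity (as the percentage threshold in the example may suggest), the comparisons would flip. The vectors and the function names are illustrative, since the text gives no numeric values.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def link_consecutive_nodes(current_node, next_node, max_distance):
    """Return {first_id: second_id} for the closest candidate within the threshold.

    current_node / next_node map a server response id to its response vector.
    """
    links = {}
    for first_id, first_vec in current_node.items():
        best_id, best_dist = None, None
        for second_id, second_vec in next_node.items():
            dist = l2_distance(first_vec, second_vec)
            if dist <= max_distance and (best_dist is None or dist < best_dist):
                best_id, best_dist = second_id, dist
        if best_id is not None:
            links[first_id] = best_id
    return links


# Illustrative vectors only, chosen to reproduce the M-D, N-E, C-F pairing.
node_1 = {"M": [0.0, 0.0], "N": [5.0, 5.0], "C": [10.0, 0.0]}
node_2 = {"D": [0.1, 0.0], "E": [5.0, 5.2], "F": [10.0, 0.3]}
links = link_consecutive_nodes(node_1, node_2, max_distance=0.5)
```

Each resulting link corresponds to one target server response data set, for example (server response data M, server response data D).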
S20322, obtaining a historical behavior data pair corresponding to the historical user behavior according to the matching relation and the server-related response information.
In the embodiments of the present application, the matching relationship includes the association between user operation data and server response data in each historical user behavior, that is, the association between user operation data and server response data within each user behavior node. The server-related response information includes the associations between server response data in consecutive nodes. According to the matching relationship and the server-related response information, the historical behavior data pairs can be determined, and each historical behavior data pair comprises a plurality of server response vectors and a plurality of user operation vectors of one online service.
For example, the matching relationship includes a user operation vector M and a server response vector M in the user behavior node 1, a user operation vector D and a server response vector D in the user behavior node 2, and a user operation vector G and a server response vector G in the user behavior node 3, the server-related response information includes an association relationship between the server response vector M and the server response vector D, and the server response vector D and the server response vector G, and according to the matching relationship and the server-related response information, historical behavior data pair 1 is obtained, the historical behavior data pair 1 is a user coherent behavior of the business scenario 1 in the user behavior node 1, the user behavior node 2, and the user behavior node 3, the historical behavior data pair 1 includes the user operation vector M and the server response vector M in the user behavior node 1, the user operation vector D and the server response vector D in the user behavior node 2, and the user operation vector G and the server response vector G in the user behavior node 3. Similarly, according to the matching relation and the server-related response information, a historical behavior data pair 2 and a historical behavior data pair 3 are obtained, wherein the historical behavior data pair 2 is the user coherent behavior of the service scene 2 in the user behavior node 1 and the user behavior node 2, and the historical behavior data pair 3 is the user coherent behavior of the service scene 3 in the user behavior node 1 and the user behavior node 2 and the user behavior node 3.
Through the above implementation, the coherent user behaviors contained in the user behavior can be determined from the matching relationship and the server-related response information. When behavior data is subsequently deduplicated based on coherent user behaviors, the importance of each piece of server response data in deduplication is increased, which avoids the information loss, and the resulting missed recalls, caused by deduplicating on a single item.
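Combining the matching relationship with the server-related response information into historical behavior data pairs can be sketched as a simple chain walk. The data layout is an assumption made for illustration: each server response is keyed by a hypothetical `(node index, response id)` tuple, `links` maps a response to its associated response in the next node, and `matches` maps a response to its matched user operation.

```python
def build_behavior_chains(responses, links, matches):
    """Group responses into chains, each paired with its matched operation.

    responses: ordered list of (node_index, response_id) keys
    links:     {response_key: associated response_key in the next node}
    matches:   {response_key: matched user operation id}
    """
    linked_targets = set(links.values())
    chains = []
    for start in responses:
        if start in linked_targets:
            continue  # this response continues a chain started in an earlier node
        chain, current = [], start
        while current is not None:
            chain.append((current, matches.get(current)))
            current = links.get(current)
        chains.append(chain)
    return chains


# Mirrors historical behavior data pair 1 in the example: M -> D -> G.
responses = [(1, "M"), (2, "D"), (3, "G")]
links = {(1, "M"): (2, "D"), (2, "D"): (3, "G")}
matches = {(1, "M"): "op_M", (2, "D"): "op_D", (3, "G"): "op_G"}
chains = build_behavior_chains(responses, links, matches)
```

Each chain here plays the role of one historical behavior data pair; in the described method the entries would carry the server response vectors and user operation vectors rather than identifiers.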
In some embodiments, in the process of determining the server-related response information, the user behavior information matched with each piece of server response data is recorded. Specifically, each piece of server response data contained in the first of the plurality of historical user behavior nodes may be taken as an initial node, and the initialization user behavior information matched with each historical behavior data pair is recorded. The initialization user behavior information includes: a user coherent behavior flag, the temporal order in which the server response data occurs in the user coherent behavior, the server position of the server response data, the server response vector, the flag of the node to which it belongs, and the flag of the user behavior to which it belongs. Illustratively, the initial value of the user coherent behavior flag and of the temporal order of the server response data in the user coherent behavior is 1.
After each target server response data set is determined, the corresponding user behavior information is recorded according to the second server response data contained in the determined target server response data set.
The user behavior information also includes user operation data information, and the user operation data information includes a mark of the user operation data associated with the sign behavior data and the corresponding user operation vector. The marks of the user operation data may start from 0 and increase by 1 for each newly added user operation data.
Taking server response data M in the user behavior node 1 as an example, the user behavior information 1 corresponding to the historical behavior data pair 1 further includes a user coherent behavior mark, a time sequence of occurrence of the server response data in the user coherent behavior, a server position of the server response data, a server response vector, a user operation vector, a mark of the user operation data, a mark of the affiliated node, and an affiliated user behavior mark, wherein the user operation vector is the user operation vector M, and the mark of the user operation data is 0.
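As a non-limiting illustration, the user behavior information described above could be held in a simple record. The following is a minimal sketch only; the class name `UserBehaviorInfo`, all field names, and the example values are hypothetical and do not appear in the embodiment.

```python
from dataclasses import dataclass
from itertools import count
from typing import List


@dataclass
class UserBehaviorInfo:
    """Hypothetical record for one server response data item."""
    coherent_behavior_mark: int          # mark of the user coherent behavior
    order_in_behavior: int               # temporal order within the coherent behavior
    server_position: str                 # server position of the server response data
    server_response_vector: List[float]
    user_operation_vector: List[float]
    user_operation_mark: int             # mark of the associated user operation data
    node_mark: int                       # mark of the node the data belongs to
    user_behavior_mark: int              # mark of the user behavior the data belongs to


# Marks of user operation data start from 0 and increase by 1 per new item.
operation_marks = count(0)

# Illustrative record for server response data M in user behavior node 1.
info_1 = UserBehaviorInfo(
    coherent_behavior_mark=1,            # initial value 1
    order_in_behavior=1,                 # initial value 1
    server_position="server-a",          # placeholder value
    server_response_vector=[0.1, 0.9],   # placeholder vector
    user_operation_vector=[0.3, 0.7],    # placeholder vector
    user_operation_mark=next(operation_marks),
    node_mark=1,
    user_behavior_mark=1,
)
```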
It should be noted that, in the embodiment of the present application, if a certain server response vector does not have an associated server response vector in the next node, the corresponding user behavior information does not need to be recorded.
For any historical user behavior node, if server response data which is not associated with the server response data of the previous node exists, namely, a new server response exists in the historical user behavior node, the server response data is used as newly added server response data, and the recorded new historical behavior data pair is used for initializing user behavior information.
In some embodiments, if there are multiple historical behavior data pairs, determining second vector distances between multiple user operation vectors corresponding to the multiple historical behavior data pairs, and obtaining service attribution information between the multiple historical behavior data pairs according to the determined second vector distances. The second vector distance may also be represented by a J2 distance. Correspondingly, in the process of determining the key user behavior from the user behaviors to be analyzed according to the obtained historical behavior data pairs and the pending behavior data pairs matched with the user behaviors to be analyzed, the key user behavior can be determined from the obtained historical behavior data pairs, the pending behavior data pairs matched with the user behaviors to be analyzed, the service attribution information between the historical behavior data pairs and the service attribution information between the pending behavior data pairs.
Specifically, when determining the second vector distances between the plurality of user operation vectors corresponding to the plurality of historical behavior data pairs, the second vector distances between the user operation vectors corresponding to any two historical behavior data pairs are determined. Correspondingly, when obtaining the service attribution information between the plurality of historical behavior data pairs according to the determined second vector distances, the service attribution information between any two historical behavior data pairs is obtained according to the second vector distances determined for those two pairs.
When service attribution information among a plurality of historical behavior data pairs is obtained according to the determined second vector distance, the following steps are executed for any two historical behavior data pairs contained in the plurality of historical behavior data pairs:
determining the second vector distances between each of the plurality of user operation vectors contained in one historical behavior data pair and each of the plurality of user operation vectors contained in the other historical behavior data pair, and determining, from the determined second vector distances, the target second vector distances whose distance does not exceed a preset second vector distance threshold; when the two historical behavior data pairs are determined to belong to the same online service according to the determined target second vector distances, the association relationship between the two historical behavior data pairs is recorded in the service attribution information.
As one possible implementation manner, when the number of target second vector distances is greater than a preset number threshold, it is determined that the two historical behavior data pairs belong to the same online service. As another possible implementation manner, when the ratio of the number of target second vector distances to the total number of user operation data is not smaller than a preset ratio threshold, it is determined that the two historical behavior data pairs belong to the same online service, wherein the total number of user operation data is one of the following: the number of user operation vectors contained in one historical behavior data pair, the number of user operation vectors contained in the other historical behavior data pair, or the minimum of the numbers of user operation vectors contained in the two historical behavior data pairs.
Taking the historical behavior data pair 1 and the historical behavior data pair 2 as examples, the historical behavior data pair 1 includes a user operation vector M, a user operation vector D and a user operation vector G of the service scene 1, and the historical behavior data pair 2 includes a user operation vector N and a user operation vector F of the service scene 1. Second vector distances are respectively determined between the user operation vector M and the user operation vector N, between M and F, between D and N, between D and F, between G and N, and between G and F. Then, from the determined second vector distances, the target second vector distances whose distance does not exceed the preset second vector distance threshold are determined. Assuming that the preset ratio threshold is 0.5, the second vector distance threshold is 0.2, the number of target second vector distances is 2, and the total number of user operation data is 2, the ratio of the number of target second vector distances to the total number of user operation data is 1, which is not smaller than the preset ratio threshold 0.5, and it is therefore determined that an association relationship exists between the historical behavior data pair 1 and the historical behavior data pair 2.
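As a non-limiting illustration, the ratio-based determination can be sketched as follows. This is a minimal sketch under stated assumptions: the Euclidean distance is used as a stand-in for the second vector distance, the total number of user operation data is taken as the minimum of the two pair sizes, and the function names and vector values are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Assumed stand-in for the second vector distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_target_distances(pair_a, pair_b, distance_threshold: float) -> int:
    """Count cross-pair distances that do not exceed the threshold."""
    return sum(
        1
        for u in pair_a
        for v in pair_b
        if euclidean(u, v) <= distance_threshold
    )


def belong_to_same_service(pair_a, pair_b,
                           distance_threshold: float = 0.2,
                           ratio_threshold: float = 0.5) -> bool:
    """Ratio test: target distance count / total user operation data."""
    total = min(len(pair_a), len(pair_b))  # assumed choice of the total
    if total == 0:
        return False
    targets = count_target_distances(pair_a, pair_b, distance_threshold)
    return targets / total >= ratio_threshold


# Illustrative vectors only: M is close to N, D is close to F, G is far away.
pair_1 = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]   # M, D, G
pair_2 = [(0.1, 0.0), (1.0, 0.1)]               # N, F
same_service = belong_to_same_service(pair_1, pair_2)
```

With these illustrative vectors there are 2 target distances and the total is 2, so the ratio is 1 and the two pairs are treated as belonging to the same online service.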
It should be noted that if there is a correlation between two pairs of historical behavior data, the user operation marks in the two pairs of historical behavior data may be marked as the same mark, and the same mark may be any one of the corresponding user operation marks in the two pairs of historical behavior data, or may be the user operation mark with the smallest value in the corresponding user operation marks in the two pairs of historical behavior data.
According to the implementation mode, under the condition that a plurality of historical behavior data pairs exist, service attribution information among the plurality of historical behavior data pairs is obtained according to the determined second vector distances, and when behavior data is repeated according to the historical behavior data pairs, relevant user coherent behaviors can be quickly searched according to the service attribution information, so that the behavior data searching efficiency is improved. Furthermore, the matching operation can be performed on the historical behavior data pair belonging to the same service scene, so that the subsequent behavior data searching efficiency is improved.
In some embodiments, in order to prevent user operation data with poor behavior data quality from affecting the behavior data search result, after each historical behavior data pair is obtained, for the user operation data corresponding to the plurality of user operation vectors included in each historical behavior data pair, a set number of user operation data may be selected from the user operation data according to the behavior data quality matching coefficients matched with the plurality of user operation vectors, and the plurality of user operation vectors included in each historical behavior data pair may be updated according to the user operation vectors corresponding to the selected user operation data. In some embodiments, in order to improve the accuracy of the repeated data, the following steps may be adopted, but are not limited to, when executing S204:
S2041, according to each historical behavior data pair and according to each pending behavior data pair matched with the behavior of the user to be analyzed, obtaining the relative vector distance between each pending behavior data pair and each historical behavior data pair.
Specifically, when S2041 is performed, the following steps may be adopted, but are not limited to:
S20411, determining a server response vector distance between each pending behavior data pair and each historical behavior data pair according to the server response vector corresponding to each historical behavior data pair and the server response vector corresponding to each pending behavior data pair.
Taking a historical behavior data pair M and a pending behavior data pair N as examples, the historical behavior data pair M is any one historical behavior data pair in each historical behavior data pair, and the pending behavior data pair N is any one pending behavior data pair in each pending behavior data pair.
It is assumed that the historical behavior data pair M includes a server response vector M1, a server response vector M2, ……, and a server response vector MN, and the pending behavior data pair N includes a server response vector N1, a server response vector N2, ……, and a server response vector NM. The server response vector distances between each of the server response vectors M1, M2, ……, MN and each of the server response vectors N1, N2, ……, NM are respectively determined.
S20412, determining the user operation vector distance between each pending behavior data pair and each historical behavior data pair according to the user operation vector corresponding to each historical behavior data pair and the user operation vector corresponding to each pending behavior data pair.
It is assumed that the historical behavior data pair M includes a user operation vector M1, a user operation vector M2, ……, and a user operation vector MN, and the pending behavior data pair N includes a user operation vector N1, a user operation vector N2, ……, and a user operation vector NM. The user operation vector distances between each of the user operation vectors M1, M2, ……, MN and each of the user operation vectors N1, N2, ……, NM are respectively determined.
S20413, according to the obtained server response vector distances and the obtained user operation vector distances, obtaining the relative vector distance between each pending behavior data pair and each historical behavior data pair.
In the embodiment of the application, the relative vector distance between the historical behavior data pair M and the pending behavior data pair N is determined by at least one of the following information: the number of server response vectors that are similar in the historical behavior data pair M and the pending behavior data pair N, and the number of user operation vectors that are similar in the historical behavior data pair M and the pending behavior data pair N.
When the distance between a server response vector in the historical behavior data pair M and a server response vector in the pending behavior data pair N is larger than a preset server response vector distance threshold, the server response vector in the historical behavior data pair M and the server response vector in the pending behavior data pair N are similar server response vectors. Illustratively, the preset server response vector distance threshold is 0.2.
Similarly, when the distance between a user operation vector in the historical behavior data pair M and a user operation vector in the pending behavior data pair N is greater than a preset user operation vector distance threshold, the user operation vector in the historical behavior data pair M and the user operation vector in the pending behavior data pair N are similar user operation vectors.
According to this implementation manner, the relative vector distance between each pending behavior data pair and each historical behavior data pair is obtained according to the server response vector distances and the user operation vector distances, so that the relative vector distance reflects both the server response vector distances and the user operation vector distances, and is therefore more accurate than a relative vector distance obtained from only one of them.
S2042, according to the obtained relative vector distances, determining the user behavior vector distance between each user behavior to be analyzed and the historical user behaviors.
In this embodiment, when the number of similar server response vectors in the pair of historical behavior data M and the pair of pending behavior data N exceeds a first threshold, and/or the number of similar user operation vectors in the pair of historical behavior data M and the pair of pending behavior data N exceeds a second threshold, it is determined that the pair of historical behavior data M and the pair of pending behavior data N are the same user behavior.
Wherein the first threshold is determined according to a first preset parameter value and a first number of server response vectors, wherein the first number of server response vectors may be one of the following information: the number of server response vectors included in the historical behavior data pair M, the number of server response vectors included in the pending behavior data pair N, the minimum of the numbers of server response vectors included in the historical behavior data pair M and the pending behavior data pair N, but not limited thereto.
The second threshold is determined in accordance with a second preset parameter value and a second number of user operational vectors, wherein the second number of user operational vectors may be one of the following information: the number of user operation vectors included in the history behavior data pair M, the number of user operation vectors included in the pending behavior data pair N, and the minimum of the number of user operation vectors included in the history behavior data pair M and the pending behavior data pair N are not limited thereto.
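As a non-limiting illustration, the threshold comparison can be sketched as follows. The embodiment only states that each threshold is determined according to a preset parameter value and a number of vectors; multiplying the two, using the minimum of the two pair sizes, and the default parameter value 0.5 are assumptions of this sketch, and the function names are hypothetical.

```python
def count_threshold(preset_parameter: float, count_hist: int, count_pending: int) -> float:
    """Assumed rule: threshold = preset parameter * min(vector counts)."""
    return preset_parameter * min(count_hist, count_pending)


def is_same_user_behavior(similar_server: int, similar_user: int,
                          n_server_hist: int, n_server_pending: int,
                          n_user_hist: int, n_user_pending: int,
                          first_parameter: float = 0.5,
                          second_parameter: float = 0.5,
                          require_both: bool = False) -> bool:
    """Compare similar-vector counts with the first and second thresholds."""
    first_threshold = count_threshold(first_parameter, n_server_hist, n_server_pending)
    second_threshold = count_threshold(second_parameter, n_user_hist, n_user_pending)
    server_ok = similar_server > first_threshold
    user_ok = similar_user > second_threshold
    # "and/or" in the text: either condition alone, or both together.
    return (server_ok and user_ok) if require_both else (server_ok or user_ok)


# 3 similar server response vectors out of min(4, 5) = 4, no similar user vectors.
either = is_same_user_behavior(3, 0, 4, 5, 4, 5)
both = is_same_user_behavior(3, 0, 4, 5, 4, 5, require_both=True)
```

The `require_both` switch exposes the "and/or" wording as an explicit choice rather than fixing one reading.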
In the embodiment of the application, taking one user behavior to be analyzed and the historical user behavior as an example, according to the obtained relative vector distances, the same user behaviors among the pending behavior data pairs of the user behavior to be analyzed and the historical behavior data pairs can be determined, and according to the number of the same user behaviors, the user behavior vector distance between the user behavior to be analyzed and the historical user behavior is determined. User behavior vector distances may also be referred to herein as user behavior similarities.
Illustratively, the value of the user behavior vector distance between the user behavior to be analyzed and the historical user behavior is a ratio of the number of the same user behaviors contained in the user behavior to be analyzed and the historical user behavior to the number of the third user behaviors, wherein the number of the third user behaviors can be one of the following information: the number of pending behavior data pairs included in the user behavior to be analyzed, the number of historical behavior data pairs included in the historical user behavior, or the minimum value of the two, but is not limited thereto.
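As a non-limiting illustration, the ratio can be computed as follows. This is a minimal sketch; the function name is hypothetical, and taking the minimum of the two pair counts as the denominator is only one of the options listed above.

```python
def user_behavior_vector_distance(num_same: int,
                                  num_pending_pairs: int,
                                  num_historical_pairs: int) -> float:
    """Ratio of same user behaviors to the smaller of the two pair counts."""
    denominator = min(num_pending_pairs, num_historical_pairs)
    if denominator == 0:
        return 0.0  # no pairs to compare
    return num_same / denominator


# 2 same user behaviors, 4 pending pairs, 5 historical pairs.
score = user_behavior_vector_distance(2, 4, 5)
```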
S2043, determining the key user behaviors meeting the search conditions according to the user behavior vector distance between each user behavior to be analyzed and the historical user behaviors.
Specifically, in the embodiment of the present application, when S2043 is executed, the following manner may be adopted, but is not limited to:
As a possible implementation manner, the user behaviors to be analyzed are sorted according to the user behavior vector distance between each user behavior to be analyzed and the historical user behavior, and a certain number of key user behaviors meeting the search condition are determined from the sorted user behaviors to be analyzed.
As another possible implementation manner, to-be-analyzed user behaviors, of which the user behavior vector distance from the historical user behaviors exceeds a preset user behavior vector distance threshold, are determined, and the determined to-be-analyzed user behaviors are used as key user behaviors.
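As a non-limiting illustration, the two selection manners can be sketched as follows. Because the user behavior vector distance is also referred to as a user behavior similarity, this sketch assumes that larger values rank first; the function names and the example scores are hypothetical.

```python
from typing import Dict, List


def top_k_key_behaviors(scores: Dict[str, float], k: int) -> List[str]:
    """Sort behaviors by score (largest first, assumed) and keep the first k."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def key_behaviors_above(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep behaviors whose score exceeds the preset threshold."""
    return [name for name, score in scores.items() if score > threshold]


# Illustrative scores for three user behaviors to be analyzed.
scores = {"behavior_a": 0.9, "behavior_b": 0.2, "behavior_c": 0.6}
by_rank = top_k_key_behaviors(scores, 2)
by_threshold = key_behaviors_above(scores, 0.5)
```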
It should be noted that, in the embodiment of the present application, when the historical behavior data pair M and the pending behavior data pair N are the same user behavior, the user behavior to be analyzed corresponding to the historical behavior data pair M may be used as a key user behavior, and the historical behavior data pair M, the user behavior vector distance between the user behavior to be analyzed corresponding to the historical behavior data pair M and the historical user behavior, and the user behavior mark of the user behavior to be analyzed corresponding to the historical behavior data pair M may be obtained.
It should be noted that in the embodiment of the present application, the number of the key user actions may be one or more, and the number of the same user actions may also be one or more.
Next, a training process of the user operation vector construction model will be described.
The user operation vector construction model before training is referred to herein as the original user operation vector construction model, and the user operation vector construction model after training is referred to herein as the target user operation vector construction model.
The training process of the original user operation vector construction model includes two stages: a data acquisition stage and a model training stage, wherein the data acquisition stage is used for acquiring a sample user behavior data set, and the model training stage is used for training the original user operation vector construction model according to the sample user behavior data set.
In the data acquisition stage, firstly, according to a first reference vector distance between training data in a training user behavior data set, obtaining each sample user behavior data set, and secondly, according to each sample user behavior data set, constructing a sample user behavior data set, wherein each sample user behavior data set comprises at least three samples, and the sample user behavior data set corresponding to one sample in the at least three samples is different from the sample user behavior data set corresponding to the current comparison behavior data.
Wherein each sample user behavior data set contains at least two training data, and each training data in the sample user behavior data set is the current sample user behavior data. The current sample user behavior data may also be referred to herein as a positive sample pair.
The process of obtaining the sample user behavior data sets according to the first reference vector distance between the training data in the training user behavior data sets may be, but is not limited to, the following ways:
Mode 1: user operation identification is carried out on each training data in the training user behavior data group to obtain user operation data matched with each training data, then labeling is carried out on each two user operation data in each obtained user operation data, whether each two user operation data are current sample user behavior data is determined, and each group of sample user behavior data sets are obtained according to labeling results.
Mode 2: in order to improve labeling efficiency and data preparation efficiency, a flow of a method for obtaining a user behavior data set of each sample is provided in the embodiments of the present application, and the method includes the following flows:
S1101, carrying out user operation identification on each training data in the training user behavior data group to obtain user operation data matched with each training data. The training user behavior data set may contain one or more user behaviors or may contain one or more behavior data.
S1102, according to the determined user operation data, user operation vectors corresponding to the user operation data are obtained.
Specifically, when S1102 is executed, the obtained user operation data is input into the pre-trained user operation model, and the user operation vector matched with the user operation data is obtained.
S1103, executing aggregation operation on the obtained user operation vectors to obtain each aggregation category.
For example, if the set number is 80 and the total number of user operation data included in each training data is 400, the aggregation category number = 400/80 = 5, that is, the aggregation operation is performed on the obtained user operation vectors to obtain 5 aggregation categories.
For example, if the set number is 30 and the total number of user operation data included in each training data is 180, the aggregation category number= [180/30] =6, that is, the obtained user operation vectors are subjected to the aggregation operation, and 6 aggregation categories are obtained.
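As a non-limiting illustration, the aggregation category number in these examples can be computed as follows. Reading the bracket notation as integer (floor) division is an assumption of this sketch, and the function name is hypothetical.

```python
def aggregation_category_count(total_operation_data: int, set_number: int) -> int:
    """Number of aggregation categories = total // set number (assumed floor)."""
    if set_number <= 0:
        raise ValueError("set_number must be positive")
    return total_operation_data // set_number


category_count = aggregation_category_count(180, 30)
```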
S1104, determining user operation vectors meeting the preset aggregation category conditions from each aggregation category according to the distance between the user operation vectors in each aggregation category, and obtaining each sample user behavior data set according to the user operation data corresponding to the determined user operation vectors.
Specifically, when S1104 is executed, for each user operation vector in each aggregation category, a matching coefficient corresponding to the user operation vector is obtained according to a distance between the user operation vector and other user operation vectors except for the user operation vector in the same aggregation category, and then, according to the matching coefficient matched by each user operation vector in each aggregation category, a user operation vector meeting a preset aggregation category condition is determined from each aggregation category.
Taking the user operation vector Wi in the aggregation category W as an example, the aggregation category W is any one of aggregation categories, each user operation vector is contained in the aggregation category W, and the user operation vector Wi is any one of the user operation vectors contained in the aggregation category W.
In some embodiments, the following steps may be employed, but are not limited to, determining the matching coefficients corresponding to the user operation vector Wi:
Step Y1, determining the distance between the user operation vector Wi and other user operation vectors except Wi in the aggregation category W. The distance between the user operation vector Wi and other user operation vectors may be represented by, but is not limited to, a J2 distance.
For example, the aggregation category 1 includes user operation vectors 1 to 20. For the user operation vector 1, a distance J1 between the user operation vector 1 and the user operation vector 2 is determined, a distance J2 between the user operation vector 1 and the user operation vector 3 is determined, and similarly, the distances J1 to J19 between the user operation vector 1 and the user operation vectors 2 to 20 are determined.
Step Y2, determining a set number of user operation vectors from the other user operation vectors according to the determined distances.
Specifically, according to the determined distances, the set number of user operation vectors with the smallest distances are determined from the other user operation vectors.
For example, the distances between the user operation vector 1 and the user operation vectors 2 to 20 are J1 to J19, respectively. Assuming that the values of J1 to J19, in order from small to large, are J19, J18, ……, J1, and that the set number is 5, the user operation vectors corresponding to J19, J18, J17, J16 and J15 are determined from the other user operation vectors.
Step Y3, determining a matching coefficient of the user operation vector Wi according to the determined distance between each user operation vector and the user operation vector Wi.
Specifically, the average value of the distances corresponding to the determined user operation vectors is used as the matching coefficient of the user operation vector Wi.
For example, the average value of J19, J18, J17, J16 and J15 is taken as the matching coefficient of the user operation vector 1.
According to the matching coefficients of the matching of the user operation vectors in each aggregation class, in the process of determining the user operation vectors meeting the preset aggregation class conditions from each aggregation class, determining the user operation vectors with the corresponding matching coefficients smaller than the lower limit of the matching coefficients from each user operation vector in each aggregation class, and taking the determined user operation vectors as the user operation vectors meeting the preset aggregation class conditions.
In this embodiment, if the matching coefficient of the user operation vector Wi is smaller than the lower limit of the matching coefficient, the user operation vector Wi is reserved, that is, if the matching coefficient of the user operation vector Wi is smaller than the lower limit of the matching coefficient, the user operation vector Wi is the user operation vector meeting the preset aggregation category condition. If the matching coefficient of the user operation vector Wi is not less than the matching coefficient lower limit, the user operation vector Wi is deleted from the aggregation class W.
For example, assuming that the lower limit of the matching coefficient is 10 and the matching coefficient of the user operation vector 1 is greater than the lower limit of the matching coefficient, the user operation vector 1 is deleted from the aggregation class 1.
Similarly, for the aggregate class 1, the matching coefficients of the user operation vectors 1 to 20 are respectively determined, and if the lower limit of the matching coefficients is 10 and the matching coefficients of the user operation vectors 1 to 20 are respectively 1 to 20, the user operation vectors 10 to 20 are deleted from the aggregate class 1, and at this time, the aggregate class 1 contains the user operation vectors 1 to 9.
For each aggregation category, the aggregation operation cleaning can be performed through steps Y1-Y4, so that a clean aggregation category is obtained. Each clean aggregate category is a sample user behavior data set, and the behavior data contained in the group are similar samples to each other.
It should be noted that, in the embodiment of the present application, if the number of user operation vectors included in one aggregation category is smaller than the set aggregation category number, the aggregation category may be discarded. For example, if the set aggregation category number is 5, an aggregation category is discarded when the number of user operation vectors it contains is smaller than 5.
Assuming that the sample user behavior data set is N groups, since the online user behavior resources are rich, a huge amount of N groups of data, such as millions of groups, can be collected.
As a possible implementation manner, in the process of constructing a sample user behavior data set according to each sample user behavior data set, the embodiment of the present application provides a process of constructing a sample user behavior data set, which may refer to the following steps:
S1401, acquiring corresponding current sample user behavior data from each acquired sample user behavior data set.
Model learning is performed over all N sample user behavior data sets, with every T sample user behavior data sets processed as one batch in one round of model learning; within a batch, two training data may be randomly extracted from each of the T sample user behavior data sets to serve as a similar behavior data pair.
For example, the sample user behavior data set includes: sample user behavior data set 1, sample user behavior data set 2, ……, sample user behavior data set T, wherein sample user behavior data set 1 contains training data 1k, 1p, 1g, etc., training data 1k, 1p, 1g are all user operation data of the same user, sample user behavior data set 2 contains training data 2k, 2p, 2g, etc., and sample user behavior data set T contains training data Tk, Tp, Tg, etc. Two training data 1k, 1p are extracted from sample user behavior data set 1 to obtain current sample user behavior data 1 (1k, 1p), two training data 2k, 2p are extracted from sample user behavior data set 2 to obtain current sample user behavior data 2 (2k, 2p), and similarly, two training data Tk, Tp are extracted from sample user behavior data set T to obtain current sample user behavior data T (Tk, Tp).
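As a non-limiting illustration, drawing one pair of training data from each sample user behavior data set in a batch can be sketched as follows. The function name and the string identifiers are hypothetical; skipping sets with fewer than two training data is an assumption of this sketch.

```python
import random
from typing import List, Sequence, Tuple


def draw_positive_pairs(groups: Sequence[Sequence[str]],
                        rng: random.Random) -> List[Tuple[str, str]]:
    """Randomly draw two distinct training data from each group in the batch."""
    pairs = []
    for group in groups:
        if len(group) < 2:
            continue  # assumed: a pair cannot be formed, so the group is skipped
        first, second = rng.sample(list(group), 2)
        pairs.append((first, second))
    return pairs


# A batch of T = 3 sample user behavior data sets with placeholder identifiers.
batch = [["1k", "1p", "1g"], ["2k", "2p", "2g"], ["3k", "3p", "3g"]]
positive_pairs = draw_positive_pairs(batch, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which is convenient when comparing processing rounds.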
S1402, for each current sample user behavior data in the current sample user behavior data, sequentially taking the following steps:
firstly, taking one sample contained in current sample user behavior data as current reference behavior data, respectively obtaining corresponding current comparison behavior data from other current sample user behavior data, and determining second reference vector distances between the obtained current comparison behavior data and the current reference behavior data;
and secondly, determining target training data from the current comparison behavior data according to the determined second reference vector distance, and obtaining sample user behavior data according to the target training data and current sample user behavior data.
In the embodiment of the present application, when the corresponding current comparison behavior data is obtained from the other current sample user behavior data, one sample may be randomly selected from each of the other current sample user behavior data. The second reference vector distance may employ, but is not limited to, a J2 distance.
Taking the current sample user behavior data 1 (1k, 1p) as an example, training data 1k in the current sample user behavior data 1 (1k, 1p) is taken as the current reference behavior data, training data 2k is obtained from the current sample user behavior data 2 (2k, 2p), ……, and training data Tk is obtained from the current sample user behavior data T (Tk, Tp); the second reference vector distances between the current reference behavior data 1k and the training data 2k, the training data 3p, ……, and the training data Tk are then respectively determined.
When the target training data is determined from the current comparison behavior data according to the determined second reference vector distances, the following mode may be adopted, without limitation:
Target training data determination mode: sort the current comparison behavior data in ascending order of the determined second reference vector distance and, following the sorted order, select a preset number of target training data from the current comparison behavior data.
When the sample user behavior data is obtained according to the target training data and the current sample user behavior data, the current sample user behavior data is combined with each target training data respectively to obtain the sample user behavior data.
For example, assuming that the target training data for the current reference behavior data 1k is sample 7k, sample 8p, ……, and sample 27k, the current sample user behavior data 1 (1k, 1p) is combined with sample 7k, sample 8p, ……, and sample 27k respectively, and the resulting sample user behavior data are: (1k, 1p, 7k), (1k, 1p, 8p), ……, (1k, 1p, 27k).
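The selection and combination steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: samples are assumed to be already represented as numeric vectors, Euclidean distance is used as a stand-in for the second reference vector distance, and the names `build_sample_groups`, `pairs`, and `num_targets` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; an assumed stand-in for the second reference vector distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_sample_groups(pairs, num_targets, rng=None):
    """Build (reference, partner, target) groups from a list of (k, p) sample pairs.

    For each pair, the first sample is the current reference behavior data.
    One sample is drawn at random from every other pair, the candidates are
    sorted by ascending distance to the reference, and the closest
    `num_targets` candidates are each combined with the original pair.
    """
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch reproducible
    groups = []
    for i, (reference, partner) in enumerate(pairs):
        # One randomly chosen comparison sample from each of the other pairs.
        candidates = [rng.choice(other) for j, other in enumerate(pairs) if j != i]
        # Ascending order of distance to the reference.
        candidates.sort(key=lambda c: euclidean(reference, c))
        for target in candidates[:num_targets]:
            groups.append((reference, partner, target))
    return groups


pairs = [
    ((0.0, 0.0), (0.1, 0.0)),
    ((1.0, 0.0), (1.1, 0.0)),
    ((5.0, 5.0), (5.1, 5.0)),
]
groups = build_sample_groups(pairs, num_targets=1)
```

With `num_targets=1`, each pair yields exactly one three-sample group, and the third element always comes from a different pair than the first two.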
In the model training stage, one possible user operation vector construction model provided in the embodiment of the present application includes a convolutional neural network and an embedding layer. The convolutional neural network, which may employ a feature extraction module, is used to extract basic features, and the embedding layer is used to output the user behavior data representation.
In the embodiment of the present application, a manner of constructing a sample user behavior data set is provided, and the following steps may be specifically referred to:
S1801: according to each sample user behavior data set contained in the training user behavior data set and the preset processing rounds, construct the sample user behavior data set matched with each processing round, where each sample user behavior data set contains at least three samples, and the sample user behavior data set corresponding to one of the at least three samples is different from the sample user behavior data set corresponding to the current comparison behavior data.
It should be noted that, in the embodiment of the present application, after each sample user behavior data set is obtained in the data acquisition stage, a sample user behavior data set matched with each processing round may be obtained according to each sample user behavior data set in the data acquisition stage, or a sample user behavior data set matched with each processing round may be obtained according to each sample user behavior data set in the model training stage, and the method for obtaining the sample user behavior data set is described in S1401-S1402.
S1802: input the generated sample user behavior data sets in turn into the original user operation vector construction model to obtain the corresponding model loss parameters, adjust the original user operation vector construction model according to the obtained model loss parameters, and output the target user operation vector construction model when a preset convergence condition is determined to be met.
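The loop in S1802 can be illustrated with the sketch below. The source does not name the loss function or the convergence condition, so both are assumptions here: a triplet margin loss is used as one common choice for three-sample groups (treating the three samples as anchor, positive, and negative is itself an assumption), and convergence is taken to mean that the mean loss changes by less than a tolerance between rounds. The names `triplet_margin_loss`, `train_until_converged`, `step`, and `tol` are hypothetical.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Assumed loss: penalize groups where the negative is not at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)


def train_until_converged(batches, step, tol=1e-3, max_rounds=100):
    """Run processing rounds until the mean loss stabilizes.

    `step(batch)` is a caller-supplied function (hypothetical) that feeds one
    sample group to the model, adjusts the model, and returns the loss.
    Returns (rounds_run, last_mean_loss).
    """
    previous = None
    for round_index in range(max_rounds):
        losses = [step(batch) for batch in batches]
        mean_loss = sum(losses) / len(losses)
        # Assumed convergence condition: mean loss changed by less than tol.
        if previous is not None and abs(previous - mean_loss) < tol:
            return round_index + 1, mean_loss
        previous = mean_loss
    return max_rounds, previous
```

In a real setting `step` would wrap the forward pass, the loss computation, and the parameter update of whatever framework is used; the sketch only fixes the control flow.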
The present application will be described with reference to specific examples.
Example: repeated login data.
The historical user behavior is user behavior 2, which comprises user behavior node 1 and user behavior node 2. According to the specified online service scene (user login), user operation identification is performed on user behavior node 1 and user behavior node 2 respectively, to obtain the user operation data and the corresponding user operation vectors of each node, where user behavior node 1 contains two pieces of user operation data and user behavior node 2 contains two pieces of user operation data.
According to the above-mentioned online service scenario, server response identification is performed on the user behavior node 1 and the user behavior node 2, and each server response data and corresponding server response vector corresponding to each of the user behavior node 1 and the user behavior node 2 are determined, where the user behavior node 1 includes two server response data, and the user behavior node 2 includes two server response data.
According to the determined user operation data and the determined server response data, two historical behavior data pairs corresponding to the user behavior 2 are obtained, where historical behavior data pair 1 comprises two server response vectors and two user operation vectors of login operation 1, and historical behavior data pair 2 comprises two server response vectors and two user operation vectors of login operation 2.
And determining the key user behaviors matched with the historical user behaviors from the user behaviors to be analyzed according to the obtained historical behavior data pair 1 and the historical behavior data pair 2 of the user behaviors 2 and the corresponding undetermined behavior data pairs matched with the historical user behaviors.
The embodiment of the invention provides a server 100, wherein the server 100 comprises a processor and a nonvolatile memory storing computer instructions, and when the computer instructions are executed by the processor, the server 100 executes the big data anomaly analysis method for the digital online service. As shown in fig. 2, fig. 2 is a block diagram of a server 100 according to an embodiment of the present invention. The server 100 includes a memory 111, a processor 112, and a communication unit 113. For data transmission or interaction, the memory 111, the processor 112 and the communication unit 113 are electrically connected to each other directly or indirectly. For example, the elements may be electrically connected to each other via one or more communication buses or signal lines.
The foregoing description, for purpose of explanation, has been presented with reference to particular embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the disclosure and its practical application, to thereby enable others skilled in the art to best utilize the disclosure and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (9)

1. The big data anomaly analysis method for the digital online service is characterized by comprising the following steps:
receiving a security detection instruction aiming at a target digital online service platform, wherein the target digital online service platform maintains a plurality of online service scenes;
Under the condition that the security detection instruction passes verification, determining a specified online service scene from the security detection instruction;
determining key user behaviors in the user behaviors to be analyzed based on the specified online service scene;
calling an anomaly analysis model corresponding to the specified online service scene from a security model database;
inputting the key user behaviors into the anomaly analysis model to obtain anomaly identification results corresponding to the key user behaviors;
determining the abnormal behavior category represented by the anomaly identification result, wherein the abnormal behavior category comprises abnormal behavior, suspicious abnormal behavior, and abnormal-free behavior;
generating alarm information and blocking the communication source of the user behavior to be analyzed when the abnormal behavior category represented by the anomaly identification result is the abnormal behavior; storing the key user behavior in a secondary detection list when the abnormal behavior category represented by the anomaly identification result is the suspicious abnormal behavior; and training the anomaly analysis model with the key user behavior as a positive sample when the abnormal behavior category represented by the anomaly identification result is the abnormal-free behavior;
The determining the key user behavior in the user behaviors to be analyzed based on the specified online service scene comprises the following steps:
according to the specified online service scene, user operation identification is carried out on a plurality of historical user behavior nodes contained in the historical user behaviors respectively, and user operation data and corresponding user operation vectors corresponding to the historical user behavior nodes are determined;
according to the specified online service scene, server response identification is carried out on the plurality of historical user behavior nodes respectively, and server response data and corresponding server response vectors corresponding to the plurality of historical user behavior nodes are determined;
for each historical user behavior node of the plurality of historical user behavior nodes, the following steps are sequentially taken:
determining the association coefficients between each piece of server response data contained in one historical user behavior node and each piece of user operation data contained in the one historical user behavior node;
matching each piece of server response data with the user operation data whose association coefficient with that server response data is not smaller than a preset association coefficient threshold;
Aiming at the plurality of historical user behavior nodes, the following steps are sequentially executed according to the user operation flow:
acquiring server response data sets, wherein each server response data set comprises: first server response data in a current historical user behavior node and second server response data in a next historical user behavior node;
determining first vector distances matched by the server response data sets, wherein each first vector distance is used for representing a vector distance between corresponding first server response data and second server response data;
determining a target server response data set with the corresponding first vector distance not exceeding a first vector distance threshold value from the server response data sets, and obtaining server related response information according to the determined target server response data sets;
according to the determined user operation data and the server response data, determining a matching relation between the user operation data and the server response data;
according to the matching relation and the server related response information, historical behavior data pairs corresponding to the historical user behaviors are obtained, and each historical behavior data pair comprises a plurality of server response vectors and a plurality of user operation vectors of an online service;
Determining a server response vector distance between each pair of pending behavior data and each pair of historical behavior data according to the server response vector corresponding to each pair of historical behavior data and the server response vector corresponding to each pair of pending behavior data corresponding to each user behavior to be analyzed;
determining a user operation vector distance between each pending behavior data pair and each historical behavior data pair according to the user operation vector corresponding to each historical behavior data pair and the user operation vector corresponding to each pending behavior data pair;
according to the obtained server response vector distances and the obtained user operation vector distances, obtaining the relative vector distance between each pending behavior data pair and each historical behavior data pair;
determining a user behavior vector distance between each user behavior to be analyzed and the historical user behavior according to the obtained relative vector distances;
and determining the user behaviors to be analyzed, of which the user behavior vector distances between the user behaviors and the historical user behaviors exceed a preset user behavior vector distance threshold, from the user behaviors to be analyzed, and taking the determined user behaviors to be analyzed as key user behaviors.
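The distance-based selection of key user behaviors recited at the end of claim 1 can be sketched as follows. The claim does not state how the two distances are combined or how pair-level distances are aggregated, so the sketch makes assumptions: each behavior data pair is simplified to one server response vector and one user operation vector, the relative vector distance is a weighted sum of the two Euclidean distances, and the user behavior vector distance is the average, over pending pairs, of the distance to the closest historical pair. All function names, dictionary keys, and weights are hypothetical.

```python
import math


def pair_distance(pending, historical, w_server=0.5, w_user=0.5):
    """Assumed relative vector distance: weighted sum of the server response
    vector distance and the user operation vector distance."""
    d_server = math.dist(pending["server"], historical["server"])
    d_user = math.dist(pending["user"], historical["user"])
    return w_server * d_server + w_user * d_user


def behavior_distance(pending_pairs, historical_pairs):
    """Assumed aggregation: mean, over pending pairs, of the distance to the
    nearest historical pair."""
    nearest = [
        min(pair_distance(p, h) for h in historical_pairs) for p in pending_pairs
    ]
    return sum(nearest) / len(nearest)


def select_key_behaviors(behaviors, historical_pairs, threshold):
    """Return the names of behaviors whose distance to the historical
    behavior exceeds the preset threshold."""
    return [
        name
        for name, pairs in behaviors.items()
        if behavior_distance(pairs, historical_pairs) > threshold
    ]


historical = [{"server": (0.0, 0.0), "user": (0.0, 0.0)}]
behaviors = {
    "a": [{"server": (0.0, 0.0), "user": (0.0, 1.0)}],
    "b": [{"server": (3.0, 4.0), "user": (6.0, 8.0)}],
}
key = select_key_behaviors(behaviors, historical, threshold=1.0)
```

Here behavior "a" stays close to the historical behavior (distance 0.5) and is not selected, while behavior "b" (distance 7.5) exceeds the threshold and is kept as a key user behavior.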
2. The method of claim 1, wherein after obtaining the pair of historical behavior data corresponding to the historical user behavior, further comprising:
if a plurality of historical behavior data pairs exist, determining second vector distances among a plurality of user operation vectors corresponding to the plurality of historical behavior data pairs;
for any two historical behavior data pairs contained in the plurality of historical behavior data pairs, executing the following steps:
determining second vector distances between each of the plurality of user operation vectors contained in one historical behavior data pair and each of the plurality of user operation vectors contained in the other historical behavior data pair, and determining, from the determined second vector distances, target second vector distances that do not exceed a preset second vector distance threshold;
when it is determined, according to the determined target second vector distances, that the two historical behavior data pairs belong to the same online service, recording the association relationship between the two historical behavior data pairs in service attribution information.
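The pairwise comparison in claim 2 can be sketched as below. The claim does not say how the target second vector distances lead to the "same online service" decision, so the sketch assumes a simple rule: two pairs belong to the same service when at least `min_matches` cross-pair distances fall within the threshold. Euclidean distance and all names (`same_service`, `record_associations`, `min_matches`) are assumptions made for illustration.

```python
import math
from itertools import combinations, product


def same_service(ops_a, ops_b, dist_threshold, min_matches=1):
    """Compare every user operation vector of one pair with every vector of
    the other pair and count the distances within the threshold."""
    target_distances = [
        d
        for d in (math.dist(u, v) for u, v in product(ops_a, ops_b))
        if d <= dist_threshold
    ]
    # Assumed decision rule: enough close vector pairs means same service.
    return len(target_distances) >= min_matches


def record_associations(pairs, dist_threshold, min_matches=1):
    """Return the list of (pair_id, pair_id) associations, standing in for
    the service attribution information."""
    return [
        (a, b)
        for a, b in combinations(pairs, 2)
        if same_service(pairs[a], pairs[b], dist_threshold, min_matches)
    ]


pairs = {
    "p1": [(0.0, 0.0), (1.0, 1.0)],
    "p2": [(0.0, 0.1)],
    "p3": [(9.0, 9.0)],
}
associations = record_associations(pairs, dist_threshold=0.5)
```

Only "p1" and "p2" contain user operation vectors within 0.5 of each other, so they are the only association recorded.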
3. The method of claim 1, wherein determining the user operation vector for each of the plurality of historical user behavior nodes comprises:
And respectively inputting the user operation data corresponding to the plurality of historical user behavior nodes into a pre-trained target user operation vector construction model to obtain corresponding user operation vectors, wherein the target user operation vector construction model is trained and obtained by the following modes:
according to each sample user behavior data set contained in the training user behavior data set and preset processing rounds, constructing sample user behavior data sets matched with each processing round, wherein each sample user behavior data set contains at least three samples, and the sample user behavior data set corresponding to one sample in the at least three samples is different from the sample user behavior data set corresponding to the current comparison behavior data;
and inputting the generated sample user behavior data sets in turn into an original user operation vector construction model to obtain corresponding model loss parameters, adjusting the original user operation vector construction model according to the obtained model loss parameters, and outputting a target user operation vector construction model when a preset convergence condition is determined to be met.
4. The method of claim 3, wherein constructing the sample user behavior data set matched with each processing round according to each sample user behavior data set contained in the training user behavior data set and the preset processing rounds comprises:
Acquiring current sample user behavior data matched with each processing round from each sample user behavior data set contained in the training user behavior data set according to the preset processing round;
for each current sample user behavior data, the following steps are sequentially taken:
taking one sample contained in the current sample user behavior data as current reference behavior data, respectively obtaining corresponding current comparison behavior data from other current sample user behavior data, and determining second reference vector distances between each obtained current comparison behavior data and the current reference behavior data;
and determining target training data from the current comparison behavior data according to the determined second reference vector distance, and obtaining sample user behavior data according to the target training data and the current sample user behavior data.
5. A method according to claim 3, wherein the sample user behavior data set is obtained by:
user operation identification is carried out on each training data in the training user behavior data group, and user operation data matched with each training data and corresponding user operation vectors are obtained;
Executing aggregation operation on the obtained user operation vectors to obtain aggregation categories;
and determining the user operation vectors meeting the preset aggregation category conditions from the aggregation categories according to the distances among the user operation vectors in the aggregation categories, and obtaining the user behavior data sets of the samples according to the user operation data corresponding to the determined user operation vectors.
6. The method of claim 5, wherein determining the user operation vectors satisfying the preset aggregation category condition from the respective aggregation categories according to the distances between the user operation vectors in the respective aggregation categories, respectively, comprises:
determining the matching coefficient matched with each user operation vector in each aggregation category according to the distances between the user operation vectors in that aggregation category;
and determining the user operation vectors meeting the preset aggregation category conditions from the aggregation categories according to the matching coefficients matched with the user operation vectors in the aggregation categories.
7. The method of claim 6, wherein the determining the matching coefficient matched with each user operation vector in each aggregation category according to the distances between the user operation vectors in each aggregation category comprises:
for each user operation vector in each aggregation category, performing the steps of:
determining the distances between one user operation vector in one aggregation category and the other user operation vectors in the one aggregation category, and determining a set number of user operation vectors from the other user operation vectors according to the determined distances;
and determining the matching coefficient of the one user operation vector according to the distances between the one user operation vector and each of the determined set number of user operation vectors.
8. The method of claim 7, wherein the determining, from each aggregation category, the user operation vectors satisfying the preset aggregation category condition according to the matching coefficient matched with each user operation vector in each aggregation category comprises:
determining the lower limit of the matching coefficient matched with each aggregation class according to the matching coefficient matched with each user operation vector in each aggregation class;
and determining a user operation vector with a corresponding matching coefficient smaller than the lower limit of the matching coefficient from the user operation vectors in each aggregation category respectively, and taking the determined user operation vector as the user operation vector meeting the preset aggregation category condition.
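The per-category filtering recited in claims 6 to 8 can be sketched as follows. The claims do not define the matching coefficient or the lower limit numerically, so the sketch assumes that the matching coefficient of a vector is its mean Euclidean distance to its `k` nearest neighbors within the same aggregation category, and that the lower limit is the mean of the coefficients in that category. The names `matching_coefficients`, `select_vectors`, and `k` are hypothetical.

```python
import math


def matching_coefficients(vectors, k):
    """For each vector, take the k smallest distances to the other vectors in
    the category and average them (assumed definition of the coefficient)."""
    coefficients = []
    for i, v in enumerate(vectors):
        distances = sorted(
            math.dist(v, other) for j, other in enumerate(vectors) if j != i
        )
        nearest = distances[:k]
        coefficients.append(sum(nearest) / len(nearest))
    return coefficients


def select_vectors(category, k):
    """Keep the vectors whose coefficient is below the category's lower limit.

    The lower limit is assumed to be the mean coefficient of the category,
    and the category is assumed to contain at least two vectors.
    """
    coefficients = matching_coefficients(category, k)
    lower_limit = sum(coefficients) / len(coefficients)
    return [v for v, c in zip(category, coefficients) if c < lower_limit]


category = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
selected = select_vectors(category, k=2)
```

Under these assumptions the three tightly grouped vectors are kept and the distant vector `(10.0, 10.0)` is dropped, because its coefficient lies above the category mean.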
9. A server system comprising a server for performing the method of any of claims 1-8.
CN202310139661.1A 2023-02-21 2023-02-21 Big data anomaly analysis method and system for digital online service Active CN116049818B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310139661.1A CN116049818B (en) 2023-02-21 2023-02-21 Big data anomaly analysis method and system for digital online service

Publications (2)

Publication Number Publication Date
CN116049818A (en) 2023-05-02
CN116049818B (en) 2024-03-01

Family

ID=86125531

Country Status (1)

Country Link
CN (1) CN116049818B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229963A (en) * 2016-12-12 2018-06-29 阿里巴巴集团控股有限公司 The Risk Identification Method and device of user's operation behavior
KR20190046018A (en) * 2017-10-25 2019-05-07 한국전자통신연구원 Method of detecting abnormal behavior on the network and apparatus using the same
CN110517097A (en) * 2019-09-09 2019-11-29 平安普惠企业管理有限公司 Identify method, apparatus, equipment and the storage medium of abnormal user
CN111047423A (en) * 2019-11-01 2020-04-21 支付宝(杭州)信息技术有限公司 Risk determination method and device and electronic equipment
WO2020135392A1 (en) * 2018-12-24 2020-07-02 杭州海康威视数字技术股份有限公司 Method and device for detecting abnormal behavior
CN114925273A (en) * 2022-05-23 2022-08-19 天津众群科技有限公司 User behavior prediction method based on big data analysis and AI prediction analysis system
CN115391670A (en) * 2022-11-01 2022-11-25 南京嘉安网络技术有限公司 Knowledge graph-based internet behavior analysis method and system

Also Published As

Publication number Publication date
CN116049818A (en) 2023-05-02

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
Effective date of registration: 20240201
Applicant after: Tianyi Safety Technology Co.,Ltd. (Chinatelecom tower, No. 19, Chaoyangmen North Street, Dongcheng District, Beijing 100010; country or region: China)
Applicant before: Lv Yanna (No. 202 Jiayuan Street, Ningjiang District, Songyuan City, Jilin Province, 138000; country or region: China)
GR01 Patent grant