CN111737320A - Method and device for establishing group user behavior baseline and computer equipment - Google Patents


Info

Publication number
CN111737320A
Authority
CN
China
Prior art keywords
user
group
behavior baseline
establishing
behavior
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010621812.3A
Other languages
Chinese (zh)
Inventor
罗振珊
唐炳武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202010621812.3A
Publication of CN111737320A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method works in three stages. First, a user portrait is established for each user. Next, the users are classified according to their user portraits. Finally, a group user behavior baseline is established for each category from the individual behavior baselines of the users in that category. With the method of the embodiments of the present application, multiple group user behavior baselines for users of different categories can be established quickly.

Description

Method and device for establishing group user behavior baseline and computer equipment
Technical Field
The present application relates to the field of data mining. In particular, it relates to a method, a device, and computer equipment for establishing a group user behavior baseline.
Background
With the rapid development of network application technology, users' network behaviors are becoming increasingly diverse. As a result, it is increasingly important to identify the behaviors of network users, detect abnormal behavior events, and ensure network security. At present, the main approach is to establish a behavior baseline for each individual and then use that individual behavior baseline to judge whether an abnormal behavior event has occurred. For convenience of management, however, a single group user behavior baseline is usually applied to everyone in the same department or group. Individuals differ in working style, habits, and so on, so the group user behavior baseline may not match a given person, and management becomes confused.
Disclosure of Invention
The main purpose of the present application is to provide a method, a device, computer equipment, and a storage medium for establishing a group user behavior baseline. They are intended to solve a problem in the prior art: behavior baselines for different types of people cannot be established quickly.
In order to achieve the above object, the present application provides a method for establishing a group user behavior baseline, including:
acquiring a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is a portrait constructed based on specified information of the user and log history data corresponding to the user within a specified time period;
performing clustering calculation on all the user portraits to obtain user groups of different categories;
and establishing corresponding group user behavior baselines based on the individual behavior baselines of different users in the user group of the same category.
Further, the method for acquiring the individual behavior baseline corresponding to the user portrait includes:
acquiring log history data of the user and specified information of the user;
acquiring dates corresponding to all data in the log historical data;
classifying data with a date of working days to obtain working day log historical data, and classifying data with a date of holidays to obtain holiday log historical data;
and establishing a working day individual behavior baseline of the user according to the working day log historical data and the designated information of the user, and establishing a holiday individual behavior baseline of the user according to the holiday log historical data and the designated information of the user.
Further, the step of establishing a corresponding group user behavior baseline based on individual behavior baselines of different users in the same category of user group further includes:
adopting an isolation forest algorithm to remove abnormal data from the individual behavior baselines of different users in a user group of the same category;
and establishing the group user behavior baseline by utilizing each individual behavior baseline after abnormal data are eliminated.
Further, after the step of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the same category of user group, the method further includes:
acquiring a current behavior log of a current period of a first user and a user portrait of the first user;
extracting a specified characteristic value from the current behavior log, wherein the specified characteristic value is a characteristic value required to be embodied in the group user behavior baseline; and determining the user group category of the first user from the user portrait of the first user;
comparing the specified characteristic value with a reference characteristic value corresponding to the specified characteristic in a first group user behavior baseline, wherein the first group user behavior baseline is a group user behavior baseline corresponding to a user group category to which the first user belongs;
and if the comparison result meets the condition of triggering risk early warning, sending alarm information.
Further, after the step of sending alarm information when the comparison result meets the condition for triggering a risk early warning, the method further comprises the following steps:
judging whether the specified characteristic value reaches a preset abnormal data threshold;
and if not, marking the specified feature on the individual behavior baseline corresponding to the first user.
Further, after the step of labeling the specified feature on the individual behavior baseline corresponding to the first user, the method further includes:
judging whether the number of feature marks on the individual behavior baseline corresponding to the first user reaches a preset number;
and if so, reconstructing an individual behavior baseline corresponding to the first user.
Further, in an embodiment, after the step of establishing the corresponding group user behavior baseline based on the individual behavior baselines of the different users in the same category of user group, the method further includes:
and associating the users with the categories by utilizing association rules.
The present application further provides a device for establishing a group user behavior baseline, including:
an acquisition unit, configured to acquire a user portrait of each user and an individual behavior baseline corresponding to the user portrait, where the user portrait is a portrait constructed based on specified information of the user and log history data corresponding to the user within a specified time period;
the clustering unit is used for carrying out clustering calculation on all the user portraits to obtain user groups of different categories;
the establishing unit is used for establishing corresponding group user behavior baselines based on the individual behavior baselines of different users in the user group of the same category.
The present application further provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the method of any one of claims 1 to 7 when the processor executes the computer program.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
The method, device, and computer equipment for establishing a group user behavior baseline work as follows. User portraits are established first. The users are then classified according to their user portraits. Finally, a group user behavior baseline is established for each category from the individual behavior baselines of the users in that category. With the method of the embodiments of the present application, multiple group user behavior baselines for users of different categories can be established quickly.
Drawings
Fig. 1 is a schematic flowchart of a method for establishing a group user behavior baseline according to an embodiment of the present application;
Fig. 2 is a block diagram of the structure of an apparatus for establishing a group user behavior baseline according to an embodiment of the present application;
Fig. 3 is a block diagram of the structure of a computer device according to an embodiment of the present application.
How the objectives of the present application are achieved, together with its functional features and advantages, will be further explained with reference to the embodiments and the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Referring to fig. 1, an embodiment of the present application provides a method for establishing a group user behavior baseline, including:
S1, acquiring a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is a portrait constructed based on the specified information of the user and the log history data corresponding to the user in a specified time period;
S2, performing clustering calculation on all the user portraits to obtain user groups of different categories;
S3, establishing corresponding group user behavior baselines based on the individual behavior baselines of different users in the same category of user groups.
In this embodiment, the server first acquires the personal information of each user. It then acquires each user's log history data and specified information, builds a user portrait, and tags the log history data. This lets the information in the user portrait be read quickly and the user's individual behavior baseline be established. The server next performs cluster analysis on the user portraits to obtain groups of different categories. Finally, it establishes a group user behavior baseline for each category.
As described in step S1, the server obtains the specified information of each user. This mainly includes the user's gender, age, department, position, education, and the like. The server then builds a user portrait from each user's log history data combined with the user's specified information. The tags of the user portrait include the following:
(1) user type, divided into four types including salesman, housekeeper, and driving account;
(2) workday activity = (number of days among the past 90 workdays on which the user accessed the specified system) / (total number of workdays);
(3) holiday activity = (number of holiday days in the past 90 days, including weekends and statutory holidays, on which the user accessed the specified system) / (total number of holiday days);
(4) diligence index = total on-duty duration / total number of days;
(5) whether any abnormal behavior exists, determined by matching against the results of other anomaly detection models.
The specified system may be, for example, PNBS (the new core system for Ping An insurance business). The user type, workday activity, holiday activity, diligence index, and the like are all obtained from the log history data.
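The portrait tags above are simple ratios over calendar days. The following is a minimal sketch of how they might be computed. It assumes the log history has already been reduced to the set of dates on which the user accessed the specified system. The function names and the dictionary layout are hypothetical. The abnormal-behavior flag is assumed to come from a separate detection model.

```python
from datetime import date, timedelta


def activity_ratio(access_days, candidate_days):
    """Share of candidate days (e.g. the past 90 workdays) on which the user
    accessed the specified system."""
    if not candidate_days:
        return 0.0
    active = sum(1 for d in candidate_days if d in access_days)
    return active / len(candidate_days)


def build_portrait(user_type, access_days, workdays, holidays,
                   total_duty_hours, total_days, abnormal_flag=False):
    """Assemble the portrait tags (hypothetical layout).

    abnormal_flag is assumed to be supplied by a separate anomaly-detection model.
    """
    return {
        "user_type": user_type,
        "workday_activity": activity_ratio(access_days, workdays),
        "holiday_activity": activity_ratio(access_days, holidays),
        # Diligence index: total on-duty duration / total number of days.
        "diligence_index": total_duty_hours / total_days if total_days else 0.0,
        "abnormal_behavior": abnormal_flag,
    }


# Usage: four workdays, system accessed on three of them, 32 duty hours in total.
days = [date(2020, 6, 1) + timedelta(days=i) for i in range(4)]
portrait = build_portrait("salesman", set(days[:3]), days, [], 32.0, len(days))
print(portrait["workday_activity"], portrait["diligence_index"])  # 0.75 8.0
```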
Each user's individual behavior baseline is obtained by extracting specified features from the log history data. For example, an individual behavior baseline may be established from the log history data of the past 90 days. The extracted specified features are then:
(1) total access count per day;
(2) number of SESSION_ID values per day;
(3) number of IP addresses per day;
(4) number of price inquiries per day;
(5) number of retrievals per day;
(6) number of tracking launches per day;
(7) number of HTTP access failures per day.
For each feature, the mean, standard deviation, Q1, Q3, maximum, and minimum are recorded. Noise data could distort the mean and standard deviation, so the interquartile range method is used to remove it before these two values are calculated.
Q1 and Q3 here are quartiles of the sample values sorted in ascending order. The first quartile (Q1), also called the lower quartile, is the value at the 25th percentile. The second quartile (Q2), also called the median, is the value at the 50th percentile. The third quartile (Q3), also called the upper quartile, is the value at the 75th percentile. The difference between the third and first quartiles is called the interquartile range (IQR).
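The per-feature statistics described above can be sketched with Python's standard `statistics` module. This is an illustrative sketch under assumptions not stated in the text:
- noise is taken to be anything outside the conventional fences Q1 - 1.5·IQR and Q3 + 1.5·IQR (the 1.5 multiplier is assumed);
- quartiles use the "inclusive" interpolation method;
- the standard deviation is the population form;
- the feature names are hypothetical.

Mean and standard deviation are computed on the cleaned values, while the quartiles, maximum, and minimum use the raw values.

```python
import statistics


def feature_baseline(daily_values, k=1.5):
    """Summarise one daily feature (e.g. total accesses per day over 90 days).

    Mean and standard deviation are computed after dropping values outside
    [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed, conventional choice.
    Requires at least two values.
    """
    q1, _median, q3 = statistics.quantiles(daily_values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    cleaned = [v for v in daily_values if low <= v <= high]
    return {
        "mean": statistics.fmean(cleaned),
        "std": statistics.pstdev(cleaned),  # population std (an assumption)
        "q1": q1,
        "q3": q3,
        "max": max(daily_values),
        "min": min(daily_values),
    }


# Hypothetical names for the seven specified features.
FEATURES = ("total_access", "session_ids", "ip_count", "price_inquiries",
            "retrievals", "tracking_launches", "http_failures")


def individual_baseline(daily_logs):
    """daily_logs maps a feature name to its per-day values (assumed layout)."""
    return {f: feature_baseline(daily_logs[f]) for f in FEATURES if f in daily_logs}


baseline = individual_baseline({"total_access": [1, 2, 1, 2, 1, 2, 1, 2, 50]})
print(baseline["total_access"]["mean"])  # 1.5 (the day with 50 is dropped as noise)
```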
As described in step S2, before all the user portraits are clustered, the elbow method is used to determine the optimal number of clusters K. The user portraits are then clustered with the K-means algorithm, which works as follows:
1. Select K points as the initial cluster centers.
2. Assign each object to its nearest center, forming K clusters.
3. Recalculate the center of each cluster.
4. Repeat steps 2 and 3 until the clusters no longer change or a specified number of iterations is reached.
The result is multiple user groups of different categories. The elbow method is a common way of choosing the number of clusters for K-means and is not described further here.
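The clustering step can be sketched in plain Python. This sketch makes three assumptions:
- Each user portrait has already been encoded as a numeric vector, for example (workday activity, holiday activity, diligence index, one-hot user type).
- The initial centers come from deterministic farthest-first seeding, a simplification of k-means++, instead of K arbitrary points.
- The elbow is located automatically as the K with the largest second difference in inertia. The text does not say how the elbow is read off, so this rule is an assumption.

```python
import math


def _farthest_first(points, k):
    """Deterministic seeding: start from the first point, then repeatedly take
    the point farthest from the centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means on numeric portrait vectors; returns (centers, labels, inertia)."""
    centers = _farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each portrait joins its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # clusters no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    inertia = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return centers, labels, inertia


def elbow_k(points, k_max):
    """Pick K where the inertia curve bends most sharply (largest second difference)."""
    inertia = {k: kmeans(points, k)[2] for k in range(1, k_max + 1)}
    best_k, best_bend = 1, float("-inf")
    for k in range(2, k_max):
        bend = (inertia[k - 1] - inertia[k]) - (inertia[k] - inertia[k + 1])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Usage: three well-separated groups of 2-D "portrait" vectors.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11),
       (20, 0), (20, 1), (21, 0), (21, 1)]
k = elbow_k(pts, 5)
centers, labels, _ = kmeans(pts, k)
print(k)  # 3
```

In practice, a library implementation with several random restarts would usually replace this hand-written loop.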
As described in step S3, the group user behavior baseline for a category is established from the individual behavior baselines of the users in that category. This yields a behavior baseline suited to that category of users. In the present application, the group user behavior baseline has the same features as the individual behavior baseline, although the corresponding values may differ. In one embodiment, each feature value in the group user behavior baseline may be, for example, the average of the corresponding feature values in the individual behavior baselines of the group's members.
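The averaging embodiment above can be sketched as follows. It assumes each individual baseline is a `{feature: {statistic: value}}` dictionary, a hypothetical layout. Only features and statistics shared by every member of the group are kept. Averaging is just one option the text mentions ("may be an average value ... or the like").

```python
from statistics import fmean


def group_baseline(individual_baselines):
    """Average every statistic of every feature over the members of one group.

    Each individual baseline is a {feature: {statistic: value}} dict (an assumed
    layout); only features and statistics present for every member are kept.
    """
    shared_features = set.intersection(*(set(b) for b in individual_baselines))
    group = {}
    for feature in sorted(shared_features):
        shared_stats = set.intersection(*(set(b[feature]) for b in individual_baselines))
        group[feature] = {stat: fmean(b[feature][stat] for b in individual_baselines)
                          for stat in sorted(shared_stats)}
    return group


alice = {"total_access": {"mean": 10.0, "std": 2.0}}
bob = {"total_access": {"mean": 20.0, "std": 4.0}, "ip_count": {"mean": 1.0}}
print(group_baseline([alice, bob]))
# {'total_access': {'mean': 15.0, 'std': 3.0}}
```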
In an embodiment, the method for obtaining the individual behavior baseline corresponding to the user portrait includes:
acquiring log historical data of the user;
obtaining the date corresponding to each piece of data in the log history data;
classifying data with a date of working days to obtain working day log historical data, and classifying data with a date of holidays to obtain holiday log historical data;
and establishing a working day individual behavior baseline of the user according to the working day log historical data and the designated information of the user, and establishing a holiday individual behavior baseline of the user according to the holiday log historical data and the designated information of the user.
In this embodiment, behavior baselines differ between workdays and holidays, so the two must be analyzed separately. Whether a given date is a workday or a holiday can be determined by calling a holiday query interface, http://www.easybots.cn/api/holiday. Further, when group user behavior baselines are established, a workday group user behavior baseline, a holiday group user behavior baseline, and so on can be built as needed. The workday group baseline is built from the workday individual behavior baselines, and the holiday group baseline is built from the holiday individual behavior baselines.
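The workday/holiday split can be sketched without the external interface. The sketch assumes the caller supplies the set of statutory holidays, treats Saturdays and Sundays as holidays, and stores each log record as a dict with a `date` key. The function names, the record layout, and the holiday set in the usage example are hypothetical, and the example dates are not a real calendar. The response format of the holiday interface mentioned in the text is not described, so it is not modelled here.

```python
from datetime import date


def is_holiday(day, holidays=frozenset()):
    """Weekends (Saturday/Sunday) and caller-supplied statutory holidays count as holidays."""
    return day.weekday() >= 5 or day in holidays


def split_logs(records, holidays=frozenset()):
    """Split log records (dicts with a 'date' key, an assumed layout) into
    workday and holiday log history data."""
    workday, holiday = [], []
    for record in records:
        target = holiday if is_holiday(record["date"], holidays) else workday
        target.append(record)
    return workday, holiday


logs = [{"date": date(2020, 6, 29), "event": "login"},   # Monday
        {"date": date(2020, 6, 27), "event": "login"},   # Saturday
        {"date": date(2020, 6, 25), "event": "login"}]   # listed as a holiday below
work, rest = split_logs(logs, holidays={date(2020, 6, 25)})
print(len(work), len(rest))  # 1 2
```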
In an embodiment, step S3 of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the same category of user group further includes:
S301, removing abnormal data from the individual behavior baselines of different users in the same category of user group by using the isolation forest algorithm;
S302, establishing the group user behavior baseline by using the individual behavior baselines from which the abnormal data have been removed.
In this embodiment, the isolation forest algorithm (iForest) is commonly used to mine abnormal data. Typical uses are attack detection and traffic anomaly analysis in network security, and fraud detection in financial institutions. The algorithm has low memory requirements, fast processing, and linear time complexity. It handles high-dimensional data and large data volumes well and supports online anomaly detection.
Abnormal data here means interference data. For example, the number of operations a user performs on a given day may be unusually large or unusually small. Such obviously abnormal data would distort the analysis, so the isolation forest algorithm can be used to remove it before the mean and standard deviation are calculated. Suppose a user normally logs in to web page A once or twice a day, but on one day has to log in repeatedly, 50 times in total. That day's 50 logins are abnormal data.
A group user behavior baseline established from individual behavior baselines with abnormal data removed is more accurate and more practical.
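To make the isolation idea concrete, here is a minimal one-dimensional isolation forest in plain Python applied to daily login counts. It is a simplified sketch of the general iForest technique, not the implementation used in the application. The tree count, subsample size, depth limit, and the 0.6 score cut-off are all assumptions. Values that random splits isolate after only a few cuts get scores close to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n
    points; used to normalise path lengths in iForest."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def _build_tree(values, depth, max_depth, rng):
    # Stop when the depth limit is reached or the node cannot be split further.
    if depth >= max_depth or len(values) <= 1 or min(values) == max(values):
        return ("leaf", len(values))
    split = rng.uniform(min(values), max(values))  # random split point
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return ("node", split,
            _build_tree(left, depth + 1, max_depth, rng),
            _build_tree(right, depth + 1, max_depth, rng))


def _path_length(x, tree, depth=0):
    if tree[0] == "leaf":
        return depth + _c(tree[1])  # add the expected depth of the unbuilt subtree
    _, split, left, right = tree
    return _path_length(x, left if x < split else right, depth + 1)


def isolation_scores(values, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1] per value; scores near 1 indicate outliers."""
    rng = random.Random(seed)
    n = min(sample_size, len(values))
    if n < 2:
        return [0.5] * len(values)
    max_depth = math.ceil(math.log2(n))
    trees = [_build_tree(rng.sample(values, n), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = _c(n)
    return [2 ** (-sum(_path_length(x, t) for t in trees) / n_trees / norm)
            for x in values]


def remove_anomalies(values, threshold=0.6, **kwargs):
    """Keep only values whose anomaly score is below an (assumed) threshold."""
    scores = isolation_scores(values, **kwargs)
    return [v for v, s in zip(values, scores) if s < threshold]


daily_logins = [1, 2, 1, 2, 1, 2, 1, 2, 50]  # the 50-login day from the example
print(remove_anomalies(daily_logins))  # [1, 2, 1, 2, 1, 2, 1, 2]
```

For multi-feature baselines, a maintained library implementation of isolation forest would normally be preferred over this hand-written version.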
In an embodiment, after the step S3 of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the same category of user group, the method further includes:
acquiring a current behavior log of a current period of a first user and a user portrait of the first user;
extracting a specified characteristic value from the current behavior log, wherein the specified characteristic value is a characteristic value required to be embodied in the group user behavior baseline; and determining the user group category of the first user from the user portrait of the first user;
comparing the specified characteristic value with a reference characteristic value corresponding to the specified characteristic in a first group user behavior baseline, wherein the first group user behavior baseline is a group user behavior baseline corresponding to a user group category to which the first user belongs;
and if the comparison result meets the condition of triggering risk early warning, sending alarm information.
In this embodiment, the individual behavior baselines and the group user behavior baselines all cover the same set period, such as a day, a week, or a quarter. The current period is the period in progress, which usually has not yet ended.
The comparison is made against a range. For example, suppose the number of logins to website A within a period is the specified feature and its reference value is 5. No risk early warning is raised while the specified characteristic value is at most 7; that is, the warning is triggered when the value is 8 or more.
In another embodiment, Q1 + 1.5(Q3 - Q1) is used as the trigger threshold. When the specified characteristic value is greater than Q1 + 1.5(Q3 - Q1), the feature is considered to deviate from the individual behavior baseline. A risk alarm is then triggered automatically and an instruction is sent to the server, which identifies and judges the feature. Q1 and Q3 are the quartiles defined above and are not described again here.
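Both trigger conditions described above can be written as small predicates. This is a hedged sketch with hypothetical function names. The tolerance of 2 in the range check is inferred from the worked example (reference 5, no warning up to 7). The quartile check follows the text's Q1 + 1.5(Q3 - Q1) by default. Passing `anchor="q3"` gives the more widely used Tukey upper fence Q3 + 1.5(Q3 - Q1) instead.

```python
def range_alert(value, reference, tolerance):
    """Range check: alert once the value exceeds reference + tolerance.
    With reference 5 and tolerance 2, a value of 7 is normal and 8 triggers."""
    return value > reference + tolerance


def quartile_alert(value, q1, q3, k=1.5, anchor="q1"):
    """Quartile check: alert when value > anchor + k * (Q3 - Q1).

    anchor="q1" follows the formula in the text; anchor="q3" gives the more
    common Tukey upper fence Q3 + 1.5 * IQR.
    """
    base = q1 if anchor == "q1" else q3
    return value > base + k * (q3 - q1)


print(range_alert(7, 5, 2), range_alert(8, 5, 2))   # False True
print(quartile_alert(6, q1=1, q3=4))                # True (threshold is 5.5)
print(quartile_alert(6, q1=1, q3=4, anchor="q3"))   # False (threshold is 8.5)
```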
In one embodiment, after the step of sending alarm information when the comparison result meets the condition for triggering a risk early warning, the method further includes:
judging whether the specified characteristic value reaches a preset abnormal data threshold;
and if not, marking the specified feature on the individual behavior baseline corresponding to the first user.
In this embodiment, after the risk alarm is triggered, the server judges whether the specified characteristic value reaches the preset abnormal data threshold. If it does, the value is removed from the individual behavior baseline as abnormal data. If it does not, the feature is marked, because the first user's behavior habits may have changed. For example, suppose the first user normally logs in to web page A 1 to 4 times a day but logs in 8 times today. A warning is triggered, but the abnormal data threshold is not reached, so the feature is marked for subsequent tracking.
In one embodiment, after the step of labeling the specified feature on the individual behavior baseline corresponding to the first user, the method further includes:
judging whether the number of feature marks on the individual behavior baseline corresponding to the first user reaches a preset number;
and if so, reconstructing an individual behavior baseline corresponding to the first user.
In this embodiment, when the number of feature marks reaches the preset number, the first user's personal behavior is taken to have changed. The number of marks is the sum of the times each marked feature has been marked.
For example, the first user normally logs in to web page A 1 to 4 times a day but logs in 7 times today. A warning is triggered, but the abnormal data threshold is not reached, so the feature is marked once. If the first user then logs in to web page A 7 to 10 times a day over the next N days, the feature "logs in to web page A" has been marked N + 1 times in total. If other features are marked M times during the same period, the total number of marks is N + 1 + M, where M and N are both positive integers.
When this total reaches the preset number, the user's personal behavior is judged to have changed, and the first user's individual behavior baseline needs to be re-established.
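The remove-or-mark decision and the mark counter can be sketched as a small per-user tracker. The class name, the action strings, and the way thresholds are passed in are hypothetical. "Reaches" is read as "greater than or equal to". Marks are summed across all features, matching the N + 1 + M count above, and are cleared once a rebuild is requested.

```python
from collections import Counter


class MarkTracker:
    """Per-user bookkeeping for alerts that did not reach the abnormal-data
    threshold (hypothetical class; names are illustrative)."""

    def __init__(self, rebuild_after):
        self.rebuild_after = rebuild_after  # preset number of marks
        self.marks = Counter()              # feature name -> times marked

    def on_alert(self, feature, value, abnormal_threshold):
        """Handle one triggered alert and return the action to take."""
        if value >= abnormal_threshold:
            return "remove_as_abnormal"     # treat the value as abnormal data
        self.marks[feature] += 1            # possible habit change: mark it
        if sum(self.marks.values()) >= self.rebuild_after:
            self.marks.clear()
            return "rebuild_baseline"       # total marks reached the preset number
        return "marked"


tracker = MarkTracker(rebuild_after=4)
print(tracker.on_alert("login_page_a", 7, abnormal_threshold=20))   # marked
print(tracker.on_alert("login_page_a", 9, abnormal_threshold=20))   # marked
print(tracker.on_alert("login_page_a", 50, abnormal_threshold=20))  # remove_as_abnormal
print(tracker.on_alert("retrievals", 30, abnormal_threshold=40))    # marked
print(tracker.on_alert("login_page_a", 8, abnormal_threshold=20))   # rebuild_baseline
```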
In an embodiment, after the step S3 of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the same category of user group, the method further includes:
and associating the users with the categories by utilizing association rules.
In this embodiment, association rules are an important topic in data mining. They are used to discover valuable correlations between data items in large amounts of data. Typical questions answered by association rules are "If a consumer buys product A, how likely is he to also buy product B?" and "If he buys products C and D, what other products will he also buy?"
The same data features observed along different dimensions, such as date, region, channel, product, and user, may give different results, and these dimensions can be used to build a 3D model. The clustering in step S2 produces more than 80 categories, from which the main characteristics of each category can be roughly known. For example:
category A: the main characteristic is "failing" in subject a;
category B: the main characteristic is "excellent" in subject b;
category C: lessons in subject a taught by teacher c;
category D: lessons in subject b taught by teacher d.
Previously, the relationship between category A and category C had to be found by manual classification, which is tedious and error-prone. Association rules reveal instead that most students in teacher c's subject a lessons are "failing". Mining and analysis thus expose an obvious problem in teacher c's teaching of subject a that needs correcting. This provides a solid technical basis for urging teacher c to improve the teaching effect.
Likewise, if users are associated, the reasons for high or low performance, efficiency, and the like can be analyzed through that association, which helps in solving problems. For example, suppose two people in an industry are colleagues whose work is logically upstream and downstream of each other. The association rules link them together, and if the downstream person works inefficiently, upstream progress may be affected.
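The two standard association-rule measures, support and confidence, can be sketched using the teacher example above. This is an illustrative brute-force sketch over single-item rules, not the Apriori algorithm or whatever miner the application uses. Each transaction is assumed to be the set of labels attached to one student record, and the records below are made up for illustration.

```python
from itertools import combinations


def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence


def mine_pair_rules(transactions, min_support, min_confidence):
    """Brute-force search over all single-item rules x -> y."""
    items = sorted({i for t in transactions for i in t})
    rules = []
    for x, y in combinations(items, 2):
        for lhs, rhs in ((x, y), (y, x)):
            s, conf = rule_metrics(transactions, [lhs], [rhs])
            if s >= min_support and conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules


# Illustrative records: labels attached to individual student results.
records = [{"teacher_c", "subject_a", "failing"},
           {"teacher_c", "subject_a", "failing"},
           {"teacher_c", "subject_a", "failing"},
           {"teacher_c", "subject_a", "pass"},
           {"teacher_d", "subject_b", "excellent"},
           {"teacher_d", "subject_b", "excellent"}]
print(rule_metrics(records, ["teacher_c"], ["failing"]))  # (0.5, 0.75)
```

A confidence of 0.75 for the rule "teacher_c -> failing" is the kind of signal the text describes as pointing to a teaching problem.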
Further, in the present application, association relations between the groups of each category may be established through association rules, and the associations between users of each category may be analyzed. Based on the group user behavior baselines of each category, this reveals how different user groups should be matched with one another. The user categories are also displayed graphically, and their association relations are connected by colored arrows and the like, so that users can conveniently view, analyze, and use them.
In the method for establishing a group user behavior baseline described above, user portraits are established first. The users are then classified according to their user portraits. Finally, a group user behavior baseline is established for each category from the individual behavior baselines of the users in that category. With the method of the embodiments of the present application, multiple group user behavior baselines for users of different categories can be established quickly.
Referring to fig. 2, an embodiment of the present application further provides an apparatus for establishing a group user behavior baseline, including:
an acquisition unit 10, configured to acquire a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is a portrait constructed based on specified information of the user and log history data corresponding to the user within a specified time period;
a clustering unit 20, configured to perform clustering calculation on all the user portraits to obtain user groups of different categories;
the establishing unit 30 is configured to establish a corresponding group user behavior baseline based on individual behavior baselines of different users in a same category of user group.
In an embodiment, the apparatus for establishing a group user behavior baseline further includes:
a log obtaining unit, configured to obtain log history data of the user;
a date acquisition unit for acquiring the date corresponding to each piece of data in the log history data;
the classification unit is used for classifying the data with the date of working days to obtain working day log historical data and classifying the data with the date of holidays to obtain holiday log historical data;
and the individual behavior baseline establishing unit is used for establishing a working day individual behavior baseline of the user according to the working day log historical data and the specified information of the user, and establishing a holiday individual behavior baseline of the user according to the holiday log historical data and the specified information of the user.
In an embodiment, the establishing unit 30 further includes:
the abnormal elimination module is used for removing abnormal data from the individual behavior baselines of different users in the same category of user group by using the isolation forest algorithm;
and the establishing module is used for establishing the group user behavior baseline by utilizing each individual behavior baseline after abnormal data are removed.
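An isolation forest scores each point by how few random axis-aligned splits are needed to isolate it. Outliers separate quickly, so they have short average path lengths and scores near 1.

The sketch below is a small pure-Python version under these assumptions:

- each individual baseline is a fixed-order numeric vector;
- the cut-off `threshold=0.6` is a hypothetical tuning value;
- the group baseline is the mean of the retained vectors.

It is not the patent's implementation. A production system would more likely use a library implementation.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search, used to normalise depths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int
    feature: Optional[int] = None  # None marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(rows: List[Row], depth: int, limit: int, rng: random.Random) -> Node:
    if depth >= limit or len(rows) <= 1:
        return Node(size=len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # a constant feature cannot be split; stop here
        return Node(size=len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return Node(len(rows), feature, split,
                build_tree(left, depth + 1, limit, rng),
                build_tree(right, depth + 1, limit, rng))


def path_length(row: Row, node: Node, depth: int = 0) -> float:
    if node.feature is None:
        # Leaf: add the expected remaining depth for the points still grouped here.
        return depth + c_factor(node.size)
    child = node.left if row[node.feature] < node.split else node.right
    return path_length(row, child, depth + 1)


def isolation_scores(rows: List[Row], n_trees: int = 100,
                     sample_size: int = 256, seed: int = 0) -> List[float]:
    """Scores in (0, 1]; values near 1 are easy to isolate, i.e. likely anomalies."""
    if len(rows) < 2:
        raise ValueError("need at least two baselines to score")
    rng = random.Random(seed)
    psi = min(sample_size, len(rows))
    limit = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(rows, psi), 0, limit, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-(sum(path_length(r, t) for t in trees) / n_trees) / norm) for r in rows]


def group_baseline_without_outliers(baselines: List[Row], threshold: float = 0.6,
                                    seed: int = 0) -> Tuple[List[float], List[Row]]:
    """Drop baselines scoring above threshold, then average the rest feature by feature."""
    scores = isolation_scores(baselines, seed=seed)
    kept = [b for b, s in zip(baselines, scores) if s <= threshold]
    group = [sum(col) / len(kept) for col in zip(*kept)]
    return group, kept
```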
In an embodiment, the apparatus for establishing a group user behavior baseline further includes:
a first acquisition unit, configured to acquire a current behavior log of a first user for the current period and the user portrait of the first user;
an extraction unit, configured to extract a specified feature value from the current behavior log, wherein the specified feature value is a feature value that the group user behavior baseline is required to reflect, and to determine the user group category of the first user from the user portrait of the first user;
a comparing unit, configured to compare the specified feature value with the reference feature value corresponding to the specified feature in a first group user behavior baseline, wherein the first group user behavior baseline is the group user behavior baseline corresponding to the user group category to which the first user belongs;
an alarm unit, configured to send alarm information if the comparison result meets the condition for triggering a risk early warning.
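The text does not define the reference feature value or the condition that triggers a risk early warning. The sketch below makes three assumptions:

- the group baseline stores a mean and a standard deviation per feature;
- a user's category is the cluster whose centre is nearest to the user's portrait;
- an alert fires when a current value lies more than `k` standard deviations from the group mean.

The default `k=3.0` and all function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Reference:
    """Reference value of one feature in a group baseline (assumed: mean and spread)."""
    mean: float
    std: float


def nearest_category(portrait: Sequence[float], centroids: Dict[int, Sequence[float]]) -> int:
    """Assign the user to the group whose cluster centre is closest to the portrait."""
    return min(centroids, key=lambda c: math.dist(portrait, centroids[c]))


def check_current_period(current: Dict[str, float], category: int,
                         group_baselines: Dict[int, Dict[str, Reference]],
                         k: float = 3.0) -> List[str]:
    """Return one alert message per feature deviating by more than k standard deviations."""
    baseline = group_baselines[category]
    alerts = []
    for name, value in current.items():
        ref = baseline.get(name)
        if ref is None:  # feature not reflected in this group's baseline
            continue
        if abs(value - ref.mean) > k * ref.std:
            alerts.append(f"risk alert: {name}={value} outside {ref.mean} +/- {k * ref.std}")
    return alerts
```

Comparing against the group baseline, rather than only the user's own history, lets a new user be checked before enough personal history exists.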
In an embodiment, the apparatus for establishing a group user behavior baseline further includes:
a first judging unit, configured to judge whether the specified feature value reaches a preset abnormal data threshold;
a marking unit, configured to mark the specified feature on the individual behavior baseline corresponding to the first user if the specified feature value does not reach the preset abnormal data threshold.
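The marking step sorts an alerted value into one of two cases. If it reaches the abnormal data threshold, it is treated as abnormal. Otherwise it is recorded as a mark on the user's own baseline.

This minimal sketch assumes that "reaches" means greater than or equal to, and that marks are counted per feature. `IndividualBaseline` and `handle_alerted_value` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IndividualBaseline:
    user: str
    features: Dict[str, float]
    marks: Counter = field(default_factory=Counter)  # feature name -> times marked


def handle_alerted_value(baseline: IndividualBaseline, feature: str,
                         value: float, abnormal_threshold: float) -> str:
    """Called after an alert has been raised for `feature`."""
    if value >= abnormal_threshold:
        # Reaches the preset abnormal-data threshold: treated as abnormal, not marked.
        return "abnormal"
    # Below the threshold: record the deviation on the user's own baseline.
    baseline.marks[feature] += 1
    return "marked"
```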
In an embodiment, the apparatus for establishing a group user behavior baseline further includes:
a second judging unit, configured to judge whether the number of times features have been marked on the individual behavior baseline corresponding to the first user reaches a preset number;
a reconstruction unit, configured to reconstruct the individual behavior baseline corresponding to the first user if the number of marks reaches the preset number.
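The reconstruction step can be sketched as a counter check. Once the total number of marks reaches the preset number, the baseline is rebuilt from recent per-period values.

This sketch assumes three things:

- the rebuild uses the mean of the recent values;
- the mark count is reset after a rebuild;
- `preset_count` is supplied by configuration.

All names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class IndividualBaseline:
    features: Dict[str, float]
    mark_count: int = 0  # total feature marks recorded since the last rebuild


def maybe_rebuild(baseline: IndividualBaseline,
                  recent_values: Dict[str, List[float]],
                  preset_count: int) -> bool:
    """Rebuild from recent per-period values once the mark count reaches preset_count."""
    if baseline.mark_count < preset_count:
        return False
    baseline.features = {name: mean(vals) for name, vals in recent_values.items()}
    baseline.mark_count = 0  # assumed: marks are cleared after a rebuild
    return True
```

Taken together, the marking and rebuild steps let a user's own baseline follow a lasting change in legitimate behavior. Without them, the same deviations would keep raising alerts.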
In an embodiment, the apparatus for establishing a group user behavior baseline further includes:
an association unit, configured to associate the users with the categories by using association rules.
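The text does not say which association rules are mined. One simple reading is single-antecedent rules of the form "attribute implies category", filtered by support and confidence.

The sketch below illustrates that reading under these assumptions:

- each user is described by a set of attribute strings and a cluster category;
- the `min_support` and `min_confidence` defaults are hypothetical;
- a user is associated with the category of the most confident matching rule.

It is not a full Apriori implementation.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Rule = Tuple[str, int, float, float]  # (attribute, category, support, confidence)


def mine_category_rules(users: Dict[str, Tuple[Set[str], int]],
                        min_support: float = 0.2,
                        min_confidence: float = 0.8) -> List[Rule]:
    """Single-antecedent rules 'attribute -> category' filtered by support and confidence."""
    n = len(users)
    item_count: Counter = Counter()
    pair_count: Counter = Counter()
    for attrs, category in users.values():
        for item in attrs:
            item_count[item] += 1
            pair_count[(item, category)] += 1
    rules = []
    for (item, category), count in pair_count.items():
        support = count / n  # share of all users having the attribute and the category
        confidence = count / item_count[item]  # share of attribute holders in the category
        if support >= min_support and confidence >= min_confidence:
            rules.append((item, category, support, confidence))
    return sorted(rules)


def assign_category(attrs: Set[str], rules: List[Rule]) -> Optional[int]:
    """Associate a user with the category of the most confident matching rule, if any."""
    matching = [r for r in rules if r[0] in attrs]
    if not matching:
        return None
    return max(matching, key=lambda r: r[3])[1]
```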
The units, modules, and the like in the above embodiments are the means for executing the corresponding steps of the methods in the above embodiments.
The device for establishing a group user behavior baseline according to the embodiments of the present application first establishes user portraits. It then classifies the users by their user portraits and establishes a group user behavior baseline for each category from the individual behavior baselines of the users in that category. With the device of the embodiments of the present application, multiple group user behavior baselines for users of different categories can be established quickly.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing data such as log data, user portraits, and behavior baselines. The network interface of the computer device is used for communicating with an external terminal through a network connection. When executed by the processor, the computer program implements the method for establishing a group user behavior baseline of any of the above embodiments.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is only a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects may be applied.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for establishing a group user behavior baseline according to any one of the above embodiments is implemented.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A method for establishing a group user behavior baseline is characterized by comprising the following steps:
acquiring a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is a portrait constructed based on specified information of the user and log history data corresponding to the user within a specified time period;
performing clustering calculation on all the user portraits to obtain user groups of different categories;
and establishing corresponding group user behavior baselines based on the individual behavior baselines of different users in the user group of the same category.
2. The method for establishing a group user behavior baseline according to claim 1, wherein acquiring the individual behavior baseline corresponding to the user portrait comprises:
acquiring log history data of the user and specified information of the user;
acquiring dates corresponding to all data in the log historical data;
classifying data dated on working days as working-day log history data, and classifying data dated on holidays as holiday log history data;
and establishing a working-day individual behavior baseline of the user according to the working-day log history data and the specified information of the user, and establishing a holiday individual behavior baseline of the user according to the holiday log history data and the specified information of the user.
3. The method for establishing a group user behavior baseline according to claim 1, wherein the step of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the same category of user group further comprises:
using an isolation forest algorithm to eliminate abnormal data from the individual behavior baselines of different users in the user group of the same category;
and establishing the group user behavior baseline by using the individual behavior baselines that remain after the abnormal data are eliminated.
4. The method for establishing a group user behavior baseline according to claim 1, wherein after the step of establishing a corresponding group user behavior baseline based on the individual behavior baselines of different users in the same category of user group, the method further comprises:
acquiring a current behavior log of a current period of a first user and a user portrait of the first user;
extracting a specified feature value from the current behavior log, wherein the specified feature value is a feature value that the group user behavior baseline is required to reflect; and determining the user group category of the first user from the user portrait of the first user;
comparing the specified feature value with the reference feature value corresponding to the specified feature in a first group user behavior baseline, wherein the first group user behavior baseline is the group user behavior baseline corresponding to the user group category to which the first user belongs;
and if the comparison result meets the condition for triggering a risk early warning, sending alarm information.
5. The method for establishing a group user behavior baseline according to claim 4, wherein after the step of sending out alarm information if the comparison result meets the condition for triggering risk pre-warning, the method further comprises:
judging whether the specified characteristic value reaches a preset abnormal data threshold value or not;
and if not, marking the specified feature on the individual behavior baseline corresponding to the first user.
6. The method for establishing a group user behavior baseline according to claim 5, wherein after the step of labeling the specified feature on the individual behavior baseline corresponding to the first user, the method further comprises:
judging whether the number of times features have been marked on the individual behavior baseline corresponding to the first user reaches a preset number;
and if so, reconstructing an individual behavior baseline corresponding to the first user.
7. The method for establishing a group user behavior baseline according to claim 1, wherein after the step of establishing the corresponding group user behavior baseline based on the individual behavior baselines of different users in the same category of user group, the method further comprises:
and associating the users with the categories by utilizing association rules.
8. An apparatus for establishing a group user behavior baseline, comprising:
an acquisition unit, configured to acquire a user portrait of each user and an individual behavior baseline corresponding to the user portrait, wherein the user portrait is constructed based on specified information of the user and log history data corresponding to the user within a specified time period;
a clustering unit, configured to perform clustering calculation on all the user portraits to obtain user groups of different categories;
an establishing unit, configured to establish corresponding group user behavior baselines based on the individual behavior baselines of different users in the user group of the same category.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010621812.3A 2020-06-30 2020-06-30 Method and device for establishing group user behavior baseline and computer equipment Pending CN111737320A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010621812.3A CN111737320A (en) 2020-06-30 2020-06-30 Method and device for establishing group user behavior baseline and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010621812.3A CN111737320A (en) 2020-06-30 2020-06-30 Method and device for establishing group user behavior baseline and computer equipment

Publications (1)

Publication Number Publication Date
CN111737320A true CN111737320A (en) 2020-10-02

Family

ID=72652224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010621812.3A Pending CN111737320A (en) 2020-06-30 2020-06-30 Method and device for establishing group user behavior baseline and computer equipment

Country Status (1)

Country Link
CN (1) CN111737320A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579581A (en) * 2020-11-30 2021-03-30 贵州力创科技发展有限公司 Data access method and system of data analysis engine
CN112579581B (en) * 2020-11-30 2023-04-14 贵州力创科技发展有限公司 Data access method and system of data analysis engine
CN114817377A (en) * 2022-06-29 2022-07-29 深圳红途科技有限公司 User portrait based data risk detection method, device, equipment and medium
CN114817377B (en) * 2022-06-29 2022-09-20 深圳红途科技有限公司 User portrait based data risk detection method, device, equipment and medium

Similar Documents

Publication Publication Date Title
WO2020253358A1 (en) Service data risk control analysis processing method, apparatus and computer device
CN109165840B (en) Risk prediction processing method, risk prediction processing device, computer equipment and medium
CN109272396B (en) Customer risk early warning method, device, computer equipment and medium
CN109767322B (en) Suspicious transaction analysis method and device based on big data and computer equipment
CN109858737B (en) Grading model adjustment method and device based on model deployment and computer equipment
CN109598095B (en) Method and device for establishing scoring card model, computer equipment and storage medium
CN109376237B (en) Client stability prediction method, device, computer equipment and storage medium
CN109582876B (en) Tourist industry user portrait construction method and device and computer equipment
CN110738388B (en) Method, device, equipment and storage medium for evaluating risk conduction through association map
CN109543925B (en) Risk prediction method and device based on machine learning, computer equipment and storage medium
US20160012544A1 (en) Insurance claim validation and anomaly detection based on modus operandi analysis
CN111192153B (en) Crowd relation network construction method, device, computer equipment and storage medium
CN109801151B (en) Financial falsification risk monitoring method, device, computer equipment and storage medium
CN110729054B (en) Abnormal diagnosis behavior detection method and device, computer equipment and storage medium
CN112395500A (en) Content data recommendation method and device, computer equipment and storage medium
CN111737320A (en) Method and device for establishing group user behavior baseline and computer equipment
CN112784168B (en) Information push model training method and device, information push method and device
CN110781380A (en) Information pushing method and device, computer equipment and storage medium
Ghankutkar et al. Modelling machine learning for analysing crime news
CN112288279A (en) Business risk assessment method and device based on natural language processing and linear regression
CN112035775B (en) User identification method and device based on random forest model and computer equipment
CN113947076A (en) Policy data detection method and device, computer equipment and storage medium
Tatusch et al. Show me your friends and i’ll tell you who you are. finding anomalous time series by conspicuous cluster transitions
CN112990989A (en) Value prediction model input data generation method, device, equipment and medium
CN112464670A (en) Recognition method, recognition model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination