CN117874681A

CN117874681A - Detection method and device based on user overall behavior portraits, medium and equipment

Info

Publication number: CN117874681A
Application number: CN202410057030.XA
Authority: CN
Inventors: 许云风; 马振; 邹武; 马飞; 夏玉明; 魏国富; 陈言; 胡绍勇; 张照龙
Original assignee: Information and Data Security Solutions Co Ltd
Current assignee: Information and Data Security Solutions Co Ltd
Priority date: 2024-01-15
Filing date: 2024-01-15
Publication date: 2024-04-12

Abstract

The application discloses a detection method, device, medium and equipment based on overall behavior portraits of users. A two-layer clustering algorithm is designed: the first layer clusters the features extracted from log data to obtain seven categories of portrait labels; the second layer clusters again on the basis of the seven categories of portrait labels to form the overall behavior portraits of all users. The similarity between the feature data of a user to be detected and each overall behavior portrait is then calculated, and the user is assigned to an overall behavior portrait according to the calculation result; that portrait describes the overall behavior of the user to be detected. The application achieves the following effects: the overall behavior portraits of users are described comprehensively; feature extraction is flexible, using the coefficient of variation and using GPT to assist labeling; an unsupervised clustering algorithm is used to construct the overall behavior portraits, the algorithm is simple, and the clustering algorithm is packaged into an operator to speed up portrait detection; expert experience is incorporated into feature extraction and label formation, making the method suitable for user behavior portrait detection in the security field.

Description

Detection method and device based on user overall behavior portraits, medium and equipment

Technical Field

The application relates to the field of data security, in particular to a detection method, a detection device, a detection medium and detection equipment based on an overall behavior portrait of a user.

Background

With the rapid development of the internet and digital technology, the growth of many businesses keeps enlarging the volume of data to be stored, and the data security situation is severe. Protecting data security depends on insight into all kinds of users: how to find abnormal users among a massive user base, and how to identify risky behaviors among the massive access and operation activities of users, are very important. Currently, in the field of data security, most algorithmic detection techniques analyze abnormal user behavior in a single scenario category, such as data leakage or compliance audit, and cannot clearly show the associations between different behaviors, so the accuracy of the detection results is low.

Disclosure of Invention

In view of the above, the application provides a detection method, device, medium and equipment based on the overall behavior portraits of users. User features are extracted from different categories to form user portrait labels of different categories, and the resulting behavior portraits comprehensively describe user behavior, which effectively addresses the low accuracy of current detection methods.

According to one aspect of the application, a detection method based on the overall behavior portraits of a user is provided, which comprises the following steps:

acquiring a plurality of historical user characteristics, and generating user portrait tags of different categories based on the historical user characteristics, wherein the user portrait tags of different categories comprise a user potential risk tag, an account security behavior tag, a compliance audit behavior tag, a network attack behavior tag, a data leakage behavior tag, an operation and maintenance operation behavior tag and a business operation behavior tag;

processing the user portrait labels of different categories by using a clustering method to obtain the overall user behavior portrait;

extracting portrait tags of a user to be detected based on the user portrait tags of different categories;

and respectively calculating the similarity between the user to be detected and each overall behavior portrait according to the portrait label of the user to be detected, and determining a behavior detection result corresponding to the user to be detected according to the similarity, wherein the behavior detection result is used for identifying whether the user to be detected has abnormal behavior risks.
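The final step above can be sketched as follows. The text does not name the similarity measure, so this minimal sketch assumes cosine similarity between the portrait-label vector of the user to be detected and one representative vector per overall behavior portrait. The portrait names, the vector values and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def assign_portrait(user_labels, portraits):
    """Return (portrait_name, similarity) for the most similar overall portrait."""
    scores = {name: cosine_similarity(user_labels, vec) for name, vec in portraits.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical representative vectors: one value per portrait-label category (7 categories).
portraits = {
    "normal": [1, 1, 1, 1, 1, 1, 1],
    "high_risk": [3, 4, 3, 3, 4, 4, 1],
}
user = [3, 4, 3, 2, 4, 3, 1]  # hypothetical label vector of the user to be detected
best_name, best_score = assign_portrait(user, portraits)
```

Whether the assigned portrait counts as an abnormal behavior risk would then depend on how that portrait was characterized when it was built.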

According to another aspect of the present application, there is provided a detection apparatus based on an overall behavior portrait of a user, the apparatus including:

The tag generation module is used for acquiring a plurality of historical user characteristics and generating user portrait tags of different types based on the historical user characteristics, wherein the user portrait tags of different types comprise user potential risk tags, account security behavior tags, compliance audit behavior tags, network attack behavior tags, data leakage behavior tags, operation and maintenance operation behavior tags and business operation behavior tags;

the overall portrayal generating module is used for processing the user portrayal labels of different categories by using a clustering method to obtain the overall behavior portrayal of the user;

the user behavior detection module is used for extracting the portrait tags of the users to be detected based on the portrait tags of the users in different categories; and respectively calculating the similarity between the user to be detected and each overall behavior portrait according to the portrait label of the user to be detected, and determining a behavior detection result corresponding to the user to be detected according to the similarity, wherein the behavior detection result is used for identifying whether the user to be detected has abnormal behavior risks.

According to still another aspect of the present application, there is provided a medium having stored thereon a program or instructions which, when executed by a processor, implement the above detection method based on the overall behavior portrait of the user.

According to still another aspect of the present application, there is provided an apparatus including a storage medium and a processor, the storage medium storing a computer program, the processor executing the computer program to implement the above detection method based on the overall behavior portrayal of the user.

By means of the above technical solution, the application extracts a large number of features from seven aspects, forms seven categories of labels through a clustering algorithm, clusters these labels again to form the overall behavior portraits of all users, and detects the portrait of a user to be detected through a similarity algorithm. The behavior portrait detection method has the following innovations: 1) The overall behavior portraits of users are described comprehensively. 2) Feature extraction is very flexible: the method uses the coefficient of variation, uses GPT to assist labeling, and uses large-model technology to detect attack types. 3) Portrait detection is fast and simple to use: the overall behavior portraits are constructed with an unsupervised clustering algorithm, which avoids the complexity of applying supervised learning, and packaging the clustering algorithm into operators greatly speeds up behavior portrait detection. 4) Expert experience is integrated: a large amount of security expert experience is incorporated into feature extraction and label formation, so the method is well suited to user behavior portrait detection in the security field.
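The two-layer clustering summarized above can be illustrated with a small, dependency-free sketch. It assumes a toy k-means with deterministic initialization (the first k points) and feeds the layer-one cluster indices directly into layer two; the text does not specify how labels are encoded for the second layer, so that encoding, the function names and the sample data (two categories instead of seven) are assumptions.

```python
def kmeans(points, k, iterations=20):
    """Tiny k-means: returns a cluster index (0..k-1) for every point.
    Centroids start at the first k points, so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

def two_layer_clustering(features_by_category, k_per_category, k_overall):
    """Layer 1: cluster each category's features into portrait labels.
    Layer 2: cluster the per-user label vectors into overall portraits."""
    label_columns = [
        kmeans(features_by_category[name], k_per_category[name])
        for name in sorted(features_by_category)
    ]
    label_vectors = [list(row) for row in zip(*label_columns)]
    return label_vectors, kmeans(label_vectors, k_overall)

# Hypothetical feature data for four users in two of the seven categories.
features = {
    "account_security": [[1, 0], [50, 40], [2, 1], [48, 42]],
    "data_leakage": [[3], [900], [5], [950]],
}
labels, overall = two_layer_clustering(
    features, {"account_security": 2, "data_leakage": 2}, k_overall=2
)
```

In practice a library implementation with better initialization would normally replace the toy `kmeans`; the point here is only the data flow between the two layers.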

The foregoing is only an overview of the technical solution of the present application. So that the technical means of the application can be understood more clearly and implemented according to the specification, and so that the above and other objects, features and advantages of the application are more apparent, specific embodiments of the application are set out below.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

Fig. 1 is a schematic flow chart of a detection method based on a user overall behavior portrait according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of another detection method based on a user overall behavior portrait according to an embodiment of the present application;

Fig. 3 is a block diagram of a detection device based on a user overall behavior portrait according to an embodiment of the present application.

Detailed Description

The present application will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.

In this embodiment, a detection method based on the overall behavior portrait of a user is provided. As shown in Fig. 1, the method includes:

step 101, acquiring a plurality of historical user features, and generating user portrait labels of different categories based on the historical user features, wherein the user portrait labels of different categories comprise user potential risk labels, account security behavior labels, compliance audit behavior labels, network attack behavior labels, data leakage behavior labels, operation and maintenance operation behavior labels and business operation behavior labels.

The detection method based on the overall behavior portraits of users describes the various operation behaviors of users and determines, based on those behaviors, whether each user to be detected has abnormal behavior risks. Specifically, in this step, user characteristics are first extracted from historical log data, and a clustering algorithm is then used to obtain the user portrait labels for each of several major behavior categories. The user portrait labels of different major categories mine and analyze user behavior from different angles, and are then used to generate user behavior portraits that describe the overall behavior of users.

The user portrait tags comprise seven main categories of user potential risk tags, account security class behavior tags, compliance audit class behavior tags, network attack class behavior tags, data leakage class behavior tags, operation and maintenance operation class behavior tags and business operation class behavior tags.

The user characteristics corresponding to each user portrait label may be set according to expert experience. For example, the user characteristics corresponding to the user potential risk label include user personal characteristics and user employment characteristics; those corresponding to the account security behavior label include login characteristics; those corresponding to the compliance audit behavior label include compliance audit behavior characteristics; those corresponding to the network attack behavior label include access operation characteristics, data sending and receiving characteristics and attack record characteristics; those corresponding to the data leakage behavior label include outgoing sensitive data characteristics; those corresponding to the operation and maintenance operation behavior label include system operation characteristics and database operation characteristics; and those corresponding to the business operation behavior label include business report operation characteristics. Optionally, as shown in Fig. 2, step 101 includes the following steps:

step 101.1, determining a first historical user characteristic corresponding to the user potential risk, clustering the first historical user characteristic, and determining a user potential risk label according to a clustering result of the first historical user characteristic.

In step 101.1, the first historical user characteristics corresponding to the user potential risk include user personal characteristics, such as age, gender and contact information, and user employment characteristics, such as department, post and employment status. In this step, the user attributes related to the potential risk of the user, namely the first historical user characteristics, are first selected, as shown in Table 1 below. In the table, the employment status feature may include two attribute values, on-job and off-job; the post feature may include five attribute values, for example operation and maintenance engineer; and the user type may include three attribute values, enterprise employee, partner and third party. If a user is an on-job salesman of a partner, the feature vector (triple) corresponding to the user may be expressed as (1,5,2). All the selected first historical user characteristic data are then clustered with the k-means algorithm; if a selected attribute value of a user is null, for example the attribute value of the post feature, it does not participate in the calculation. Finally, the user potential risk label, that is, the portrait label of the user potential risk category, is determined according to the clustering result. In the table, three sub-portrait labels are generated by clustering, corresponding to three specific sub-portraits of the user potential risk portrait: low potential risk, medium potential risk and high potential risk. Cluster category 1 corresponds to low potential risk users (on-job enterprise employees are most likely low risk, so they are marked as low potential risk); cluster category 2 corresponds to medium potential risk users (on-job testers of partners also need some attention, so they are marked as medium potential risk); cluster category 3 corresponds to high potential risk users (off-job third-party operation and maintenance engineers need close attention, so they are marked as high potential risk).

TABLE 1

TABLE 2
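The attribute encoding described in step 101.1 can be sketched as below. Only the codes implied by the (1,5,2) example (on-job = 1, salesman = 5, partner = 2) come from the text; every other code in the lookup tables is an assumption, and the sketch interprets "does not participate in the calculation" as excluding a user whose selected attribute is null, which is one possible reading.

```python
# Hypothetical code tables. Only on-job = 1, salesman = 5 and partner = 2
# follow from the (1,5,2) example in the text; all other codes are assumed.
JOB_STATUS = {"on-job": 1, "off-job": 2}
POST = {"operation and maintenance engineer": 1, "tester": 2, "salesman": 5}
USER_TYPE = {"enterprise employee": 1, "partner": 2, "third party": 3}

def encode_user(job_status, post, user_type):
    """Return the numeric triple for a user, or None if any attribute is missing."""
    values = (JOB_STATUS.get(job_status), POST.get(post), USER_TYPE.get(user_type))
    return None if None in values else values

def encode_all(users):
    """Encode users and drop those with a null attribute before clustering."""
    encoded = {name: encode_user(*attrs) for name, attrs in users.items()}
    return {name: vec for name, vec in encoded.items() if vec is not None}

users = {
    "u1": ("on-job", "salesman", "partner"),
    "u2": ("off-job", None, "third party"),  # null post: left out of clustering
}
vectors = encode_all(users)
```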

step 101.2, determining a second historical user characteristic corresponding to login authentication, clustering the second historical user characteristic, and determining an account security behavior label according to a clustering result of the second historical user characteristic.

In step 101.2, the second historical user characteristics corresponding to the account security behavior label relate to the login authentication scenario. In this step, the second historical user characteristics corresponding to login authentication are first extracted, as shown in Table 3 below: the number of successful logins, the number of login failures, the number of device ports accessed, the password similarity, the access volatility and the number of source devices used. All the selected second historical user characteristic data are then clustered with the k-means algorithm. Finally, the account security behavior labels, that is, the sub-portrait labels of the account security behavior category, are obtained according to the clustering result. In the table, four sub-portrait labels are generated by clustering, corresponding to four specific sub-portraits: "frequent login behavior, no account risk", "almost normal login behavior, low account risk", "suspicious login behavior, relatively high account risk" and "very suspicious login behavior, high account risk". The login and access behavior of the users in cluster category 1 is unremarkable, and their high number of successful logins indicates that they are most likely operators. For the users in cluster category 2, the number of source devices used and the number of target device ports accessed stand out, but the other indicators are relatively normal, so the login behavior is almost normal and the account risk is low. For the users in cluster category 3, the number of login failures is relatively high, as are the password similarity and the access volatility, so the login behavior is suspicious and the account risk is relatively high. For the users in cluster category 4, the number of login failures is very high, as are the password similarity and the access volatility, so the login behavior is very suspicious and the account risk is high.

TABLE 3

Sequence number	Feature name	Description
1	succ_cnt	Number of successful logins
2	fail_cnt	Number of login failures
3	device_port_cnt	Number of device ports accessed
4	password_s	Similarity of passwords
5	access_v	Volatility of access
6	use_device_cnt	Number of source devices used

TABLE 4

The password similarity feature is derived from login passwords. First, the historical login passwords are obtained and grouped by user to obtain the login password group of each user. The passwords may also be grouped by both user and application system, so that user security can be examined per application system. Next, a login password value is calculated for each historical login password: the ASCII code value of each character in the password is determined, and these values are summed to obtain a quantized value for the password, namely the login password value. Finally, the password similarity feature of each user is calculated from the login password values: aggregation functions are used to obtain the mean, the standard deviation and the coefficient of variation, and the password similarity is the coefficient of variation, that is, the ratio of the standard deviation to the mean. The aggregation functions may be those of the ClickHouse-SQL and Spark-SQL grammars in the GPL engine, such as the AVG function.
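As a rough illustration of the password similarity calculation, the sketch below sums character codes per password and takes the ratio of standard deviation to mean per user. The text does not say whether the population or the sample standard deviation is meant; the population form is assumed here, and the function names and sample passwords are hypothetical.

```python
import statistics

def password_value(password):
    """Quantize a password as the sum of its character codes (ASCII values for ASCII text)."""
    return sum(ord(ch) for ch in password)

def password_similarity(passwords):
    """Coefficient of variation (standard deviation / mean) of the password values."""
    values = [password_value(p) for p in passwords]
    if not values:
        return 0.0
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    # Assumption: population standard deviation; the source does not specify which form.
    return statistics.pstdev(values) / mean

# Hypothetical login password history grouped by user.
history = {
    "alice": ["abc", "abd", "abe"],
    "bob": ["aaaa", "zzzzzzzz"],
}
similarity = {user: password_similarity(pw) for user, pw in history.items()}
```

With this definition a smaller ratio means the quantized password values of a user lie closer together.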

The access volatility feature is derived from access records. First, the historical access records are obtained and grouped by user to obtain the historical access record group of each user. The records may also be grouped by both user and application system, so that user security can be examined per application system. The access volatility of each user is then calculated from the access frequency of that user. For example, one month of data can be taken and grouped by day, the daily access count of each user is computed, and the daily access counts are processed with aggregation functions to obtain their mean, standard deviation and coefficient of variation; the access volatility feature of the user is the ratio of the standard deviation to the mean. It should be noted that, apart from the access counts, the other features are computed from the data of the most recent day.
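The daily grouping behind the access volatility feature might look like the following sketch. It assumes each access record has already been reduced to a 0-based day index within the window, that days without any access count as zero, and that the population standard deviation is used; none of these details is stated in the text.

```python
from collections import Counter
import statistics

def access_volatility(access_days, days_in_window):
    """Coefficient of variation of one user's daily access counts.

    access_days holds one 0-based day index per access record.
    Days with no access count as zero (an assumption of this sketch).
    """
    per_day = Counter(access_days)
    counts = [per_day.get(day, 0) for day in range(days_in_window)]
    mean = statistics.mean(counts)
    if mean == 0:
        return 0.0
    return statistics.pstdev(counts) / mean

steady = access_volatility([0, 1, 2, 3], days_in_window=4)  # one access per day
bursty = access_volatility([0, 0, 0, 0], days_in_window=4)  # all accesses on one day
```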

step 101.3, determining a third historical user characteristic corresponding to the non-compliant audit behavior, clustering the third historical user characteristic, and determining a compliance audit behavior label according to a clustering result of the third historical user characteristic.

In step 101.3, the third historical user characteristics corresponding to the compliance audit behavior label relate to non-compliant audit behavior, such as access that bypasses the bastion host or direct access from the office area to the production area. In this step, the user attributes related to compliance audit behavior, namely the third historical user characteristics, are first selected. Four features are selected in Table 5: source address area, target address area, address compliance, and whether the core database is accessed. The source address area and the target address area may each include four attribute values (test area, office area, production area and other areas), address compliance may include two attribute values (compliant and non-compliant), and whether the core database is accessed may include two attribute values (yes and no). All the selected third historical user characteristic data are then clustered with the k-means algorithm. Finally, the compliance audit behavior labels are determined according to the clustering result, as shown in Table 6. In the table, three sub-portrait labels are generated, corresponding to three specific sub-portraits of the compliance audit behavior portrait: normal compliance audit behavior, compliance audit behavior at risk, and compliance audit behavior at high risk. The users in cluster category 1 access the production area from the office area but have neither bypass behavior nor access to the core database, so they are given the normal compliance audit behavior label. The users in cluster category 2 access the production area from the test area and have no bypass behavior, but have accessed the core database, so they are given the compliance audit behavior at risk label. The users in cluster category 3 access the production area from the office area, have bypass behavior and have accessed the core database, so they are given the compliance audit behavior at high risk label.

TABLE 5

TABLE 6

step 101.4, determining a fourth historical user characteristic corresponding to the network access behavior, clustering the fourth historical user characteristic, and determining a network attack behavior label according to a clustering result of the fourth historical user characteristic.

In step 101.4, the user characteristics corresponding to the network attack behavior label characterize the access behavior of the user and mainly involve fields such as the access URL (http_url), user agent (http_user), cookie (http_cookie), request body (http_request_body), number of bytes sent (send_bytes), number of bytes received (received_bytes) and time taken by the access operation (taken_time). In this step, the user attributes related to network attack behavior, namely the fourth historical user characteristics, are first selected, as shown in Table 7 below: the total time taken by access operations, the total number of bytes sent, the total number of bytes received and the number of attacks. All the selected fourth historical user characteristics are then clustered with the k-means algorithm. Finally, the network attack behavior labels, that is, the sub-portrait labels of the network attack behavior category, are obtained according to the clustering result. In Table 8, three sub-portrait labels are generated by clustering, corresponding to three specific sub-portraits: no obvious network attack behavior, network attack behavior at risk, and network attack behavior at high risk. For the users in cluster category 1, both the requested traffic and the received traffic are small and the number of identified attacks is zero, so they are given the no obvious network attack behavior label. For the users in cluster category 2, both the requested traffic and the received traffic are large and the number of identified attacks is large, so they are given the network attack behavior at risk label. For the users in cluster category 3, both the requested traffic and the received traffic are large and the number of attacks is large, so they are given the network attack behavior at high risk label.

TABLE 7

TABLE 8

The three features of total time taken by access operations, total number of bytes sent and total number of bytes received are obtained by grouping the records by user and then summing the corresponding field for each user. The number of attacks is obtained as follows:

acquiring historical attack records, and detecting the attack type (attack_type) of each historical attack record based on its payload, wherein the attack type is detected by a text classification model such as TF-IDF, Word2Vec or BERT; grouping the historical attack records by user to obtain the attack record group of each user; and counting with the aggregation function count() to obtain the number-of-attacks feature of each user, wherein only records whose attack type is not "no attack" are counted as attacks.
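A minimal sketch of the per-user aggregation behind these four features is given below. It assumes, for brevity, that each record already carries both the traffic fields and a detected attack_type, whereas the text treats access records and attack records as separate sources; the `user` key, the output key `attack_cnt` and the sample attack label are hypothetical.

```python
from collections import defaultdict

def network_attack_features(records):
    """Aggregate per-user totals and the number of records classified as an attack."""
    features = defaultdict(
        lambda: {"taken_time": 0, "send_bytes": 0, "received_bytes": 0, "attack_cnt": 0}
    )
    for rec in records:
        row = features[rec["user"]]
        row["taken_time"] += rec["taken_time"]
        row["send_bytes"] += rec["send_bytes"]
        row["received_bytes"] += rec["received_bytes"]
        # Only records whose attack type is not "no attack" count as attacks.
        if rec["attack_type"] != "no attack":
            row["attack_cnt"] += 1
    return dict(features)

# Hypothetical records; "sql injection" is an illustrative attack label.
records = [
    {"user": "u1", "taken_time": 12, "send_bytes": 300, "received_bytes": 1200, "attack_type": "no attack"},
    {"user": "u1", "taken_time": 8, "send_bytes": 200, "received_bytes": 800, "attack_type": "sql injection"},
    {"user": "u2", "taken_time": 5, "send_bytes": 100, "received_bytes": 400, "attack_type": "no attack"},
]
per_user = network_attack_features(records)
```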

Here, payload refers to the effective data carried in a transmission. To make data transmission more reliable, the original data is transmitted in batches, and auxiliary information such as the data size and check bits is added to the head and tail of each batch to form a data packet; the original data within the packet is the payload.

The specific steps for detecting attack types based on payload are as follows:

preparing the raw training data: one month of data is taken from a large project site. Because the data needs to be desensitized, only four fields are taken from each piece of data, namely http_url, http_useragent, http_cookie and http_request_body, and these are concatenated to generate a new field text, namely text=http_url+http_user+http_cookie+http_request_body, which is the training text field used by the text classification model below;

labeling each piece of data with an attack type label: ChatGPT is used for labeling for the sake of efficiency. The public interface of ChatGPT is called with a configured prompt, and a script retrieves the attack type (attack_type) for each text to be trained. Security experts are then asked to confirm and correct the data whose attack type is not empty, and the corrected attack_type is the label of the sample. If the attack type of a sample is empty, the sample is normal and is marked as "no attack"; given the labor cost and efficiency of expert correction, the samples that GPT judges to be normal are assumed to be correct. The GPT-3.5 interface is preferably called here, and GPT-4 is more preferred;

training with a text classification model: text classification is performed with a Transformer-based model; here the BERT text classification model is selected for training. Variants of the BERT model include BERT-Base, BERT-Large, RoBERTa, ALBERT and ELECTRA, and the roberta-base model is chosen here. A pre-trained model is used, downloaded from Hugging Face;

detecting the data to be analyzed and outputting the attack type: the data of the most recent day from the project site is detected, and the attack type (attack_type) is output.
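The data preparation part of these steps (building the text field and normalizing the label) can be sketched without calling any external service. The field names follow the concatenation formula in the text; the function name, the treatment of missing fields as empty strings and the sample record are assumptions, and the labeling call and model training are deliberately left out.

```python
# Field names as written in the concatenation formula of the text.
TEXT_FIELDS = ("http_url", "http_user", "http_cookie", "http_request_body")

def build_sample(record, labeled_attack_type):
    """Concatenate the four fields into `text` and normalize the label.

    An empty or missing label from the labeling step marks a normal
    sample, which is stored as "no attack".
    """
    # Missing or null fields are treated as empty strings (an assumption).
    text = "".join(record.get(field) or "" for field in TEXT_FIELDS)
    label = labeled_attack_type.strip() if labeled_attack_type else ""
    return {"text": text, "label": label or "no attack"}

# Hypothetical desensitized record and labels.
record = {
    "http_url": "/login?id=1' OR '1'='1",
    "http_user": "curl/8.0",
    "http_cookie": "",
    "http_request_body": None,
}
attack_sample = build_sample(record, "sql injection")
normal_sample = build_sample(record, None)
```

The resulting `text` and `label` pairs are what a text classification model would then be fine-tuned on.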

step 101.5, determining a fifth historical user characteristic corresponding to the data leakage behavior, clustering the fifth historical user characteristic, and determining a data leakage behavior label according to a clustering result of the fifth historical user characteristic.

In step 101.5, the fifth historical user characteristics corresponding to the data leakage behavior label are used to analyze outgoing sensitive data. The access points may include sensitive tables, sensitive files, sensitive interfaces and the like; a user may access a large amount of sensitive data through an access point, or send a large amount of sensitive data out through it. The fifth historical user characteristics can be derived from DLP (Data Leakage Prevention) logs, with reference to fields such as risk level (task_level), total file size (file_size), file name (data_name) and file type (data_type). In this step, the user attributes related to data leakage behavior, namely the fifth historical user characteristics, are first selected, as shown in Table 9 below, including features such as the number of files and the number of file types. All the extracted fifth historical user characteristics are then clustered with the k-means algorithm. Finally, the data leakage behavior labels, that is, the sub-portrait labels of the data leakage behavior category, are obtained according to the clustering result. In Table 10, four sub-portrait labels are formed by clustering, corresponding to four specific sub-portraits of the data leakage behavior portrait: no obvious data leakage behavior, potential data leakage risk, relatively high data leakage risk and high data leakage risk. The users in cluster category 1 access few files and a small total traffic volume, and their data leakage risk is low, so they are given the no obvious data leakage behavior label. The users in cluster category 2 access more files and a larger total traffic volume, and a small number of sensitive files and high-risk files are involved, so they are given the potential data leakage risk label. The users in cluster category 3 access many files and a large total traffic volume, and many sensitive files and high-risk files are involved, so they are given the relatively high data leakage risk label. The users in cluster category 4 access many files and a large total traffic volume, many sensitive files and high-risk files are involved, and a number of files were intercepted, so they are given the high data leakage risk label.

TABLE 9

Sequence number	Feature name	Description
1	file_cnt	Number of files
2	file_type_cnt	Number of file types
3	file_size_total	Total size of flow packets
4	sens_file_cnt	Number of sensitive files
5	highrisk_file_cnt	Number of high-risk files
6	nopass_file_cnt	Number of files intercepted

Table 10
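
The feature selection in step 101.5 can be illustrated with a minimal sketch that aggregates DLP log rows into the six features of Table 9. The field names file_size, data_name, data_type and task_level come from the description; the `user`, `sensitive` and `blocked` fields, the sample rows and the function name `leakage_features` are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical DLP log rows. file_size, data_name, data_type and task_level
# are the fields named in the text; user, sensitive and blocked are assumed.
dlp_logs = [
    {"user": "u1", "data_name": "a.xlsx", "data_type": "xlsx", "file_size": 120,
     "task_level": "high", "sensitive": True, "blocked": True},
    {"user": "u1", "data_name": "b.pdf", "data_type": "pdf", "file_size": 80,
     "task_level": "low", "sensitive": False, "blocked": False},
    {"user": "u2", "data_name": "c.txt", "data_type": "txt", "file_size": 5,
     "task_level": "low", "sensitive": False, "blocked": False},
]


def leakage_features(logs):
    """Aggregate the six Table 9 features per user (one log row = one file event)."""
    acc = defaultdict(lambda: {"names": set(), "types": set(), "size": 0,
                               "sens": 0, "high": 0, "nopass": 0})
    for row in logs:
        a = acc[row["user"]]
        a["names"].add(row["data_name"])          # distinct file names
        a["types"].add(row["data_type"])          # distinct file types
        a["size"] += row["file_size"]             # running total size
        a["sens"] += int(row["sensitive"])        # sensitive file events
        a["high"] += int(row["task_level"] == "high")  # high-risk file events
        a["nopass"] += int(row["blocked"])        # intercepted file events
    return {
        user: {
            "file_cnt": len(a["names"]),
            "file_type_cnt": len(a["types"]),
            "file_size_total": a["size"],
            "sens_file_cnt": a["sens"],
            "highrisk_file_cnt": a["high"],
            "nopass_file_cnt": a["nopass"],
        }
        for user, a in acc.items()
    }


features = leakage_features(dlp_logs)
```

Each per-user feature dictionary can then be flattened into a numeric vector and passed to the clustering step.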

Step 101.6: determine a sixth historical user feature corresponding to operation and maintenance operation behaviors, cluster the sixth historical user feature, and determine operation and maintenance operation class behavior labels according to the clustering result of the sixth historical user feature, wherein the operation and maintenance operation behaviors include system operation behaviors and database operation behaviors.

In step 101.6, the sixth historical user feature corresponding to the operation and maintenance operation class behavior label includes system operation features and database operation features. It may involve fields such as resource name (resource_name), operation instruction (process_cmdline), database table name (database_table_name), destination port (dst_port) and session ID (session_id), and can be obtained from the operation logs of a 4A server, a bastion host, or the like. A 4A server provides a unified security management platform solution, including centralized authentication, centralized account, centralized authorization and centralized audit services. In this step, user attributes related to operation and maintenance operation behaviors are first selected as the sixth historical user feature. As shown in Table 11 below, system operation features such as the number of Linux instruction operations and the number of Linux instruction types are selected, together with database operation features such as the number of database instruction operations and the number of database operation types. A risk instruction in Table 11 is a risky operation instruction that requires attention: for Linux, instructions such as "rz", "sz", "rcp" and "scp"; for databases, instructions such as "drop", "alter", "delete" and "truncate". All extracted sixth historical user features are then clustered with the k-means algorithm. Finally, a plurality of operation and maintenance operation class behavior labels, that is, the sub-portrait labels under the operation and maintenance operation category, are obtained from the clustering result. As shown in Table 12, clustering yields four sub-portrait labels, which correspond to four specific sub-portraits of the operation and maintenance operation class behavior portrait: no obvious risk instruction operation, potential risk instruction operation, high-risk instruction operation, and very-high-risk instruction operation. For cluster type 1, the group's users perform few instruction operations, operate on few database tables, involve few ports and sessions, and hit no risk operations; their instruction operations are normal, so the cluster is assigned the "no obvious risk instruction operation" label. For cluster type 2, the group's users perform more instruction operations, including more Linux and database operations, and a small number of risk operations are involved; the behavior is suspicious, so the cluster is assigned the "potential risk instruction operation" label. For cluster type 3, the group's users access many ports and operate on many tables, and more risk operations are involved; the behavior is more suspicious and needs attention, so the cluster is assigned the "high-risk instruction operation" label. For cluster type 4, the group's users perform a very large number of instruction operations, access many ports and operate on many tables, and a large number of risk operations are involved; the behavior is highly suspicious and needs close attention, so the cluster is assigned the "very-high-risk instruction operation" label.

TABLE 11

Sequence number	Feature name	Description
1	operate_cnt	Number of instruction operations
2	port_cnt	Number of ports accessed
3	linux_opt_cnt	Number of Linux instruction operations
4	database_opt_cnt	Number of database instruction operations
5	linux_cmd_cnt	Number of Linux instruction types
6	database_cmd_cnt	Number of database operation types
7	table_cnt	Number of tables operated on
8	session_cnt	Number of sessions
9	linux_keypoint_cnt	Number of Linux risk instruction operations
10	linux_keypoint_num	Number of Linux risk instruction types
11	database_keypoint_cnt	Number of database risk instruction operations
12	database_keypoint_num	Number of database risk instruction types

Table 12
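
The risk instruction features of Table 11 (the *_keypoint_cnt and *_keypoint_num rows) can be sketched as a simple count over command lines. This is an illustrative sketch only: it assumes the instruction name is the first token of process_cmdline, and the risk sets below contain only the instruction names that are legible in the description; a real deployment would configure its own lists. The function name `risk_instruction_features` is hypothetical.

```python
from collections import Counter

# Risk instruction names taken from the description; both sets are assumed
# to be configurable and are not exhaustive.
LINUX_RISK = {"rz", "sz", "rcp", "scp"}
DB_RISK = {"drop", "delete"}


def risk_instruction_features(cmdlines, risk_set):
    """Return (number of risk operations, number of distinct risk instructions)."""
    hits = Counter()
    for cmd in cmdlines:
        tokens = cmd.strip().split()
        # Assumption: the instruction name is the first token of the command line.
        if tokens and tokens[0].lower() in risk_set:
            hits[tokens[0].lower()] += 1
    return sum(hits.values()), len(hits)


# Hypothetical command lines for one user.
linux_cmds = ["ls -l", "scp a.tar host:/tmp", "sz report.csv", "scp b.tar host:/tmp"]
db_cmds = ["select * from t", "DROP TABLE t", "delete from t where id = 1"]

linux_keypoint_cnt, linux_keypoint_num = risk_instruction_features(linux_cmds, LINUX_RISK)
database_keypoint_cnt, database_keypoint_num = risk_instruction_features(db_cmds, DB_RISK)
```

Matching on the first token keeps the sketch short; commands wrapped in `sudo` or chained in a shell pipeline would need real parsing.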

Step 101.7: determine a seventh historical user feature corresponding to business system operation behaviors, cluster the seventh historical user feature, and determine business operation class behavior labels according to the clustering result of the seventh historical user feature.

In step 101.7, the seventh historical user feature corresponding to the business operation class behavior label is used to analyze the user's operation behavior in the business system. It may involve fields such as report name (data_name), operation type (genetic_opt_type), report type (data_type) and report risk level (data_level). In this step, user attributes related to business operation behavior are first selected as the seventh historical user feature. As shown in Table 13 below, features such as the number of operations and the number of sensitive data operations are selected; the number of sensitive data operations counts operations on data whose level is high risk, and key operations are operations that require attention, such as downloading and deleting. All extracted seventh historical user features are then clustered with the k-means algorithm. Finally, a plurality of business operation class behavior labels, that is, the sub-portrait labels under the business operation category, are obtained from the clustering result. As shown in Table 14, clustering yields three sub-portrait labels, which correspond to three specific sub-portraits of the business operation class behavior portrait: no obvious business operation risk, potential business operation risk, and high business operation risk. For cluster type 1, the group's users perform few business operations, no high-risk operations are involved, and the numbers of reports accessed and key operations are small, so the cluster is assigned the "no obvious business operation risk" label. For cluster type 2, the group's users perform more business operations, the numbers of reports accessed and key operations are larger, and a small number of operations on high-risk-level reports exist, so the cluster is assigned the "potential business operation risk" label. For cluster type 3, the group's users perform many business operations, the numbers of reports accessed and key operations are large, and a large number of operations on high-risk-level reports exist, so the cluster is assigned the "high business operation risk" label.

TABLE 13

Sequence number	Feature name	Description
1	operate_cnt	Number of operations
2	high_risk_table_operate_cnt	Number of sensitive data operations
3	table_cnt	Number of reports operated on
4	package_cnt	Number of traffic packets received
5	keypoint_cnt	Number of key operations

TABLE 14
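
The first-layer clustering used in steps 101.1 to 101.7 can be sketched with a plain Lloyd's k-means over Table 13 style feature vectors. The toy data, the seeding with the first k points, and the rule that ranks clusters by centroid sum to attach labels are all assumptions for illustration; the description attaches labels with expert experience and GPT-assisted marking rather than a fixed rule, and real features would normally be scaled before clustering.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # assignments unchanged, so the centroids are stable
        assign = new_assign
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


# Toy rows: [operate_cnt, high_risk_table_operate_cnt, table_cnt, package_cnt, keypoint_cnt]
users = [
    [5, 0, 2, 10, 0], [50, 3, 20, 100, 5], [200, 40, 60, 400, 30],
    [6, 0, 3, 12, 1], [55, 4, 22, 110, 6], [210, 45, 65, 420, 35],
]
assign, centroids = kmeans(users, k=3)

# Illustrative labelling rule (not from the source): rank clusters by centroid sum.
names = ["no obvious business operation risk",
         "potential business operation risk",
         "high business operation risk"]
order = sorted(range(3), key=lambda c: sum(centroids[c]))
label_of = {c: names[rank] for rank, c in enumerate(order)}
labels = [label_of[a] for a in assign]
```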

Step 102: process the user portrait labels of the different categories with a clustering method to obtain the overall user behavior portraits.

In this step, the seven categories of labels are first extracted to construct the overall behavior feature, which represents the user's features across the different categories. As shown in Table 15 below, there are 7 portrait labels in total (corresponding to the 7 feature names in Table 15), each taking at most four attribute values, so the overall behavior of each user can be quantified as a 7-tuple.

TABLE 15
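
Quantifying a user's seven labels as a 7-tuple can be sketched as an ordinal encoding. The category keys, the shortened label vocabularies and the 1-based codes below are assumptions for illustration; the actual attribute values are those defined in Table 15.

```python
# Order of the seven portrait label categories (steps 101.1 to 101.7).
CATEGORIES = [
    "potential_risk", "account_security", "compliance_audit", "network_attack",
    "data_leakage", "ops_operation", "business_operation",
]

# Hypothetical ordered vocabularies, lowest risk first; at most four values each.
LABEL_ORDER = {
    "potential_risk": ["low", "medium", "high"],
    "account_security": ["low", "medium", "high", "very high"],
    "compliance_audit": ["none", "risk", "high"],
    "network_attack": ["none", "risk", "high"],
    "data_leakage": ["none", "potential", "high", "very high"],
    "ops_operation": ["none", "potential", "high", "very high"],
    "business_operation": ["none", "potential", "high"],
}


def to_tuple(user_labels):
    """Encode one user's seven labels as a 7-tuple of 1-based ordinal codes."""
    return tuple(LABEL_ORDER[c].index(user_labels[c]) + 1 for c in CATEGORIES)


# Hypothetical labels for one user.
example_user = {
    "potential_risk": "low", "account_security": "low",
    "compliance_audit": "risk", "network_attack": "risk",
    "data_leakage": "none", "ops_operation": "potential",
    "business_operation": "high",
}
seven_tuple = to_tuple(example_user)
```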

After the 7-tuple data of a plurality of historical users is obtained, the overall behavior features are clustered by the second-layer clustering algorithm (unsupervised learning), and the overall user behavior portraits are determined from the clustering result. Each overall user behavior portrait has different characteristics and corresponds to a different user group. In this step, all overall behavior features are clustered with the k-means algorithm. As shown in Table 16 below, clustering yields 6 typical overall user behavior portraits. Category 1: the group's users are normal in login authentication, compliance audit, network attack, data leakage, operation and maintenance operation and business operation, and their potential risk points relate to the users' own attributes; overall, they are normal users. Category 2: the group's users show high risk in login authentication and compliance audit behaviors and some risk in network attack behavior, but no obvious data leakage, instruction operation or business operation risk; overall, they are non-compliant access users. Category 3: the group's users show risk in network attack behavior, together with potential data leakage risk and potential business operation risk, but their login authentication, compliance access and instruction operations are normal; overall, these users may cause business data leakage and are typical potential-risk users. Category 4: the group's users have no problem with login authentication but exhibit high-risk instruction operation behaviors, and risks exist in compliance audit, network attack, data leakage and business operation; overall, they are illegal instruction operation users. Category 5: the group's users show non-compliant access and very high business operation risk, relatively high data leakage and account risks, and some risk in network attack and instruction operation; overall, they are illegal business operation users. Category 6: the group's users show very high risk in network attack, data leakage, instruction operation and business operation, some risk in compliance audit, and high account risk; overall, they are typical data leakage users.

Further, the k-means algorithm used in the preceding steps can be encapsulated as an operator and integrated into the analysis engine GPL. The GPL engine chains clickhouse-sql, spark-sql and the various encapsulated operators with pipe symbols to perform streaming computation. The operator is implemented in Java and interfaces efficiently with clickhouse-sql and spark-sql, so it runs fast.

Table 16


Step 103: extract the portrait labels of the user to be detected based on the user portrait labels of the different categories.

In this step, portrait detection is performed for the user to be detected. Specifically, the behavior features of the user to be detected are extracted according to the method in step 101 to obtain Table 17 below; the features are then quantified to obtain the portrait labels of the user to be detected, as shown in Table 18 below. Labels 1 to 7 correspond to the seven major portrait label categories of steps 101.1 to 101.7, group categories 1 to 6 correspond to the six user categories in Table 16, and test9527 is the user to be detected.

TABLE 17

User name	test9527
User potential risk	Low potential risk
User account security class behavior	Login behavior is almost normal, and the account risk is low
User compliance audit class behavior	Risk exists in compliance audit class behaviors
User network attack behavior	Risk of network attack exists
User data leakage behavior	No obvious data leakage behavior
User operation and maintenance operation class behavior	Potential risk instruction operation
User business operation class behavior	High business operation risk

TABLE 18

Step 104: calculate the similarity between the user to be detected and each overall behavior portrait according to the portrait labels of the user to be detected, and determine the behavior detection result for the user to be detected according to the similarity, wherein the behavior detection result identifies whether the user to be detected has a risk of abnormal behavior.

In this step, after the portrait labels of the user to be detected are obtained, the similarity between the user and each overall behavior portrait is calculated from those labels. Specifically, cosine similarity, Euclidean distance, Manhattan distance, the Pearson correlation coefficient or the like may be used. The calculation results are shown in Table 19 below.

TABLE 19

User	Category 1	Category 2	Category 3	Category 4	Category 5	Category 6
test9527	0.852	0.812	0.927	0.892	0.944	0.823

As can be seen from Table 19, the user to be detected is most similar to category 5 (similarity 0.944). Combined with Table 16, the behavior detection result for this user is therefore: non-compliant access and very high business operation risk exist, data leakage and account risks are relatively high, and network attack and instruction operation carry some risk; overall, the user is an illegal business operation user.
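
The similarity calculation of step 104 can be sketched as follows. The four measures are the ones named in the description; the user 7-tuple and the three portrait vectors are hypothetical values (they are not the data behind Table 19), and choosing cosine similarity as the deciding measure is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def euclidean(a, b):
    """Euclidean distance; smaller means more similar."""
    return math.dist(a, b)


def manhattan(a, b):
    """Manhattan distance; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def pearson(a, b):
    """Pearson correlation coefficient; undefined for constant vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Hypothetical 7-tuple of the user to be detected and hypothetical portrait centroids.
user = (1, 1, 2, 2, 1, 2, 3)
portraits = {
    "category 1": (1, 1, 1, 1, 1, 1, 1),
    "category 2": (3, 3, 2, 1, 1, 1, 1),
    "category 5": (1.2, 1.1, 2.0, 2.1, 1.3, 2.2, 2.9),
}

scores = {name: cosine(user, vec) for name, vec in portraits.items()}
best = max(scores, key=scores.get)  # portrait with the highest cosine similarity
```

When a distance measure is used instead of cosine similarity, the best match is the portrait with the smallest value, so `min` replaces `max`.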

It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an execution order. The execution order of each process should be determined by its function and internal logic, and the numbering should not limit the implementation of the embodiments of the present application.

This embodiment designs a two-layer clustering algorithm. The first-layer clustering algorithm clusters the features extracted from historical log data to obtain 7 major categories of user portrait labels; the second-layer clustering algorithm clusters again on the basis of those 7 categories of labels to form the overall portraits of all users. Similarity is then calculated between the feature data of the user to be detected and each overall behavior portrait, and the user is assigned to an overall behavior portrait according to the result; that portrait describes the overall behavior of the user to be detected. With this design, the embodiment achieves the following technical effects: 1) The overall behavior portrait of the user is described comprehensively. 2) Feature extraction is flexible: the coefficient of variation is used, GPT is used to assist labeling, and large-model technology is used to detect attack types. 3) Portrait detection is fast and simple to use: the overall behavior portraits are built with an unsupervised clustering algorithm, which avoids the complexity of applying a supervised learning algorithm, and encapsulating the clustering algorithm as an operator greatly speeds up behavior portrait detection. 4) Expert experience is integrated: a large amount of security expert experience is incorporated into feature extraction and label formation, so the method is well suited to user behavior portrait detection in the security field.

Further, as a specific implementation of the above detection method based on user overall behavior portraits, an embodiment of the application provides a detection device based on user overall behavior portraits. As shown in fig. 3, the device includes a label generation module, an overall portrait generation module and a user behavior detection module.

The tag generation module is used for acquiring a plurality of historical user characteristics and generating different types of user portrait tags based on the historical user characteristics, wherein the different types of user portrait tags comprise user potential risk tags, account security class behavior tags, compliance audit class behavior tags, network attack class behavior tags, data leakage class behavior tags, operation and maintenance operation class behavior tags and business operation class behavior tags;

the overall portrait generation module is used for processing user portrait labels of different categories by using a clustering method to obtain overall user behavior portraits;

the user behavior detection module is used for extracting the portrait labels of the user to be detected based on the user portrait labels of different categories; and calculating the similarity between the user to be detected and each overall behavior portrait according to the portrait labels of the user to be detected, and determining a behavior detection result corresponding to the user to be detected according to the similarity, wherein the behavior detection result is used for identifying whether the user to be detected has a risk of abnormal behavior.

In a specific application scenario, the tag generation module is configured to:

determining a first historical user characteristic corresponding to the user potential risk, clustering the first historical user characteristic, and determining a user potential risk label according to a clustering result of the first historical user characteristic;

determining a second historical user characteristic corresponding to login authentication, clustering the second historical user characteristic, and determining an account security class behavior label according to a clustering result of the second historical user characteristic;

determining a third historical user characteristic corresponding to the non-compliance audit behavior, clustering the third historical user characteristic, and determining a compliance audit class behavior label according to a clustering result of the third historical user characteristic;

and determining a fourth historical user characteristic corresponding to the network access behavior, clustering the fourth historical user characteristic, and determining a network attack class behavior label according to a clustering result of the fourth historical user characteristic.

In a specific application scenario, optionally, the tag generation module is configured to:

determining a fifth historical user characteristic corresponding to the data leakage behavior, clustering the fifth historical user characteristic, and determining a data leakage behavior label according to a clustering result of the fifth historical user characteristic;

determining a sixth historical user characteristic corresponding to the operation and maintenance operation behaviors, clustering the sixth historical user characteristic, and determining an operation and maintenance operation type behavior label according to a clustering result of the sixth historical user characteristic, wherein the operation and maintenance operation behaviors comprise system operation behaviors and database operation behaviors;

and determining a seventh historical user characteristic corresponding to the operation behavior of the business system, clustering the seventh historical user characteristic, and determining a business operation class behavior label according to a clustering result of the seventh historical user characteristic.

In a specific application scenario, optionally, the tag generation module is used for:

extracting historical login passwords, and grouping the historical login passwords according to users to obtain login password groups corresponding to each user;

calculating a login password value corresponding to each historical login password according to ASCII codes of characters in each historical login password;

and calculating the password similarity in the login password group corresponding to each user by using an aggregation function according to the login password value to obtain the password similarity characteristic.
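
The three password steps above can be sketched as follows. The description only says that a value is computed from ASCII codes and that an aggregation function is applied per user; summing the character codes and using the coefficient of variation as the aggregation are assumptions made for this sketch, as are the sample records and function names.

```python
from collections import defaultdict
from statistics import mean, pstdev


def password_value(password):
    """Assumed scoring: sum of the ASCII codes of the characters."""
    return sum(ord(ch) for ch in password)


def password_similarity_feature(records):
    """Per-user coefficient of variation of password values (0.0 = identical values)."""
    groups = defaultdict(list)
    for user, password in records:
        groups[user].append(password_value(password))
    feats = {}
    for user, values in groups.items():
        avg = mean(values)
        feats[user] = pstdev(values) / avg if avg else 0.0
    return feats


# Hypothetical (user, historical login password) records.
login_records = [("alice", "abc"), ("alice", "abc"), ("bob", "a"), ("bob", "c")]
feats = password_similarity_feature(login_records)
```

A smaller value means the user's historical passwords map to closer numeric values; note that a plain sum does not distinguish permutations of the same characters.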

In a specific application scenario, optionally, the tag generation module is used for:

extracting historical access records, and grouping the historical access records according to users to obtain access record groups corresponding to each user;

determining a daily number of accesses for each user based on the set of access records;

and determining the access volatility characteristic corresponding to each user by using an aggregation function according to the daily access times.
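
The access volatility steps above can be sketched as follows. Using the coefficient of variation of the daily access counts as the aggregation function is an assumption, as are the timestamp format and sample records; days on which a user has no access are simply absent from the counts in this sketch.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical (user, timestamp) access records in "YYYY-MM-DD HH:MM:SS" format.
access_records = [
    ("u1", "2024-01-01 09:00:00"), ("u1", "2024-01-01 10:00:00"),
    ("u1", "2024-01-02 09:30:00"), ("u1", "2024-01-02 11:00:00"),
    ("u2", "2024-01-01 08:00:00"),
    ("u2", "2024-01-02 08:00:00"), ("u2", "2024-01-02 09:00:00"),
    ("u2", "2024-01-02 10:00:00"),
]


def access_volatility(records):
    """Per-user coefficient of variation of the daily access counts."""
    daily = defaultdict(Counter)
    for user, ts in records:
        daily[user][ts[:10]] += 1  # group by calendar day (YYYY-MM-DD prefix)
    result = {}
    for user, per_day in daily.items():
        counts = list(per_day.values())  # every counted day has at least one access
        result[user] = pstdev(counts) / mean(counts)
    return result


volatility = access_volatility(access_records)
```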

In a specific application scenario, optionally, the tag generation module is used for:

acquiring historical attack records, and detecting the attack type corresponding to each historical attack record by using a preset text detection model;

grouping the historical attack records according to the users to obtain an attack record group corresponding to each user;

and determining the corresponding attack frequency characteristics of each user by utilizing an aggregation function based on the attack type corresponding to each attack record in the attack record group.
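
The attack frequency steps above can be sketched as follows. The keyword rules in `detect_attack_type` are a hypothetical stand-in for the preset text detection model, which the description does not specify in detail; the attack type names and the sample records are likewise assumptions.

```python
from collections import Counter, defaultdict


def detect_attack_type(payload):
    """Keyword stand-in for the preset text detection model (hypothetical rules)."""
    text = payload.lower()
    if "union select" in text:
        return "sql_injection"
    if "<script" in text:
        return "xss"
    if "../" in text:
        return "path_traversal"
    return "other"


# Hypothetical (user, request text) attack records.
attack_records = [
    ("u1", "GET /item?id=1 UNION SELECT password FROM users"),
    ("u1", "GET /search?q=<script>alert(1)</script>"),
    ("u1", "GET /item?id=2 union select 1,2"),
    ("u2", "GET /static/../../etc/passwd"),
]


def attack_frequency(records):
    """Group records by user and count how often each attack type occurs."""
    per_user = defaultdict(Counter)
    for user, payload in records:
        per_user[user][detect_attack_type(payload)] += 1
    return {user: dict(counts) for user, counts in per_user.items()}


freq = attack_frequency(attack_records)
```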

In a specific application scenario, optionally, the whole portrait generation module is used for:

extracting user portrait tags of different categories, and constructing overall behavior characteristics, wherein the overall behavior characteristics are used for representing the user characteristics of the different categories;

and clustering the overall behavior features, and determining the overall behavior portraits of the user according to the clustering result of the overall behavior features.

It should be noted that, other corresponding descriptions of each functional module related to the detection device based on the overall behavioral image of the user provided in the embodiment of the present application may refer to corresponding descriptions in the above method, which are not described herein again.

Based on the above method, correspondingly, the embodiment of the application also provides a storage medium, on which a computer program is stored, and when the program is executed by a processor, the method for detecting the overall behavior portraits of the user is realized.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing an electronic device (may be a personal computer, a server, or a network device, etc.) to perform the methods described in various implementation scenarios of the present application.

Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 3, in order to achieve the above object, an embodiment of the present application further provides an electronic device, which may specifically be a personal computer, a server, a network device or the like. The electronic device includes a storage medium and a processor: the storage medium stores a computer program, and the processor executes the computer program to implement the above detection method based on user overall behavior portraits as shown in fig. 1 and fig. 2.

Optionally, the electronic device may also include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface may also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Bluetooth interface or a Wi-Fi interface), and the like.

It will be appreciated by those skilled in the art that the structure of the electronic device provided in this embodiment does not constitute a limitation on the electronic device, which may include more or fewer components, combine certain components, or use a different arrangement of components.

The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the electronic device and supports the running of information processing programs and other software and/or programs. The network communication module implements communication among the components inside the storage medium, as well as communication with other hardware and software in the physical device.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general hardware platforms, or may be implemented by hardware.

Those skilled in the art will appreciate that the drawings are merely schematic diagrams of one preferred implementation scenario, and that the units or processes in the drawings are not necessarily required for implementing the present application. Those skilled in the art will also appreciate that the units of a device in an implementation scenario may be distributed in that device as described for the implementation scenario, or may be changed accordingly and located in one or more devices different from that of the present implementation scenario. The units of the implementation scenario may be combined into one unit, or may be further split into a plurality of sub-units.

The above serial numbers of the present application are for description only and do not indicate the relative merits of the implementation scenarios. The above discloses only a few specific implementation scenarios of the present application; the present application is not limited thereto, and any variation that a person skilled in the art can conceive shall fall within the protection scope of the present application.

Claims

1. A method for detecting an overall behavioral portrayal of a user, the method comprising:

2. The method of claim 1, wherein the generating user portrait tags of different categories based on the historical user characteristics comprises:

determining a first historical user characteristic corresponding to the user potential risk, clustering the first historical user characteristic, and determining a user potential risk label according to a clustering result of the first historical user characteristic;

determining a third historical user characteristic corresponding to the non-compliance audit behavior, clustering the third historical user characteristic, and determining a compliance audit class behavior label according to a clustering result of the third historical user characteristic;

3. The method of claim 2, wherein the generating user portrait tags of different categories based on the historical user characteristics comprises:

4. The method of claim 2, wherein the determining a second historical user characteristic corresponding to login authentication comprises:

5. The method of claim 2, wherein the determining a second historical user characteristic corresponding to login authentication comprises:

determining the daily access times of each user based on the access record group;

6. The method of claim 2, wherein the determining a second historical user characteristic corresponding to a network attack comprises:

grouping the historical attack records according to the users to obtain attack record groups corresponding to each user;

and determining the corresponding attack frequency characteristics of each user by using an aggregation function based on the attack type corresponding to each attack record in the attack record group.

7. The method according to claim 1, wherein the processing the user portrayal labels of different categories by using a clustering method to obtain the overall user behavior portrayal comprises:

extracting user portrait tags of different categories to construct overall behavior characteristics, wherein the overall behavior characteristics are used for representing the user characteristics of different categories;

and clustering the overall behavior features, and determining a plurality of overall behavior portraits of the user according to the clustering result of the overall behavior features.

8. A detection device based on a user's overall behavioral portraits, said device comprising:

9. A storage medium having stored thereon a program or instructions which, when executed by a processor, implement the method of any of claims 1 to 7.

10. An electronic device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1 to 7 when executing the program.