CN117408734B - Customer information intelligent management system based on Internet of things equipment - Google Patents

Publication number
CN117408734B
Authority
CN
China
Prior art keywords
client
node
principal component
data
contained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311723818.1A
Other languages
Chinese (zh)
Other versions
CN117408734A (en
Inventor
刘超
肖智卿
周柏魁
许多
郑淇升
熊慧
梁文聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Yunbai Technology Co ltd
Original Assignee
Guangdong Yunbai Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Yunbai Technology Co ltd
Priority to CN202311723818.1A
Publication of CN117408734A
Application granted
Publication of CN117408734B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y10/00 Economic sectors
    • G16Y10/45 Commerce

Abstract

The invention relates to the technical field of data processing, in particular to an intelligent client information management system based on Internet of things equipment, which comprises a memory and a processor, wherein the processor executes a computer program stored in the memory to realize the following steps: acquiring behavior data of at least two clients to form a client data set; obtaining the primary discrete degree of each node in each dimension according to the data discrete distribution condition of the behavior data of the clients in each node in the isolated forest in each dimension; combining the path length of the behavior data of the clients contained in each node in the isolated forest and the data distribution condition to obtain the advanced discrete degree of each node in each dimension; and obtaining a stopping condition to construct an isolated forest of the client behavior, calculating an abnormality score of behavior data of each client, and obtaining a client behavior abnormality detection result according to the abnormality score. The invention can obtain more accurate detection results of abnormal customer behaviors.

Description

Customer information intelligent management system based on Internet of things equipment
Technical Field
The invention relates to the technical field of data processing, in particular to an intelligent client information management system based on Internet of things equipment.
Background
Combining Internet of things equipment with a customer information management system enables intelligent management and monitoring of customer behavior information. Behavior data related to customers is acquired and processed through the Internet of things equipment, then mined and analyzed with intelligent algorithms and data analysis techniques, which helps enterprises better understand customer demands and improve customer experience and satisfaction, so that fine-grained operation and personalized service can be realized. Existing methods often use an isolated forest algorithm to detect abnormalities in customer behavior information in order to learn customers' interests, preferences and buying habits, so that personalized recommendation and marketing can be carried out accurately. However, the depth of each isolated tree in the isolated forest algorithm is a fixed value, so some isolated trees are over-fitted or under-fitted, and the abnormality detection result for the customer behavior information is relatively inaccurate.
Disclosure of Invention
In order to solve the technical problem that the result of abnormality detection on the client behavior information by the existing method is inaccurate, the invention aims to provide an intelligent client information management system based on Internet of things equipment, and the adopted technical scheme is as follows:
acquiring behavior data of at least two clients to form a client data set, wherein the behavior data comprises data of at least two dimensions;
dividing the behavior data of the clients in the client data set by using an isolated forest algorithm, and obtaining the primary discrete degree of each node in each dimension according to the data discrete distribution condition of the behavior data of the clients in each node in each dimension;
obtaining the advanced discrete degree of each node in each dimension according to the path length and the data distribution condition of the behavior data of the clients contained in each node in the isolated forest and the data discrete distribution condition of the behavior data of the clients in each dimension;
obtaining a stopping condition according to the primary discrete degree and the advanced discrete degree, constructing an isolated forest of the customer behavior, calculating an abnormality score of behavior data of each customer according to the isolated forest, and obtaining a customer behavior abnormality detection result according to the abnormality score.
Preferably, before obtaining the primary degree of dispersion of each node in each dimension according to the data dispersion distribution of the behavior data of the clients contained in each node in each dimension, the method further comprises:
marking any one client as a target client, and performing principal component analysis on the behavior data of the target client to obtain a preset number of principal components of the target client;
marking any principal component of the target client as a target principal component and any client other than the target client as a reference client; acquiring the association coefficient between the target principal component of the target client and each principal component of the reference client, and taking the principal component of the reference client that corresponds to the maximum association coefficient as the matching principal component of the reference client for the target principal component of the target client; the matching principal components, for the target principal component of the target client, of all clients located in the same isolated tree node form the principal component set of the target principal component of the target client; a principal component set is thus obtained for each principal component of each client.
Preferably, the obtaining the advanced discrete degree of each node in each dimension according to the path length and the data distribution condition of the behavior data of the client contained in each node in the isolated forest and the data discrete distribution condition of the behavior data of the client in each dimension specifically includes:
any one node is marked as a selected node, and any one principal component of a client contained in the selected node is marked as a selected principal component;
obtaining a discrete characterization value of each client contained in the selected node according to the discrete distribution condition of the data of each client under the selected principal component and the equilibrium condition of the data in the principal component set of the selected principal component;
obtaining a correction coefficient of each client contained in the selected node in each tree according to the path length of each client contained in the selected node in each tree and the number of clients contained in the node where each client is located in each tree;
and obtaining the advanced discrete degree of the selected node under the corresponding dimension of the selected principal component according to the discrete representation value and the correction coefficient.
Preferably, the calculation formula of the advanced discrete degree of the selected node under the corresponding dimension of the selected principal component is specifically:
[Formula not preserved in this text.]
wherein the quantities in the formula are: the advanced discrete degree of the r-th node in the dimension corresponding to the k-th principal component; the number of trees containing the t-th client; the path length, in the m-th tree, of the t-th client contained in the r-th node; the total depth of the m-th tree containing the t-th client contained in the r-th node; the total number of clients contained in the node of the m-th tree in which the t-th client contained in the r-th node is located; the discrete representation value of the t-th client contained in the r-th node under the k-th principal component; the number of clients contained in the r-th node; the standard deviation of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data in the principal component set of the t-th client contained in the r-th node under the k-th principal component; and the correction coefficient, in the m-th tree, of the t-th client contained in the r-th node.
Preferably, the obtaining the primary discrete degree of each node in each dimension according to the data discrete distribution condition of the behavior data of the client in each dimension, which is included in each node, specifically includes:
and obtaining the primary discrete degree of each node under the dimension corresponding to each principal component according to the discrete distribution condition of the data of each client under each principal component and the equilibrium condition of the data in the principal component set of each principal component contained in each node.
Preferably, the calculation formula of the primary discrete degree of each node in the dimension corresponding to each principal component is specifically:
[Formula not preserved in this text.]
wherein the quantities in the formula are: the primary discrete degree of the r-th node in the dimension corresponding to the k-th principal component; the number of clients contained in the r-th node; the standard deviation of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data of the t-th client contained in the r-th node under the k-th principal component; and the mean of all data in the principal component set of the t-th client contained in the r-th node under the k-th principal component.
Preferably, the stopping condition specifically includes:
for a first layer of any tree, calculating the product of the contribution degree of each node contained in the first layer under each principal component and the primary discrete degree of each node under the corresponding dimension of the principal component, and carrying out normalization processing on the accumulated sum of all products corresponding to all nodes contained in the first layer to obtain a judging characteristic value of the first layer;
for any layer except the first layer, obtaining a judging characteristic value of the current layer according to the contribution degree corresponding to each principal component of each node contained in the current layer and the high-level discrete degree of each node under the dimension corresponding to each principal component;
and stopping the division of the current tree when the judging characteristic value is smaller than a preset stopping threshold value.
Preferably, the method for acquiring the judgment feature value of the current layer specifically includes:
and calculating the product of the contribution degree of each node contained in the current layer under each principal component and the advanced discrete degree under the dimension corresponding to the principal component, and carrying out normalization processing on the accumulated sum of all the products of all the nodes contained in the current layer under the dimension corresponding to all the principal components to obtain the judgment characteristic value of the current layer.
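The layer-level stopping test described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the text says the accumulated sum undergoes "normalization processing" without naming the function, so the mapping `1 - exp(-x)` used here is an assumption, as are the function names and the example threshold of 0.2. The same function serves the first layer (with primary discrete degrees) and later layers (with advanced discrete degrees).

```python
import math

def layer_feature_value(contributions, dispersions):
    """Judging characteristic value of one tree layer (illustrative).

    contributions[r][k]: contribution degree of principal component k for node r.
    dispersions[r][k]:   discrete degree of node r in the dimension of component k
                         (primary for the first layer, advanced otherwise).
    """
    total = 0.0
    for contrib_row, disp_row in zip(contributions, dispersions):
        for contribution, dispersion in zip(contrib_row, disp_row):
            total += contribution * dispersion
    # Assumed normalization into [0, 1); the source does not specify it.
    return 1.0 - math.exp(-total)

def should_stop(contributions, dispersions, stop_threshold=0.2):
    """Stop dividing the tree when the layer's value falls below the threshold."""
    return layer_feature_value(contributions, dispersions) < stop_threshold
```

With two nodes and two principal components, a layer whose nodes all have zero dispersion yields a value of 0 and stops the division, while a layer with large dispersions yields a value close to 1 and continues.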
Preferably, the calculating the anomaly score of the behavior data of each customer according to the isolated forest specifically includes:
[Formula not preserved in this text.]
wherein the quantities in the formula are: the abnormality score of the behavior data of the n-th client; the behavior coefficient of the n-th client; the total number of trees containing the n-th client; the path length of the n-th client in the m-th tree containing that client; the total depth of the m-th tree containing the n-th client; the number of clients contained in each tree; and the total number of clients contained in the node of the m-th tree in which the n-th client is located.
Preferably, the obtaining the abnormal detection result of the client behavior according to the abnormal score specifically includes:
when the abnormality score corresponding to the client is larger than a preset abnormality threshold, the client behavior detection result is abnormal;
and when the abnormality score corresponding to the client is smaller than or equal to a preset abnormality threshold, the client behavior detection result is normal.
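As a point of reference for the thresholding rule above, the following Python sketch computes the classical isolation forest score, 2 raised to the power of minus the mean path length divided by c(n), and applies a threshold. It is not the modified score of this document, which additionally involves a behavior coefficient, tree depths and node sizes and whose formula is not preserved here. The function names and the threshold 0.6 are assumptions chosen for illustration.

```python
import math

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary search tree lookup over n samples."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def classic_anomaly_score(path_lengths, sample_size):
    """Classical score in (0, 1]; requires sample_size > 1."""
    mean_path = sum(path_lengths) / len(path_lengths)
    return 2.0 ** (-mean_path / c_factor(sample_size))

def classify(score, anomaly_threshold=0.6):
    """Abnormal only when the score is strictly greater than the threshold."""
    return "abnormal" if score > anomaly_threshold else "normal"
```

A client isolated after very short paths in every tree receives a score near 1 and is flagged as abnormal; a client whose mean path length equals c(n) receives a score of 0.5.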
The embodiment of the invention has at least the following beneficial effects:
firstly, data of each client in a plurality of dimensions are acquired as behavior data, so that abnormal client behavior can subsequently be analyzed from behavior information in several dimensions; then, the client data set is used as the sample set and divided with an isolated forest algorithm, and the data discrete distribution of the client behavior data contained in each node is analyzed in each dimension to obtain the primary discrete degree of each node in each dimension, which reflects how aggregated or dispersed the data of each node are in that dimension; furthermore, because abnormal data may be concentrated and node paths may be abnormal while the isolated forest is being constructed, the path length, the data distribution and the data discrete distribution of the behavior data of the clients contained in each node are combined to obtain the advanced discrete degree, which characterizes the dispersion of the client behavior data within a node more accurately. Finally, following the construction idea of the isolated forest, if the data in every node are already aggregated after a certain division, that is, the client behavior data contained in each node are similar and the degree of dispersion is low, the division of the nodes should stop.
Based on the method, the stopping condition is obtained according to the primary discrete degree and the advanced discrete degree, the isolated forest of the customer behavior can be constructed in a self-adaptive mode, the depth of each tree in the isolated forest is reasonable, the anomaly score is calculated according to the isolated forest, and a more accurate detection result of the anomaly of the customer behavior can be obtained.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of steps of a method executed by a client information intelligent management system based on an internet of things device according to an embodiment of the present invention.
Detailed Description
In order to further explain the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, features and effects of the client information intelligent management system based on Internet of things equipment provided by the invention are described in detail below with reference to the accompanying drawings and the preferred embodiment. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The specific scheme of the client information intelligent management system based on the Internet of things equipment provided by the invention is specifically described below with reference to the accompanying drawings.
The main purpose of the invention is as follows: based on client behavior habits, behavior data of each client in a plurality of dimensions are obtained; the distribution of each client's data in each dimension is analyzed according to the differences between the behavior data of the clients in that dimension; the depth of each tree is then adaptively adjusted by considering the distribution of the client behavior data at the same depth of each tree, so that an isolated forest of the client behavior data is constructed adaptively and abnormal client behavior information is detected more accurately.
The embodiment of the invention provides an intelligent client information management system based on Internet of things equipment, which is used for realizing the steps shown in figure 1, and comprises the following specific steps:
step one, behavior data of at least two clients are obtained to form a client data set, wherein the behavior data comprise data of at least two dimensions.
In order to analyze the behavior information of clients accurately and to detect abnormal behavior information so that personalized recommendations can be provided, this embodiment obtains behavior information of each client in multiple dimensions; the behavior information of a plurality of clients must also be collected as the sample set from which the isolated forest is constructed.
In particular, behavior data of at least two clients is acquired to constitute a client data set, said behavior data comprising data of at least two dimensions. In this embodiment, the number of clients when collecting data is set to 300, and the implementer can set according to the specific implementation scenario. In this embodiment, 6 dimensions of data are collected as behavior data of each client, where the data of each dimension are respectively an ambient temperature, an ambient humidity, a device usage duration, a browsing amount and a purchase amount, that is, each data corresponds to one dimension, and an implementer may set according to a specific implementation scenario.
Meanwhile, it should be noted that, in this embodiment, behavior data of each client in a week before the current moment is collected, a data collection time interval in each dimension is 1 hour, and an implementer can set according to a specific implementation scenario. Based on the time sequence data sequence corresponding to the data of each dimension, behavior information of the client in different aspects is represented.
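The data layout implied by this embodiment can be sketched as follows: 300 clients, 6 dimensions, and one sample per hour over one week, that is 7 × 24 = 168 samples per dimension. The array name, the axis order and the random placeholder values are assumptions made only for illustration; real values would come from the Internet of things equipment.

```python
import numpy as np

NUM_CLIENTS = 300          # number of clients in this embodiment
NUM_DIMENSIONS = 6         # behavior dimensions per client in this embodiment
HOURS_PER_WEEK = 7 * 24    # one sample per hour for one week

# Placeholder values standing in for the collected behavior data.
rng = np.random.default_rng(0)
behavior = rng.normal(size=(NUM_CLIENTS, HOURS_PER_WEEK, NUM_DIMENSIONS))

# Time series of the first client: one row per hour, one column per dimension.
client_0 = behavior[0]
```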
And secondly, dividing nodes of the behavior data of the clients in the client data set by using an isolated forest algorithm, and obtaining the primary discrete degree of each node in each dimension according to the data discrete distribution condition of the behavior data of the clients in each dimension.
In the common method for constructing an isolated forest, the depth of each tree is a fixed value. If the depth of an isolated tree is set too small, the tree is divided too coarsely and cannot capture the complex relations within the client behavior information. If the depth is set too large, the random divisions over-partition the data set and fit its noise and randomness, so that overfitting occurs. This embodiment therefore starts from the observation that abnormal data are divided out earlier: at a given depth of an isolated tree, if the data contained in its nodes are already concentrated, further division should not be needed. Analyzing the discrete degree of the data contained in the nodes of each layer of each tree is therefore important.
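To make the fixed depth concrete: in the classical isolation forest, every tree is commonly limited to a height of ceil(log2(ψ)), where ψ is the subsample size used to build the tree. The text above only states that the depth is fixed, so this particular convention and the function name are given here as an assumption for illustration.

```python
import math

def classic_height_limit(subsample_size):
    """Fixed height limit of the classical isolation forest: ceil(log2(subsample size))."""
    return math.ceil(math.log2(subsample_size))
```

For example, a tree built from 256 samples is limited to a height of 8, and a tree built from 300 samples to a height of 9, regardless of how concentrated the data in its nodes already are.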
Meanwhile, the behavior data of each client contain behavior habit information in at least two different aspects. When the data contained in each node of each tree are analyzed, the distribution of each client's data in the several dimensions must be considered together to judge, layer by layer, whether each tree should continue to be divided, and thereby to obtain an adaptive depth for each isolated tree and accurate abnormality scores.
Specifically, the isolated forest algorithm is used to divide the behavior data of the clients in the client data set into nodes, and after each division the client data contained in each node are analyzed to study the discrete degree of the data in that node. The division process of an isolated tree is similar to the construction of a binary tree: suppose the root node holds 300 samples and a binary division splits them into 120 samples and 180 samples; the two resulting nodes then contain 120 and 180 samples respectively.
Higher-dimensional data may contain redundant dimensions or variables that have no obvious distinguishing power in the analysis. If features were extracted from the data of each dimension separately in the subsequent analysis, the corresponding weight coefficients could not be determined accurately. By contrast, once the principal components are obtained, the contribution degree of each principal component can be determined, so the selected features are more credible while data redundancy is removed, and the efficiency and accuracy of data processing are improved.
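A single division step of an isolated tree can be sketched as below, following the standard isolation forest procedure of picking a random feature and a random threshold between that feature's minimum and maximum. The sketch assumes each client has already been summarized as one feature vector (one row of `samples`); how the hourly time series are turned into such vectors is not specified here, and the function name is hypothetical.

```python
import numpy as np

def random_split(samples, rng):
    """Split an (n_clients, n_features) array into two child nodes.

    A feature is chosen uniformly at random, then a threshold is drawn
    uniformly between the minimum and maximum of that feature.
    """
    feature = rng.integers(samples.shape[1])
    low = samples[:, feature].min()
    high = samples[:, feature].max()
    threshold = rng.uniform(low, high)
    mask = samples[:, feature] < threshold
    return samples[mask], samples[~mask]
```

Every client ends up in exactly one of the two children, so a root node of 300 samples is always divided into two nodes whose sizes add up to 300, such as the 120 and 180 of the example above.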
Based on the above, principal component analysis is performed on the multidimensional behavior data of each client to obtain a plurality of principal components corresponding to the behavior data of that client. Specifically, any client is recorded as the target client, and principal component analysis is performed on the behavior data of the target client to obtain a preset number of principal components of the target client. In this embodiment, the preset number is determined from the cumulative variance contribution rate of the principal components: the first z principal components of the target client are taken such that their cumulative variance contribution rate is greater than 85%. The cumulative variance contribution rate is a known technique in principal component analysis and is not described in detail herein.
In this embodiment, the i-th client is taken as the target client. The principal components of the target client form the sequence consisting of the first principal component of the i-th client, the second principal component of the i-th client, and so on through the p-th principal component up to the z-th principal component of the i-th client, where z is the preset number. Each principal component can in turn be expressed as the product of the transpose of its linear combination coefficient vector and X, where X is the behavior data of the i-th client; the linear combination coefficients of each principal component of each client are obtained from the principal component analysis process.
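The selection of the first z principal components by cumulative variance contribution rate can be sketched with NumPy as follows. The function name and the choice of an eigendecomposition of the covariance matrix are assumptions for illustration; any standard principal component analysis routine would serve the same purpose.

```python
import numpy as np

def principal_components(data, min_cumulative=0.85):
    """Principal component analysis of one client's behavior data.

    data: (n_samples, n_dimensions) array.
    Returns the scores of the first z components and their variance
    contribution rates, where z is the smallest count whose cumulative
    contribution rate is strictly greater than min_cumulative.
    """
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(covariance)
    order = np.argsort(eigvals)[::-1]            # sort by descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    z = int(np.searchsorted(np.cumsum(ratio), min_cumulative, side="right")) + 1
    z = min(z, len(eigvals))                     # guard against rounding at the tail
    scores = centered @ eigvecs[:, :z]           # each column is coefficients^T applied to X
    return scores, ratio[:z]
```

The returned contribution rates are the kind of per-component weights that the stopping condition later multiplies with the discrete degrees.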
And marking any principal component of the target client as a target principal component, marking any client except the target client as a reference client, respectively acquiring association coefficients between the target principal component of the target client and each principal component of the reference client, and taking the principal component of the reference client corresponding to the maximum value in all association coefficients as a matching principal component of the reference client corresponding to the target principal component of the target client.
In this embodiment, the p-th principal component of the target client is taken as the target principal component, and the j-th client is taken as the reference client. The Pearson correlation coefficient is calculated between the characteristic parameters of the i-th client under its p-th principal component and the characteristic parameters of the j-th client under one of its principal components, and this Pearson correlation coefficient is taken as the association coefficient between the two principal components. In the same way, an association coefficient is obtained between the target principal component of the target client and each principal component of the reference client, and the principal component of the reference client with the maximum association coefficient is taken as the matching principal component of the target principal component, being the principal component most strongly associated with it.
After principal component analysis has been performed on each client, the correspondence between the principal components of different clients is not known, so the principal components with the largest correlation across clients must be matched through the degree of association between their characteristic parameters. The matching principal components, for the target principal component of the target client, of all clients in the same isolated tree node form the principal component set of the target principal component of the target client; a principal component set is thus obtained for each principal component of each client. In other words, the principal component set of a target principal component is the set of principal components, taken from the other clients in the same isolated tree node as the target client, that have the greatest degree of association with the target principal component.
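The matching of principal components between a target client and a reference client can be sketched as follows. The sketch assumes that the "characteristic parameters" of a client under a principal component are the time series of that component's scores, which the text does not state explicitly; the function name is likewise hypothetical. The maximum is taken over the signed Pearson coefficients, as the text describes, not over their absolute values.

```python
import numpy as np

def match_components(target_scores, reference_scores):
    """For each target component, index of the best-matching reference component.

    Both inputs are (n_samples, n_components) arrays of principal component
    scores over the same time stamps.
    """
    n_target = target_scores.shape[1]
    stacked = np.hstack([target_scores, reference_scores])
    correlation = np.corrcoef(stacked, rowvar=False)
    # Rows: target components; columns: reference components.
    cross = correlation[:n_target, n_target:]
    return cross.argmax(axis=1)
```

Repeating this for every other client in the same node and collecting the matched components gives the principal component set of each target principal component.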
Because the idea of the isolated forest algorithm is to divide an abnormal sample into single isolated nodes earlier, if data in each corresponding node is more aggregated after the division is finished for a certain time, namely, the client behavior data information contained in each node is more similar, and the degree of dispersion is lower, the division of the nodes should be stopped. Thus, a measure of the degree of discretization of the customer behavior data within each node is required. Based on the data discrete distribution condition of the behavior data of the clients contained in each node in each dimension, the primary discrete degree of each node in each dimension is obtained.
Specifically, according to the discrete distribution condition of data of each client under each principal component and the equilibrium condition of data in the principal component set of each principal component contained in each node, the primary discrete degree of each node under the dimension corresponding to each principal component is obtained.
In this embodiment, the k-th principal component and the r-th node are taken as an example. The calculation formula of the primary discrete degree of the r-th node in the dimension corresponding to the k-th principal component is:
[Formula not preserved in this text.]
wherein the quantities in the formula are: the primary discrete degree of the r-th node in the dimension corresponding to the k-th principal component; the number of clients contained in the r-th node; the standard deviation of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data of the t-th client contained in the r-th node under the k-th principal component; and the mean of all data in the principal component set of the t-th client contained in the r-th node under the k-th principal component.
The principal components of each client are linear combinations of that client's behavior data in different dimensions, and the matched principal components of different clients are similar linear combinations. The degree of dispersion of a node is therefore measured by the differences between the matched principal components of the different clients in that node.
The mean of the t-th client's data under the k-th principal component reflects the balance of that client's data, and the mean of the corresponding principal component set reflects the balance of the data of all other clients in the same node under the matched principal component. The smaller the difference between the two, the more similar the data of different clients in the same node are under that principal component, which indicates that the data in the node is aggregated in the dimension corresponding to the k-th principal component, and the smaller the corresponding primary discrete degree.
The standard deviation reflects the fluctuation of the t-th client's data under the k-th principal component: the smaller its value, the smaller the fluctuation of the data and the smaller the corresponding primary discrete degree; the larger its value, the larger the fluctuation and the larger the corresponding primary discrete degree. The primary discrete degree characterizes the discrete distribution of the data of each node in each isolated tree in the dimension corresponding to each principal component.
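The qualitative behavior described above can be illustrated with a short sketch. The formula itself is not reproduced in this text, so the expression below (per-client standard deviation plus the absolute gap between the client mean and the mean of its principal component set, averaged over the clients of the node) is a hypothetical stand-in that only preserves the stated monotonic relationships; the name `primary_discrete_degree` and the data layout are likewise assumptions.

```python
from statistics import fmean, pstdev

def primary_discrete_degree(node_data, set_data):
    """Illustrative dispersion measure for one node under one principal component.

    node_data: {client_id: [values of that client under the principal component]}
    set_data:  {client_id: [all values in that client's principal component set]}
    Hypothetical form, not the formula of the source: it grows with the
    per-client fluctuation and with the gap between the two means.
    """
    terms = []
    for client, values in node_data.items():
        sigma = pstdev(values)                               # fluctuation of the client's data
        gap = abs(fmean(values) - fmean(set_data[client]))   # imbalance against the matched set
        terms.append(sigma + gap)
    return fmean(terms)                                      # average over clients in the node
```

A node whose clients have identical, constant data yields 0.0, while a node whose clients fluctuate and differ from their matched sets yields a larger value.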
Step three: obtain the advanced discrete degree of each node in each dimension according to the path length and data distribution, within the isolated forest, of the behavior data of the clients contained in each node, and according to the discrete distribution of that behavior data in each dimension.
In constructing the isolated forest, the samples of each isolated tree are selected by randomly sampling all client behavior data, so the samples in the root nodes of some isolated trees may contain several abnormal records. For example, a single isolated tree may happen to select several clients with abnormal behavior information, such as clients in one node whose purchase amount or browsing volume is extremely high or extremely low. If only the discrete distribution of the client behavior information within a single node were considered, the path lengths of some abnormal data could become too long and the path lengths of some normal data relatively short, which could reduce the accuracy of the anomaly detection result of the isolated forest.
For this reason, the distribution of each client's behavior information across different isolated trees is taken into account, and a correction is applied on top of the primary discrete degree. That is, the advanced discrete degree of each node in each dimension is obtained according to the path length and data distribution, within the isolated forest, of the behavior data of the clients contained in each node, and according to the discrete distribution of that behavior data in each dimension.
Specifically, any one node is marked as a selected node, any one client contained in the selected node is marked as a selected client, any one principal component of the selected client is marked as a selected principal component, and a discrete characterization value of each client contained in the selected node is obtained according to the discrete distribution condition of data of each client contained in the selected node under the selected principal component and the equilibrium condition of data in a principal component set of the selected principal component; obtaining a correction coefficient of each client contained in the selected node in each tree according to the path length of each client contained in the selected node in each tree and the number of clients contained in the node where each client is located in each tree; and obtaining the advanced discrete degree of the selected node under the corresponding dimension of the selected principal component according to the discrete representation value and the correction coefficient.
In this embodiment, taking the r-th node as the selected node, the t-th client contained in the r-th node as the selected client, and the k-th principal component as the selected principal component, the calculation formula of the advanced discrete degree of the selected node in the dimension corresponding to the selected principal component, that is, in the dimension corresponding to the k-th principal component, may be expressed as:
[Two formula images not reproduced in this text.]
wherein the terms of the formulas denote, respectively: the advanced discrete degree of the r-th node in the dimension corresponding to the k-th principal component; the number of trees containing the t-th client; the path length, in the m-th tree, of the t-th client contained in the r-th node; the total depth of the m-th tree containing the t-th client contained in the r-th node; the total number of clients contained in the node in which the t-th client contained in the r-th node is located in the m-th tree; the discrete characterization value of the t-th client contained in the r-th node under the k-th principal component; the number of clients contained in the r-th node; the standard deviation of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data in the principal component set of the k-th principal component of the t-th client contained in the r-th node; and the correction coefficient, in the m-th tree, of the t-th client contained in the r-th node.
Because several abnormal records may appear in the same tree, the analysis of the data dispersion within each node must also consider how each client in the node is distributed in the other trees, and the discrete characterization value must be corrected accordingly. If the path length of the t-th client in the other trees is shorter, that is, the t-th client was split off early, or the total number of clients in the node where the t-th client is located in each tree is smaller, then the probability that the current node in the current tree is abnormal is higher, and the corrected discrete degree, that is, the advanced discrete degree, is larger.
The discrete characterization value reflects the dispersion of the t-th client contained in the r-th node under the k-th principal component; the larger its value, the larger the corresponding advanced discrete degree. The advanced discrete degree of the r-th node in the dimension corresponding to the k-th principal component incorporates the data distribution observed during the division of the isolated trees, corrects the dispersion of the client behavior information contained in each node, and so reflects more accurately the discrete distribution of the node's data in the different dimensions.
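The correction can be sketched as below. Since the formulas are not reproduced in this text, both the correction coefficient and the way it is combined with the discrete characterization value are hypothetical forms chosen only so that a shorter relative path length or a smaller node makes the corrected value larger, as the description states; all names are assumptions.

```python
from statistics import fmean

def correction_coefficient(path_length, tree_depth, node_size):
    """Hypothetical coefficient in (0, 1]: larger when the client is split off
    early (small path_length / tree_depth) or sits in a small node."""
    return 1.0 / (1.0 + (path_length / tree_depth) * node_size)

def advanced_discrete_degree(char_values, tree_records):
    """Illustrative corrected dispersion of one node under one principal component.

    char_values:  {client_id: discrete characterization value}
    tree_records: {client_id: [(path_length, tree_depth, node_size), ...]},
                  one tuple per tree that contains the client.
    """
    corrected = []
    for client, value in char_values.items():
        coeffs = [correction_coefficient(*rec) for rec in tree_records[client]]
        corrected.append(value * (1.0 + fmean(coeffs)))  # correction only inflates the value
    return fmean(corrected)                              # average over clients in the node
```

With the same characterization value, a client isolated after one split in a small node produces a larger corrected value than a client that reaches the full depth inside a large node.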
Step four: obtain a stopping condition according to the primary discrete degree and the advanced discrete degree, construct the isolated forest of client behavior, calculate an anomaly score for the behavior data of each client according to the isolated forest, and obtain the client behavior anomaly detection result according to the anomaly score.
It should be noted that, when constructing the isolated forest in this embodiment, the mean of all data under each principal component of a client's behavior data is used as the feature of the corresponding dimension for that client. Each isolated tree is built from one sampling, and the number of clients sampled each time, that is, the number of training samples of each isolated tree, is 80% of the total number of clients. The number of samplings, that is, the number of isolated trees, takes a value in the range [100, N], where N is the number of clients at the time the data are acquired. After the root node of each isolated tree has been divided for the first time, dispersion analysis is performed on each tree separately; once the stopping condition is met, the depth of the isolated tree is no longer extended, that is, the tree stops dividing at the current layer.
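The feature construction and sampling described in this paragraph can be sketched as follows. The 80% sampling ratio and the use of per-component means come from the embodiment; the function names, the data layout and the fixed random seed are illustrative assumptions.

```python
import random
from statistics import fmean

def client_features(pc_data):
    """Feature vector per client: the mean of its data under each principal component.

    pc_data: {client_id: [series under component 0, series under component 1, ...]}
    """
    return {cid: [fmean(series) for series in comps] for cid, comps in pc_data.items()}

def draw_tree_samples(client_ids, n_trees, ratio=0.8, seed=0):
    """Draw one training sample per isolated tree, each holding `ratio` of all clients.

    Sampling is without replacement inside a tree; the seed is only for repeatability.
    """
    ids = list(client_ids)
    rng = random.Random(seed)
    size = max(1, int(ratio * len(ids)))
    return [rng.sample(ids, size) for _ in range(n_trees)]
```

With 10 clients and 100 trees, each tree receives 8 distinct clients.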
Both the primary discrete degree and the advanced discrete degree reflect the data dispersion of each node of each tree in the isolated forest under different dimensions. The dispersion of all nodes contained in each layer of each tree is therefore combined: the discrete degrees under the several principal components of each client are weighted and averaged to obtain the overall data dispersion of the current layer, which is used to judge whether the isolated tree stops dividing at the current layer.
Because the first layer of each isolated tree results from a single division, there is no need to consider whether the aggregation of data within the nodes has affected the path length, and so the distribution of the client behavior information across different isolated trees need not be considered; that is, no correction is applied on top of the primary discrete degree.
Based on this, a stop condition is obtained from the primary discrete degree and the advanced discrete degree. Specifically, for a first layer of any tree, calculating the product of the contribution degree of each node contained in the first layer under each principal component and the primary discrete degree of each node under the corresponding dimension of the principal component, and normalizing the accumulated sum of all the products corresponding to all the nodes contained in the first layer to obtain the judgment characteristic value of the first layer.
In this embodiment, taking the primary discrete degree of the u-th node of the first layer of any tree in the dimension corresponding to the k-th principal component as an example, the calculation formula of the judgment characteristic value of the first layer of the current tree may be expressed as:
[Formula image not reproduced in this text.]
wherein the terms of the formula denote, respectively: the judgment characteristic value of the first layer of the current tree; the number of nodes contained in the first layer of the current tree; the primary discrete degree of the u-th node contained in the first layer of the current tree in the dimension corresponding to the k-th principal component; and the mean of the contribution degrees, under the k-th principal component, of all clients contained in the u-th node contained in the first layer of the current tree; z represents the number of principal components.
The judgment characteristic value of the first layer characterizes how strongly the isolated tree, having been divided to the current layer, needs to be divided further; that is, it reflects the degree of aggregation of the data distribution in the nodes of the current layer by analyzing the dispersion of the current layer's data in multiple dimensions. The mean of the contribution degrees of the clients under each principal component is used as a weight coefficient for the discrete degree of the data, so that a principal component with a higher mean contribution degree gives its discrete degree a higher weight. The larger the discrete degree, the less similar the client data in the current layer, the more division needs to continue, and the larger the corresponding judgment characteristic value. The smaller the discrete degree, the more similar the client data in the current layer, the more division should stop, and the smaller the corresponding judgment characteristic value.
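The weighted accumulation described here can be sketched as follows. The sum of products of mean contribution degree and discrete degree over all nodes and principal components follows the description; the normalization function is not specified in this text, so `1 - exp(-x)` is used purely as an assumed choice that maps non-negative sums into [0, 1). The name `judgment_characteristic_value` and the nested-list layout are illustrative.

```python
from math import exp

def judgment_characteristic_value(contributions, discrete_degrees):
    """Illustrative layer-level value from per-node, per-component inputs.

    contributions[u][k]:    mean contribution degree of the clients of node u
                            under principal component k
    discrete_degrees[u][k]: discrete degree of node u under principal component k
    Normalization by 1 - exp(-x) is an assumption, not taken from the source.
    """
    total = sum(
        c * d
        for c_row, d_row in zip(contributions, discrete_degrees)
        for c, d in zip(c_row, d_row)
    )
    return 1.0 - exp(-total)
```

The same function applies to later layers when the advanced discrete degree is supplied in place of the primary discrete degree.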
Further, when analyzing the discrete condition of the node data contained in other layers except the first layer in each isolated tree, the distribution condition of the client behavior information contained in the nodes in different trees needs to be considered, and the discrete degree adopted when the judgment characteristic value is calculated is the advanced discrete degree, and the calculation method is the same as that of the judgment characteristic value of the first layer.
Specifically, for any layer except the first layer, according to the contribution degree corresponding to each principal component of each node contained in the current layer and the advanced discrete degree of each node under the dimension corresponding to each principal component, the judgment characteristic value of the current layer is obtained.
In this embodiment, taking the x-th layer of the current tree, other than the first layer, as an example, the calculation formula of the judgment characteristic value of the x-th layer of the current tree may be expressed as:
[Formula image not reproduced in this text.]
wherein the terms of the formula denote, respectively: the judgment characteristic value of the x-th layer of the current tree; the number of nodes contained in the x-th layer of the current tree; the advanced discrete degree of the u-th node contained in the x-th layer of the current tree in the dimension corresponding to the k-th principal component; and the mean of the contribution degrees, under the k-th principal component, of all clients contained in the u-th node contained in the x-th layer of the current tree; z represents the number of principal components.
The judgment characteristic value of the x-th layer reflects whether the isolated tree should continue to be divided after reaching the x-th layer. The greater the degree of dispersion of the data in the nodes contained in the x-th layer, the greater the corresponding judgment characteristic value and the more the tree needs to be divided further. The smaller the degree of dispersion of the data in the nodes contained in the x-th layer, the smaller the corresponding judgment characteristic value and the less further division is needed.
Accordingly, when the judgment characteristic value is smaller than the preset stop threshold, division of the current tree stops. In the present embodiment, the stop threshold of the current tree is set to [value not reproduced in this text]; the implementer may set it according to the specific implementation scenario.
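The stopping rule amounts to a layer-by-layer comparison against the threshold, sketched below. The threshold value of the embodiment is not legible in this text, so it is left as a parameter; the function name `stopping_layer` is hypothetical.

```python
def stopping_layer(layer_values, stop_threshold):
    """Return the 1-based layer at which division stops, or None if it never stops.

    layer_values: judgment characteristic values of successive layers of one tree,
                  starting from the first layer.
    Division stops at the first layer whose value is below stop_threshold.
    """
    for depth, value in enumerate(layer_values, start=1):
        if value < stop_threshold:
            return depth
    return None
```

For example, with an illustrative threshold of 0.3, a tree whose layers give 0.9, 0.6 and 0.2 stops dividing at the third layer.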
Finally, following the steps above, an isolated forest with adaptive depth can be constructed for the client behavior; the anomaly score of the behavior data of each client is then calculated according to the isolated forest, and the client behavior anomaly detection result is obtained according to the anomaly score.
Since the depth of each isolated tree is determined adaptively by the discrete degree of its nodes, an isolated tree made up largely of normal data may be shallow, which could make the anomaly scores calculated for ordinary data inaccurate. In this embodiment, the path length used in the original algorithm is therefore replaced with the relative path length of each client sample in the isolated tree, and the degree of abnormality of the client behavior information is quantified in combination with the number of clients in the node into which the client is divided.
In this embodiment, taking the behavior data of any one client as an example, the calculation formula of the anomaly score of the behavior data of the client may be expressed as:
[Two formula images not reproduced in this text.]
wherein the terms of the formulas denote, respectively: the anomaly score of the behavior data of the n-th client; the behavior coefficient of the n-th client; the total number of trees containing the n-th client; the path length of the n-th client in the m-th tree containing that client; the total depth of the m-th tree containing the n-th client; the number of all clients contained in each tree; and the total number of clients contained in the node in which the n-th client is located in the m-th tree containing that client.
The relative path length term reflects the path length of the n-th client in the isolated trees: the larger its value, the longer the average relative path of the n-th client in the isolated trees. The node count term reflects the number of samples contained in the node of the n-th client in the isolated tree: the more clients contained in the node corresponding to the client's behavior data, the lower the degree of abnormality of the n-th client and the smaller the corresponding anomaly score.
When the abnormality score corresponding to the client is larger than a preset abnormality threshold, the behavior data of the client is abnormal data, and the corresponding client behavior detection result is abnormal; when the abnormality score corresponding to the client is smaller than or equal to a preset abnormality threshold, the behavior data of the client is normal data, and the corresponding client behavior detection result is normal. In this embodiment, the value of the anomaly threshold is set to 0.8, and the implementer can set according to the specific implementation scenario.
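The scoring and thresholding can be sketched as below. The anomaly threshold of 0.8 is the value stated in this embodiment. The score formula is not reproduced in this text, so the behavior coefficient (mean over trees of relative path length times the share of the tree's clients found in the client's node) and the `2 ** (-coefficient)` score, borrowed in spirit from the standard isolation forest score, are hypothetical forms that only respect the stated trends; all names are assumptions.

```python
from statistics import fmean

ANOMALY_THRESHOLD = 0.8  # value stated in this embodiment

def behavior_coefficient(records, clients_per_tree):
    """Hypothetical coefficient: grows with relative path length and node size.

    records: [(path_length, tree_depth, node_size), ...], one per tree
             that contains the client.
    """
    return fmean([(h / depth) * (size / clients_per_tree) for h, depth, size in records])

def anomaly_score(records, clients_per_tree):
    """Hypothetical score in [0.5, 1): higher for short relative paths and small nodes."""
    return 2.0 ** (-behavior_coefficient(records, clients_per_tree))

def detect(records, clients_per_tree, threshold=ANOMALY_THRESHOLD):
    """Label a client 'abnormal' if its score exceeds the threshold, else 'normal'."""
    return "abnormal" if anomaly_score(records, clients_per_tree) > threshold else "normal"
```

Under these assumed forms, a client isolated after one split in a node of its own is labeled abnormal, while a client that reaches full depth in a node holding half of the tree's clients is labeled normal.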
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the scope of the embodiments of the present application, and are intended to be included within the scope of the present application.

Claims (5)

1. The intelligent customer information management system based on the Internet of things equipment comprises a memory and a processor, and is characterized in that the processor executes a computer program stored in the memory to realize the following steps:
acquiring behavior data of at least two clients to form a client data set, wherein the behavior data comprises data of at least two dimensions;
dividing the behavior data of the clients in the client data set by using an isolated forest algorithm, and obtaining the primary discrete degree of each node in each dimension according to the data discrete distribution condition of the behavior data of the clients in each node in each dimension;
obtaining the advanced discrete degree of each node in each dimension according to the path length and the data distribution condition of the behavior data of the clients contained in each node in the isolated forest and the data discrete distribution condition of the behavior data of the clients in each dimension;
obtaining a stopping condition according to the primary discrete degree and the advanced discrete degree, constructing an isolated forest of the client behavior, calculating an abnormal score of behavior data of each client according to the isolated forest, and obtaining a client behavior abnormal detection result according to the abnormal score;
before obtaining the primary discrete degree of each node in each dimension according to the data discrete distribution condition of the behavior data of the clients contained in each node in each dimension, the method further comprises the following steps:
marking any one client as a target client, and performing principal component analysis on behavior data of the target client to obtain a preset number of principal components of the target client;
marking any principal component of a target client as a target principal component, marking any client except the target client as a reference client, respectively acquiring association coefficients between the target principal component of the target client and each principal component of the reference client, and taking the principal component of the reference client corresponding to the maximum value in all association coefficients as a matching principal component of the reference client corresponding to the target principal component of the target client; matching principal components of all clients in the same isolated tree node corresponding to the target principal components of the target client to form a principal component set of the target principal components of the target client; obtaining a principal component set of each principal component of each customer;
obtaining the advanced discrete degree of each node in each dimension according to the path length and the data distribution condition of the behavior data of the clients contained in each node in the isolated forest and the data discrete distribution condition of the behavior data of the clients in each dimension, wherein the advanced discrete degree comprises the following steps:
any one node is marked as a selected node, and any one principal component of a client contained in the selected node is marked as a selected principal component;
obtaining a discrete characterization value of each client contained in the selected node according to the discrete distribution condition of the data of each client under the selected principal component and the equilibrium condition of the data in the principal component set of the selected principal component;
obtaining a correction coefficient of each client contained in the selected node in each tree according to the path length of each client contained in the selected node in each tree and the number of clients contained in the node where each client is located in each tree;
obtaining the advanced discrete degree of the selected node under the corresponding dimension of the selected principal component according to the discrete characterization value and the correction coefficient;
the calculation formula of the high-level discrete degree of the selected node under the corresponding dimension of the selected principal component is specifically as follows:
[Formula image not reproduced in this text.]
wherein the terms of the formula denote, respectively: the advanced discrete degree of the r-th node in the dimension corresponding to the k-th principal component; the number of trees containing the t-th client; the path length, in the m-th tree, of the t-th client contained in the r-th node; the total depth of the m-th tree containing the t-th client contained in the r-th node; the total number of clients contained in the node in which the t-th client contained in the r-th node is located in the m-th tree; the discrete characterization value of the t-th client contained in the r-th node under the k-th principal component; the number of clients contained in the r-th node; the standard deviation of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data in the principal component set of the k-th principal component of the t-th client contained in the r-th node; and the correction coefficient, in the m-th tree, of the t-th client contained in the r-th node;
the stop conditions specifically include:
for a first layer of any tree, calculating the product of the contribution degree of each node contained in the first layer under each principal component and the primary discrete degree of each node under the corresponding dimension of the principal component, and carrying out normalization processing on the accumulated sum of all products corresponding to all nodes contained in the first layer to obtain a judging characteristic value of the first layer;
for any layer except the first layer, obtaining a judging characteristic value of the current layer according to the contribution degree corresponding to each principal component of each node contained in the current layer and the high-level discrete degree of each node under the dimension corresponding to each principal component;
when the judging characteristic value is smaller than a preset stopping threshold value, stopping dividing the current tree;
the method for acquiring the judging characteristic value of the current layer specifically comprises the following steps:
and calculating the product of the contribution degree of each node contained in the current layer under each principal component and the advanced discrete degree under the dimension corresponding to the principal component, and carrying out normalization processing on the accumulated sum of all the products of all the nodes contained in the current layer under the dimension corresponding to all the principal components to obtain the judgment characteristic value of the current layer.
2. The intelligent management system for client information based on the internet of things equipment according to claim 1, wherein the obtaining the primary discrete degree of each node in each dimension according to the data discrete distribution condition of the behavior data of the client in each dimension, specifically includes:
and obtaining the primary discrete degree of each node under the dimension corresponding to each principal component according to the discrete distribution condition of the data of each client under each principal component and the equilibrium condition of the data in the principal component set of each principal component contained in each node.
3. The intelligent management system for client information based on the internet of things equipment according to claim 2, wherein the calculation formula of the primary discrete degree of each node in the dimension corresponding to each principal component is specifically:
[Formula image not reproduced in this text.]
wherein the terms of the formula denote, respectively: the primary discrete degree of the r-th node in the dimension corresponding to the k-th principal component; the number of clients contained in the r-th node; the standard deviation of all data of the t-th client contained in the r-th node under the k-th principal component; the mean of all data of the t-th client contained in the r-th node under the k-th principal component; and the mean of all data in the principal component set of the k-th principal component of the t-th client contained in the r-th node.
4. The intelligent management system for client information based on the internet of things equipment according to claim 1, wherein the calculating the anomaly score of the behavior data of each client according to the isolated forest specifically comprises:
[Formula image not reproduced in this text.]
wherein the terms of the formula denote, respectively: the anomaly score of the behavior data of the n-th client; the behavior coefficient of the n-th client; the total number of trees containing the n-th client; the path length of the n-th client in the m-th tree containing that client; the total depth of the m-th tree containing the n-th client; the number of all clients contained in each tree; and the total number of clients contained in the node in which the n-th client is located in the m-th tree containing that client.
5. The intelligent client information management system based on the internet of things equipment according to claim 1, wherein the obtaining the client behavior anomaly detection result according to the anomaly score specifically comprises:
when the abnormality score corresponding to the client is larger than a preset abnormality threshold, the client behavior detection result is abnormal;
and when the abnormality score corresponding to the client is smaller than or equal to a preset abnormality threshold, the client behavior detection result is normal.
CN202311723818.1A 2023-12-15 2023-12-15 Customer information intelligent management system based on Internet of things equipment Active CN117408734B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311723818.1A CN117408734B (en) 2023-12-15 2023-12-15 Customer information intelligent management system based on Internet of things equipment

Publications (2)

Publication Number Publication Date
CN117408734A CN117408734A (en) 2024-01-16
CN117408734B true CN117408734B (en) 2024-03-19

Family

ID=89494826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311723818.1A Active CN117408734B (en) 2023-12-15 2023-12-15 Customer information intelligent management system based on Internet of things equipment

Country Status (1)

Country Link
CN (1) CN117408734B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020019403A1 (en) * 2018-07-26 2020-01-30 平安科技(深圳)有限公司 Electricity consumption abnormality detection method, apparatus and device, and readable storage medium
CN111784392A (en) * 2020-06-29 2020-10-16 中国平安财产保险股份有限公司 Abnormal user group detection method, device and equipment based on isolated forest
CN111798312A (en) * 2019-08-02 2020-10-20 深圳索信达数据技术有限公司 Financial transaction system abnormity identification method based on isolated forest algorithm
CN113961434A (en) * 2021-09-29 2022-01-21 西安交通大学 Method and system for monitoring abnormal behaviors of distributed block chain system users
CN116011894A (en) * 2023-03-28 2023-04-25 河北长发铝业股份有限公司 Aluminum alloy rod production data management system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant