CN115249077A - Business data decision method and device, electronic equipment and storage medium


Info

Publication number
CN115249077A
Authority
CN
China
Legal status
Pending
Application number
CN202110454446.1A
Other languages
Chinese (zh)
Inventor
孙谦晨
庆祖良
蒋强
耿东山
陈劼
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Jiangsu Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Jiangsu Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis

Abstract

The invention provides a business data decision method and device, an electronic device and a storage medium. The method comprises: acquiring service cost data; and inputting the service cost data into a pre-trained multi-stage cascade abnormal index detection model to obtain an abnormal data detection result. By cascading traditional machine learning and deep learning models, the method judges whether the current cost is abnormal from coarse granularity to fine granularity, layer by layer, so that obvious anomalies can be located quickly and slight anomalies can still be captured.

Description

Business data decision method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for business data decision, an electronic device, and a storage medium.
Background
With the continuous development of the data services market, the business models that serve customers in many application fields have, over time, accumulated layers of legacy logic, fragmented versions and increasingly complex overlapping logic. Although a communication enterprise's customer-facing tariff policies keep getting simpler from the perspective of a single user, they must serve a customer base of hundreds of millions with diverse, continually overlapping hybrid service logic, so the correctness of service calculation results is difficult to verify accurately.
Most existing schemes compare the trend of results across several consecutive periods and then review points where the trend is abnormal or fluctuates strongly to confirm whether the results are correct. Given the large data volume and complex logical relationships, trend-based verification can only catch large-scale, large-amplitude changes in the data; small-scale, slight differences are easily overlooked. Specifically, one existing method supplies trend data for a data indicator at several time points for manual analysis and decision-making. Its main disadvantages are:
1. Normal data cannot be filtered out quickly; all data goes through tedious detection, which takes too much time.
2. Existing data indicators are predicted only by comparison with historical data, so the conclusions are rough and the results may deviate significantly.
3. Subtle, small-scale abnormal fluctuations often do not produce a noticeable change in the result data and are very easily overlooked.
Disclosure of Invention
The invention provides a business data decision method, a business data decision device, electronic equipment and a storage medium, which are used for overcoming the defects in the prior art.
In a first aspect, the present invention provides a service data decision method, including:
acquiring service cost data;
inputting the service cost data into a pre-trained multi-stage cascade abnormal index detection model to obtain an abnormal data detection result; the multi-stage cascade abnormal index detection model is built from historical service cost data by cascading, in order: a dynamic anomaly detection model; a first XGBOOST anomaly detection model constructed from the cost's own time-series information; a non-time-series feature screening function model; a second XGBOOST anomaly detection model constructed from the cost's own time-series information and external non-time-series features; an LSTM index prediction model constructed from the cost's own time-series information and external non-time-series features; and a proportion anomaly discrimination model constructed from business properties.
In one embodiment, the dynamic anomaly detection model is obtained by:
determining a preset time window, and slicing the historical service cost data with it to obtain a historical service cost vector whose length equals the preset time window;
calculating the first mean, first median, maximum and minimum of the historical service cost vector;
calculating a dynamic threshold from the first mean, the first median, the maximum and the minimum;
calculating, in sequence, the absolute difference between each pair of adjacent values in the historical service cost vector;
constructing the dynamic anomaly detection model from the absolute differences and the dynamic threshold.
In one embodiment, the first XGBOOST anomaly detection model is obtained by:
acquiring the own time-series information of the historical service cost vector;
converting the historical service cost vector into a supervised-learning format based on its own time-series information to obtain feature data, and assigning the current-period cost data as the label;
constructing the first XGBOOST anomaly detection model with the XGBOOST algorithm from the feature data and the label.
In one embodiment, the non-temporal feature screening function model is obtained by:
determining the external features of the historical service cost data, and converting them into an external feature vector;
calculating the second mean, second median and standard deviation of the external feature vector;
constructing a variable dispersion index from the second mean, the second median and the standard deviation;
determining a first variable dispersion threshold, and constructing the non-time-series feature screening function model from the variable dispersion index and the first variable dispersion threshold.
In one embodiment, the second XGBOOST anomaly detection model constructed from the cost's own time-series information and external non-time-series features is obtained as follows:
screening the external features according to business characteristics to obtain user-level features, product-level features and traffic-level features;
merging the user-level, product-level and traffic-level features into a derived feature vector;
concatenating the external feature vector, the derived feature vector and the feature data into a spliced vector;
constructing the second XGBOOST anomaly detection model with the XGBOOST algorithm from the spliced vector and the label.
In one embodiment, the LSTM index prediction model constructed from the cost's own time-series information and external non-time-series features is obtained as follows:
constructing categorical external features from the external features according to whether the external information is consistent over time, and constructing numerical external features according to the growth rate of the external information over time;
combining the categorical and numerical external features into the derived feature vector, and constructing the LSTM index prediction model with the LSTM algorithm.
In one embodiment, the proportion anomaly discrimination model is obtained as follows:
acquiring the actual value and the predicted value of the current service cost, and obtaining from them the error value and the proportion of the error value to the actual value;
determining a second variable dispersion threshold, and constructing the proportion anomaly discrimination model from the error proportion and the second variable dispersion threshold.
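The proportion check can be sketched as a relative-error test. This is a minimal sketch, assuming the error proportion is |actual − predicted| / |actual| and that a cost is flagged when this proportion exceeds the second threshold; the function name `proportion_flag` and the strict "greater than" comparison are illustrative assumptions, not the patent's stated formula.

```python
def proportion_flag(actual: float, predicted: float, lam: float) -> int:
    """Return 1 (abnormal) or 0 (normal) for the current service cost.

    Assumption: error proportion = |actual - predicted| / |actual|,
    compared against the second variable dispersion threshold `lam`.
    """
    if actual == 0:
        # The proportion is undefined when the actual cost is zero.
        raise ValueError("error proportion undefined when actual value is 0")
    error_ratio = abs(actual - predicted) / abs(actual)
    return 1 if error_ratio > lam else 0
```

For example, with a threshold of 0.1, an actual cost of 100 predicted as 95 (5% error) stays normal, while a prediction of 80 (20% error) is flagged.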
In a second aspect, the present invention further provides a business data decision apparatus, including:
an acquisition module, configured to acquire service cost data;
a detection module, configured to input the service cost data into a pre-trained multi-stage cascade abnormal index detection model to obtain an abnormal data detection result; the multi-stage cascade abnormal index detection model is built from historical service cost data by cascading, in order: a dynamic anomaly detection model; a first XGBOOST anomaly detection model constructed from the cost's own time-series information; a non-time-series feature screening function model; a second XGBOOST anomaly detection model constructed from the cost's own time-series information and external non-time-series features; an LSTM index prediction model constructed from the cost's own time-series information and external non-time-series features; and a proportion anomaly discrimination model constructed from business properties.
In a third aspect, the present invention also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of any one of the service data decision methods described above.
In a fourth aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the business data decision method as described in any one of the above.
With the business data decision method and device, electronic device and storage medium provided by the invention, cascading traditional machine learning and deep learning models makes it possible to judge whether the current cost is abnormal from coarse granularity to fine granularity, proceeding layer by layer, so that obvious anomalies are located quickly and slight anomalies are still captured.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of the business data decision method provided by the present invention;
FIG. 2 is a schematic diagram of the process for building the dynamic anomaly detection model provided by the present invention;
FIG. 3 is a schematic diagram of the sample construction and model building process provided by the present invention;
FIG. 4 is a schematic structural diagram of the business data decision device provided by the present invention;
FIG. 5 is a schematic structural diagram of the electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
To address the low precision and other problems of abnormal indicator detection in the prior art, the business data decision method provided by the invention builds a multi-stage abnormal index detection model based on the cascade idea. It shortens detection time by eliminating a large number of normal samples early, and it reduces the feature dimension and keeps only important features by means of an external non-time-series feature screening function, so that business requirements are better met.
Fig. 1 is a schematic flow chart of a service data decision method provided by the present invention, as shown in fig. 1, including:
101, acquiring service cost data;
102, inputting the service cost data into a pre-trained multi-stage cascade abnormal index detection model to obtain an abnormal data detection result; the multi-stage cascade abnormal index detection model is built from historical service cost data by cascading, in order: a dynamic anomaly detection model; a first XGBOOST anomaly detection model constructed from the cost's own time-series information; a non-time-series feature screening function model; a second XGBOOST anomaly detection model constructed from the cost's own time-series information and external non-time-series features; an LSTM index prediction model constructed from the cost's own time-series information and external non-time-series features; and a proportion anomaly discrimination model constructed from business properties.
It should be noted that the invention decides which records in the service cost data are abnormal by means of a multi-stage cascade abnormal index detection model. The model cascades several sub-models that detect and decide step by step, and specifically includes:
constructing a dynamic anomaly detection model that quickly screens out a large amount of normal data, exempting it from further detection and reducing detection time; constructing a first XGBOOST anomaly detection model from the cost's own time-series information; constructing a non-time-series feature screening function model that reduces the feature dimension and selects important features; constructing a second XGBOOST anomaly detection model from the cost's own time-series information and external non-time-series features; further constructing an LSTM index prediction model from the cost's own time-series information and external non-time-series features; and finally constructing a proportion anomaly discrimination model based on business properties. The service cost data to be checked is input into the multi-stage cascade abnormal index detection model to obtain the final abnormal data detection result.
By cascading traditional machine learning and deep learning models, the method judges whether the current cost is abnormal from coarse granularity to fine granularity, layer by layer, so that obvious anomalies can be located quickly and slight anomalies can still be captured.
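The coarse-to-fine cascade can be sketched as a chain of binary filters in which a record judged normal at any stage is released and never reaches the later, more expensive stages. This is a minimal sketch under the assumption that every sample-level stage returns True for "suspected anomaly"; the stage predicates and the name `run_cascade` are hypothetical, and the feature screening function (which selects features offline rather than judging samples) is not modeled here.

```python
from typing import Callable, Dict, List, Sequence

# A stage maps one sample to True (suspected anomaly) or False (normal).
Stage = Callable[[dict], bool]


def run_cascade(samples: Sequence[dict], stages: List[Stage]) -> Dict[int, str]:
    """Pass samples through the stages in order, coarse to fine.

    Samples a stage judges normal are released immediately; a sample is
    reported abnormal only if every stage flags it.
    """
    result: Dict[int, str] = {}
    pending = list(range(len(samples)))
    for stage in stages:
        still_suspect = []
        for idx in pending:
            if stage(samples[idx]):
                still_suspect.append(idx)  # forwarded to the next stage
            else:
                result[idx] = "normal"     # filtered out early
        pending = still_suspect
    for idx in pending:
        result[idx] = "abnormal"
    return result
```

Because each stage only sees what earlier stages forwarded, the cheap dynamic check absorbs most of the volume and the costlier models run on a small remainder.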
Based on the above embodiment, the dynamic anomaly detection model is obtained by the following steps:
determining a preset time window, and slicing the historical service cost data with it to obtain a historical service cost vector whose length equals the preset time window;
calculating the first mean, first median, maximum and minimum of the historical service cost vector;
calculating a dynamic threshold from the first mean, the first median, the maximum and the minimum;
calculating, in sequence, the absolute difference between each pair of adjacent values in the historical service cost vector;
constructing the dynamic anomaly detection model from the absolute differences and the dynamic threshold.
Specifically, a preliminary screening is performed first by constructing a dynamic anomaly detection model from the historical service cost data. The fluctuation of the historical data carries information about whether a value is abnormal: a value within the reasonable fluctuation range is normal data, otherwise it is a suspected anomaly. Different time windows are therefore evaluated through cross-validation when constructing the dynamic anomaly detection model, which is built as follows:
1) Set a time window $k$. $X_k = (x_{t-1}, x_{t-2}, \ldots, x_{t-k})$ denotes the historical service cost vector lagged by $k$ periods, and $x_t$ denotes the current service cost.
2) Calculate the mean $\bar{x}$, median $x_{mid}$, maximum $x_{max}$ and minimum $x_{min}$ of $X_k$:
$$\bar{x} = \frac{1}{k}\sum_{i=1}^{k} x_{t-i}$$
$$x_{mid} = \operatorname{median}\{X_k\}, \quad x_{max} = \max\{X_k\}, \quad x_{min} = \min\{X_k\}$$
3) Calculate the dynamic threshold $\lambda_1$ from $\bar{x}$, $x_{mid}$, $x_{max}$ and $x_{min}$ (the formula appears as an image in the original and is not reproduced here).
4) Calculate the absolute difference $abs_{value} = |x_t - x_{t-1}|$.
5) Construct the dynamic anomaly detection model, which compares $abs_{value}$ with $\lambda_1$ and outputs 0 or 1 (the formula appears as an image in the original and is not reproduced here).
Here 0 indicates that the data is normal and 1 indicates that the data is abnormal.
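The first-stage check can be sketched as follows. Because the exact formula for $\lambda_1$ is not recoverable from the text, this sketch takes the threshold as a pluggable function and supplies a clearly hypothetical placeholder, `placeholder_threshold`; it also assumes the model flags the current cost when $abs_{value}$ strictly exceeds $\lambda_1$. Both the placeholder formula and the comparison direction are assumptions, not the patent's method.

```python
from statistics import mean, median
from typing import Callable, Sequence


def placeholder_threshold(x_mean: float, x_mid: float,
                          x_max: float, x_min: float) -> float:
    # HYPOTHETICAL stand-in for lambda_1: the patent derives it from the
    # mean, median, max and min of X_k, but its exact formula is not given
    # in the text. Replace with the real formula if known.
    return (x_max - x_min) + abs(x_mean - x_mid)


def dynamic_flag(history: Sequence[float], x_t: float, k: int,
                 threshold_fn: Callable[[float, float, float, float], float]
                 = placeholder_threshold) -> int:
    """Return 1 (suspected anomaly) or 0 (normal) for the current cost x_t."""
    if len(history) < k:
        raise ValueError("need at least k historical values")
    # history is assumed ordered oldest -> newest; X_k = (x_{t-1}, ..., x_{t-k}).
    window = list(history[-k:])[::-1]
    lam1 = threshold_fn(mean(window), median(window), max(window), min(window))
    abs_value = abs(x_t - window[0])  # |x_t - x_{t-1}|
    return 1 if abs_value > lam1 else 0
```

With the history `[100, 102, 98, 101, 99]` and `k = 5`, the placeholder threshold is 4, so a current cost of 103 passes as normal and 110 is forwarded as a suspected anomaly. The window length `k` would be the one chosen by cross-validation.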
FIG. 2 shows the flow for constructing the dynamic anomaly detection model. This stage eliminates a large number of normal samples and passes only a few suspected anomalies to the second stage of evaluation, which reduces detection time and improves detection efficiency.
By constructing the dynamic anomaly detection model, the invention quickly screens out a large amount of normal data, avoids running detection on it, and reduces detection time.
Based on any of the above embodiments, the first XGBOOST anomaly detection model is obtained as follows:
acquiring the own time-series information of the historical service cost vector;
converting the historical service cost vector into a supervised-learning format based on its own time-series information to obtain feature data, and assigning the current-period cost data as the label;
constructing the first XGBOOST anomaly detection model with the XGBOOST algorithm from the feature data and the label.
Specifically, on the basis of the above embodiment, the invention constructs the first XGBOOST anomaly detection model from the time-series information of the service cost itself. Because service cost data is a time series, a conventional classification model cannot be built on it directly, so the data format must be converted. The original data covering $k+1$ periods, $X_{k+1} = (x_t, x_{t-1}, x_{t-2}, \ldots, x_{t-k})$, is converted into a supervised-learning format as follows:
$$X = (x_{t-1}, x_{t-2}, \ldots, x_{t-k})$$
$$Y = x_t$$
Because $Y$ is a numerical value, it must be re-assigned: based on the historical cost indicator, an abnormal $Y$ is relabeled 1 and normal data is labeled 0. The cost data is thus reorganized so that the cost's own values from the previous $k$ periods serve as features and the re-assigned current-period value serves as the label, forming supervised learning samples. Once the samples are built, the XGBoost (XGB) algorithm is used to construct the classification model model_XGB.
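The reformatting into supervised samples is a standard lag-feature transform. The sketch below assumes the 0/1 labels have already been assigned from the historical cost indicator and are aligned with the series; the function name `to_supervised` is illustrative.

```python
from typing import List, Sequence, Tuple


def to_supervised(series: Sequence[float], labels: Sequence[int],
                  k: int) -> Tuple[List[List[float]], List[int]]:
    """Turn a cost series into (features, label) rows.

    The row for period t has features X = (x_{t-1}, ..., x_{t-k}) and
    label Y = labels[t], where labels[t] is 1 if period t was marked
    abnormal and 0 otherwise.
    """
    if len(series) != len(labels):
        raise ValueError("series and labels must have the same length")
    X: List[List[float]] = []
    y: List[int] = []
    for t in range(k, len(series)):
        X.append([series[t - i] for i in range(1, k + 1)])  # lags 1..k
        y.append(labels[t])
    return X, y
```

The resulting `X`, `y` can then be fed to an XGBoost classifier (for example `xgboost.XGBClassifier().fit(X, y)` in the Python package); the hyperparameters the patent uses are not stated.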
The sample construction and model building flow is shown in FIG. 3. The constructed model_XGB is more precise than the first-stage dynamic anomaly detection model. model_XGB outputs 0 for cost predicted to be normal and 1 for cost suspected to be abnormal; only the suspected anomalies enter the next stage of evaluation for further detection, so fewer records are passed downstream and the efficiency of the model improves.
By building a classification model in a machine learning manner from the cost's own time-series information and external features, and by obtaining better classification features through feature screening and feature construction, the invention can accurately remove normal data and prevent false alarms.
Based on any of the above embodiments, the non-time-series feature screening function model is obtained as follows:
determining the external features of the historical service cost data, and converting them into an external feature vector;
calculating the second mean, second median and standard deviation of the external feature vector;
constructing a variable dispersion index from the second mean, the second median and the standard deviation;
determining a first variable dispersion threshold, and constructing the non-time-series feature screening function model from the variable dispersion index and the first variable dispersion threshold.
Specifically, the supervised learning samples in the foregoing embodiment are built only from the cost's own time-series data. Since cost data is also influenced by external indicators, these indicators should be included in the features when constructing samples, as shown in Table 1:
TABLE 1 (external indicators; the table is provided as images in the original and is not reproduced here)
The external information in the table mainly includes the user's own information, subscription information, group information, parameter attributes and the like. Although all external variables could be included as features, too many features increase model building time and degrade performance, so the external features must be screened to select effective ones for model building.
Starting from feature dispersion, the invention constructs an unsupervised feature screening function. Let $Z = (z_1, z_2, \ldots, z_m)$ denote the external feature vector. For each feature $z_i$ in $Z$, the following operations are performed:
1) Calculate the mean $\bar{z}$, median $z_{mid}$ and standard deviation $z_{sd}$ of $z_i$.
2) Construct the variable dispersion index from $\bar{z}$, $z_{mid}$ and $z_{sd}$ (the formula appears as an image in the original and is not reproduced here).
3) Set a first variable dispersion threshold $\lambda_2$.
4) Construct the non-time-series feature screening function model, which compares the variable dispersion index with $\lambda_2$ (the formula appears as an image in the original and is not reproduced here).
Here 0 indicates that the feature is deleted and 1 indicates that it is retained.
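The screening function can be sketched per feature column. The patent's dispersion index combines the mean, median and standard deviation, but its exact form is not recoverable, so `placeholder_dispersion` below is a HYPOTHETICAL stand-in (standard deviation plus mean/median gap, scaled by the mean). The sketch also assumes features whose index reaches $\lambda_2$ are kept (near-constant columns carry little information) and uses the population standard deviation; all three choices are assumptions.

```python
from statistics import mean, median, pstdev
from typing import Dict, Sequence


def placeholder_dispersion(z_mean: float, z_mid: float, z_sd: float) -> float:
    # HYPOTHETICAL dispersion index; the patent's formula is not reproduced.
    scale = abs(z_mean) if z_mean != 0 else 1.0
    return (z_sd + abs(z_mean - z_mid)) / scale


def screen_features(features: Dict[str, Sequence[float]],
                    lam2: float) -> Dict[str, int]:
    """Return 1 (retain) or 0 (delete) for each external feature column."""
    decisions: Dict[str, int] = {}
    for name, values in features.items():
        index = placeholder_dispersion(mean(values), median(values), pstdev(values))
        decisions[name] = 1 if index >= lam2 else 0
    return decisions
```

A constant column such as `[5, 5, 5, 5]` scores 0 and is deleted, while `[1, 5, 9, 5]` scores about 0.57 and is retained at $\lambda_2 = 0.1$. Being unsupervised, the screen needs no labels and can run once before model training.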
By constructing the non-time-series feature screening function model, the invention reduces the feature dimension, selects important features, shortens subsequent model training time and improves model performance.
Based on any of the above embodiments, the second XGBOOST anomaly detection model constructed from the cost's own time-series information and external non-time-series features is obtained as follows:
screening the external features according to business characteristics to obtain user-level features, product-level features and traffic-level features;
merging the user-level, product-level and traffic-level features into a derived feature vector;
concatenating the external feature vector, the derived feature vector and the feature data into a spliced vector;
constructing the second XGBOOST anomaly detection model with the XGBOOST algorithm from the spliced vector and the label.
Specifically, in the third evaluation stage, the invention constructs the second XGBOOST anomaly detection model from the screened external features and the time-series information of the service cost itself.
Here, effective classification features can be derived from the screened external features according to business characteristics. The effective features after screening include user age, user gender, user grade, number of users, product order quantity, product category, product subdivision, average product discount rate, average product discount amount, number of group members, group gender ratio, total daily traffic, total monthly traffic and so on; there are too many to list. New business features are derived from the existing ones, following the ideas below:
1. User level
Because the total cost is the sum of all users' costs, and users differ from one another yet share commonalities and contribute differently to the total cost, users can be clustered. According to the actual business scenario, the invention groups users into three categories: low-, medium- and high-value users. From the number of users in each category and the total cost, three derived variables are obtained: the user-count proportion of each category, the cost proportion of each category, and the average-cost proportion of each category.
The derived user-level features are calculated as follows:
1) Cluster users on their user features into 3 categories.
2) Calculate the user proportion of each category:
$$rate_i = \frac{num_i}{num_{whole}}$$
where $num_i$ is the number of users in category $i$, $num_{whole}$ is the total number of users, and $rate_i$ is the user proportion of category $i$;
3) Calculate the cost proportion of each category:
$$m\_rate_i = \frac{m_i}{m_{whole}}$$
where $m_i$ is the total cost of category-$i$ users, $m_{whole}$ is the total cost, and $m\_rate_i$ is the cost proportion of category $i$;
4) Calculate the ratio of each category's cost to its number of users, and on that basis the average-cost proportion $rate\_avg_i$ of each category:
$$m\_avg_i = \frac{m_i}{num_i}$$
(the formula for $rate\_avg_i$ appears as an image in the original and is not reproduced here), where $i = 1, 2, 3$.
The features derived at this level are $rate_i$, $m\_rate_i$, $m\_avg_i$ and $rate\_avg_i$.
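The user-level formulas that survive in the text ($rate_i$, $m\_rate_i$, $m\_avg_i$) can be computed directly from per-user cluster assignments and costs. This sketch assumes clustering has already been done upstream (the patent does not name the clustering algorithm, so none is shown) and omits $rate\_avg_i$, whose formula is not recoverable; `user_level_features` is an illustrative name.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence


def user_level_features(cluster: Sequence[int],
                        cost: Sequence[float]) -> Dict[str, Dict[int, float]]:
    """cluster[u] is user u's value category (e.g. 1, 2, 3); cost[u] is its cost."""
    num_whole = len(cluster)          # total number of users
    m_whole = sum(cost)               # total cost
    num = Counter(cluster)            # num_i: users per category
    m: Dict[int, float] = defaultdict(float)
    for c, v in zip(cluster, cost):
        m[c] += v                     # m_i: total cost per category
    return {
        "rate": {i: num[i] / num_whole for i in num},   # rate_i
        "m_rate": {i: m[i] / m_whole for i in num},     # m_rate_i
        "m_avg": {i: m[i] / num[i] for i in num},       # m_avg_i
    }
```

For four users in categories `[1, 1, 2, 3]` with costs `[10, 30, 40, 20]`, category 1 holds half the users but 40% of the cost, at an average of 20 per user.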
2. Product level
At the product level, the features mainly include product order quantity, product major category, product subdivision, product discount rate, product discount amount and so on. The derivation scheme based on these features is as follows:
1) For each product major category, calculate its share of the order quantity and the coefficient of variation of the order quantities across major categories:
$$rate\_c_i = \frac{num\_c_i}{num\_c_{whole}}$$
where $num\_c_i$ is the order quantity of major category $i$, $num\_c_{whole}$ is the total order quantity, and $rate\_c_i$ is the share of major category $i$. The coefficient of variation is
$$c = \frac{\sigma}{\mu}$$
where $\sigma$ is the standard deviation and $\mu$ is the mean of the major-category order quantities;
2) In the same way, calculate the order share $rate\_c\_x_i$ under each product subdivision and the corresponding coefficient of variation $c\_x$;
3) Calculate the ratio of $c$ to $c\_x$:
$$rate_z = \frac{c}{c\_x}$$
which mainly reflects how the distribution changes when product subdivisions are aggregated into major categories;
4) Rank the product's discount amount among the discount amounts of all products (say it ranks $j$-th) and rank its discount rate among the discount rates of all products (say it ranks $i$-th), then calculate $rate_p$ from the two ranks (the formula appears as an image in the original and is not reproduced here). This reflects the difference between the product's own discount and the discounts overall.
The features derived at this level are $rate\_c_i$, $rate_z$ and $rate_p$.
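The recoverable product-level quantities can be sketched from order counts per major category and per subdivision. The sketch assumes the population standard deviation for $\sigma$ and reads "the ratio of $c$ to $c\_x$" as $c / c\_x$; $rate_p$ is omitted because its formula is not recoverable. Function names are illustrative.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """rate_c_i = num_c_i / num_c_whole for every category."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def coefficient_of_variation(values: Sequence[float]) -> float:
    """c = sigma / mu, using the population standard deviation (assumption)."""
    return pstdev(values) / mean(values)


def product_level_features(major_counts: Dict[str, int],
                           sub_counts: Dict[str, int]) -> Dict[str, object]:
    c = coefficient_of_variation(list(major_counts.values()))
    c_x = coefficient_of_variation(list(sub_counts.values()))
    if c_x == 0:
        raise ValueError("rate_z is undefined when all subdivision orders are equal")
    return {
        "rate_c": class_shares(major_counts),   # share per major category
        "rate_c_x": class_shares(sub_counts),   # share per subdivision
        "c": c,
        "c_x": c_x,
        "rate_z": c / c_x,                      # assumed orientation: c over c_x
    }
```

With major-category orders `{"A": 30, "B": 10}` and subdivision orders `{"a1": 10, "a2": 20, "b1": 10}`, $c = 0.5$, $c\_x = \sqrt{2}/4$ and $rate_z = \sqrt{2}$.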
3. Flow level
The flow rate is mainly day flow rate, week flow rate, month flow rate, day flow rate, night flow rate and various software flow rate consumption. The main idea of flow characteristic derivation is to construct statistics and the ratio between the corresponding statistics from the metrics.
Here, taking the daily flow rate as an example, the standard deviation, the mean value, and the coefficient of variation of the daily flow rate are calculated and can be respectively expressed as: sd d ,mean d ,c d (ii) a The same method can calculate the standard deviation, the mean value and the coefficient of variation sd of the circulation flow w ,mean w ,c w (ii) a Standard deviation, mean and coefficient of variation sd of monthly flow m ,mean m ,c m (ii) a Standard deviation, mean and coefficient of variation sd of daytime flow a ,mean a ,c a (ii) a Standard deviation, mean and coefficient of variation sd of night flow n ,mean n ,c n
Following the idea of permutations and combinations, the standard deviations, means and coefficients of variation of daily, weekly, monthly, daytime and night-time traffic are compared pairwise to measure how these statistics change across time ranges. Taking the standard deviations of daily and weekly traffic as an example, the ratio of their difference to their sum is used as the index of the difference between them:
$$\frac{sd_d - sd_w}{sd_d + sd_w}$$
Difference indices for the remaining pairs of statistics are constructed in the same way.
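The traffic-level derivation can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: it uses the population standard deviation, defines the coefficient of variation as standard deviation over mean (0.0 when the mean is 0, an assumed convention), and applies the difference-over-sum index to every pair of time granularities. The function names and the short granularity keys (`d`, `w`, `m`, `a`, `n`) are illustrative.

```python
import statistics
from itertools import combinations


def describe(series):
    """Return (standard deviation, mean, coefficient of variation) of one traffic series."""
    sd = statistics.pstdev(series)          # population std (assumption)
    mean = statistics.fmean(series)
    cv = sd / mean if mean else 0.0         # CV = sd / mean; 0.0 for a zero mean (assumed)
    return sd, mean, cv


def difference_index(a, b):
    """Difference of two statistics divided by their sum (0.0 if the sum is 0)."""
    total = a + b
    return (a - b) / total if total else 0.0


def traffic_features(series_by_granularity):
    """Derive sd/mean/CV per granularity plus pairwise difference indices.

    series_by_granularity maps a key such as 'd' (daily) or 'w' (weekly)
    to the list of traffic values observed at that granularity.
    """
    stats = {key: describe(values) for key, values in series_by_granularity.items()}
    features = {}
    for key, (sd, mean, cv) in stats.items():
        features[f"sd_{key}"], features[f"mean_{key}"], features[f"c_{key}"] = sd, mean, cv
    # Every unordered pair of granularities, for each of the three statistics.
    for p, q in combinations(sorted(stats), 2):
        for idx, label in enumerate(("sd", "mean", "c")):
            features[f"diff_{label}_{p}{q}"] = difference_index(stats[p][idx], stats[q][idx])
    return features
```

With five granularities this yields 15 base statistics and 10 pairs × 3 statistics = 30 difference indices.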
In summary, the features derived at the user level, the product level and the traffic level are combined into the derived feature $E$.
Then the screened feature vector $Z$, the derived feature $E$ and the service cost time-series feature $X$ are spliced into a new feature vector $B = (z_1, z_2, \ldots, z_j, e_1, e_2, \ldots, e_h, x_{t-1}, x_{t-2}, \ldots, x_{t-k})$, and a model_XGB02 model is built with the XGBOOST algorithm from $B$ and the original label $Y$. model_XGB02 outputs 0 for a normal cost and 1 for a suspected abnormal cost. Suspected anomalies enter the fourth evaluation stage for further detection; filtering at this stage reduces the amount of data the fourth stage has to examine and thus improves the efficiency of the model.
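Splicing the three parts of $B$ amounts to a concatenation. The Python sketch below assumes the cost history is ordered from oldest to newest and ends at period $t-1$; the function name is hypothetical. Rows built this way, together with labels $Y$, could then be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier` through its `fit(X, y)` method, which is one possible way, not necessarily the authors', to realize model_XGB02.

```python
def build_feature_vector(z, e, cost_history, k):
    """Concatenate screened features z, derived features e and k lagged costs.

    cost_history is ordered oldest -> newest and ends at period t-1, so the
    lag block is (x_{t-1}, x_{t-2}, ..., x_{t-k}), matching the layout of B.
    """
    if k < 1 or len(cost_history) < k:
        raise ValueError("need k >= 1 and at least k past cost values")
    lags = list(reversed(cost_history[-k:]))  # newest first: x_{t-1} ... x_{t-k}
    return list(z) + list(e) + lags
```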
Building the second XGBOOST anomaly detection model on the cost data's own time-series information together with external non-time-series features further improves detection performance.
Based on any one of the above embodiments, the LSTM index prediction model constructed based on the cost data's own time-series information and external non-time-series features is obtained by the following steps:
constructing categorical external features from the consistency of the external information over time, and constructing numerical external features from the growth rate of the external information over time;
and merging the categorical and numerical external features into the derived feature vector, and constructing the LSTM index prediction model with an LSTM algorithm.
Specifically, in the fourth evaluation stage the invention further builds an LSTM index prediction model on the screened external features and the time-series information of the service cost.
Before the LSTM model is constructed, the timeliness of external information is taken into account by checking whether each attribute has changed between the previous period and the current period: numerical external features are described by their growth rate, and categorical external features by their consistency. The following functions are constructed:
Categorical external feature construction:
[Equation image not recoverable from the source]
Numerical external feature construction:
[Equation image not recoverable from the source]
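Because both construction functions survive only as images, the following Python sketch rests on assumptions: the categorical feature is taken to be a 0/1 indicator of whether the attribute is unchanged from the previous period, and the numerical feature is the ordinary period-over-period growth rate. All names are hypothetical.

```python
def categorical_consistency(previous, current):
    """Assumed encoding: 1 if the categorical attribute is unchanged, else 0."""
    return 1 if previous == current else 0


def growth_rate(previous, current):
    """Growth rate (current - previous) / previous; 0.0 when previous is 0 (assumed)."""
    if previous == 0:
        return 0.0
    return (current - previous) / previous


def external_time_features(prev_attrs, curr_attrs, categorical_keys, numeric_keys):
    """Build consistency features for categorical keys and growth features for numeric keys."""
    feats = {}
    for key in categorical_keys:
        feats[f"{key}_same"] = categorical_consistency(prev_attrs[key], curr_attrs[key])
    for key in numeric_keys:
        feats[f"{key}_growth"] = growth_rate(prev_attrs[key], curr_attrs[key])
    return feats
```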
The newly constructed features are added to the feature vector $B$, a model_LSTM model is built with the LSTM algorithm to predict the current service cost, and the predicted value is passed to the next evaluation stage.
Because the LSTM index prediction model is built on the cost data's own time-series information together with the screened external non-time-series features, its predicted values are more accurate.
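An LSTM consumes sequences rather than single rows, so the per-period vectors have to be grouped into windows before training. A minimal Python sketch, assuming each window of `timesteps` consecutive periods is used to predict the cost of the period that follows it (the window length and the function name are illustrative, not from the source):

```python
def make_lstm_windows(feature_rows, costs, timesteps):
    """Slide a window of `timesteps` periods over per-period feature rows.

    Each sample is a list of `timesteps` feature rows (timesteps x n_features);
    its target is the cost of the period immediately after the window.
    """
    if len(feature_rows) != len(costs):
        raise ValueError("feature_rows and costs must have the same length")
    if timesteps < 1:
        raise ValueError("timesteps must be positive")
    samples, targets = [], []
    for end in range(timesteps, len(costs)):
        samples.append(feature_rows[end - timesteps:end])
        targets.append(costs[end])
    return samples, targets
```

The returned samples have the layout (samples, timesteps, features), which is, for example, what PyTorch's `nn.LSTM` expects when constructed with `batch_first=True`.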
Based on any one of the above embodiments, the proportion anomaly discrimination model is obtained by the following steps:
acquiring the actual value and the predicted value of the current service cost, and calculating the error ratio between the actual value and the predicted value;
and determining a second variable dispersion threshold, and constructing the proportion anomaly discrimination model based on the error ratio and the second variable dispersion threshold.
Specifically, in the last evaluation stage a threshold is set based on the service information and the proportion anomaly discrimination model is constructed as follows:
1) Calculate the error ratio between the actual value and the predicted value:
[Equation image not recoverable from the source]
2) Set the second variable dispersion threshold $\lambda_3$.
3) Construct the proportion anomaly discrimination model:
[Equation image not recoverable from the source]
Because the LSTM captures the relationships across time steps well, its prediction is accurate and can be regarded as a good estimate of the actual value; when the error ratio between the predicted value and the actual value exceeds the set threshold, the calculation of the actual value is considered likely to be faulty.
Because the proportion anomaly discrimination model is built on service information, it stays closer to actual business requirements.
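The final discrimination step can be sketched as follows. Because the error-ratio and decision formulas survive only as images, this Python sketch assumes the error ratio is the absolute deviation of the actual value from the LSTM prediction, relative to the prediction, and that a ratio above $\lambda_3$ is labelled abnormal (1); the function names are illustrative.

```python
def error_ratio(actual, predicted):
    """Assumed form: |actual - predicted| / |predicted|."""
    if predicted == 0:
        raise ValueError("predicted value must be non-zero")
    return abs(actual - predicted) / abs(predicted)


def proportion_anomaly(actual, predicted, lambda_3):
    """Return 1 (abnormal) if the error ratio exceeds lambda_3, else 0 (normal)."""
    return 1 if error_ratio(actual, predicted) > lambda_3 else 0
```

Using the prediction as the reference follows the reasoning that the LSTM output is a good estimate of the true value.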
The business data decision device provided by the present invention is described below; the device described below and the business data decision method described above may be cross-referenced with each other.
Fig. 4 is a schematic structural diagram of a service data decision device provided in the present invention, as shown in fig. 4, including: an acquisition module 41 and a detection module 42, wherein:
The acquisition module 41 is configured to acquire service cost data; the detection module 42 is configured to input the service cost data into a pre-trained multi-stage cascaded anomaly index detection model to obtain an anomaly detection result. The multi-stage cascaded anomaly index detection model is constructed from historical service cost data by cascading, in sequence, a dynamic anomaly detection model, a first XGBOOST anomaly detection model constructed based on the cost data's own time-series information, a non-time-series feature screening function model, a second XGBOOST anomaly prediction model constructed based on the cost data's own time-series information and external non-time-series features, an LSTM index prediction model constructed based on the same time-series information and external non-time-series features, and a proportion anomaly discrimination model constructed based on business properties.
By cascading traditional machine learning and deep learning models, the method judges whether the current cost is abnormal from coarse granularity to fine granularity, layer by layer, so that obvious anomalies can be located quickly while subtle anomalies are also captured.
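The coarse-to-fine cascade can be pictured as a chain of filters. In the minimal Python sketch below, which reflects one plausible reading rather than the patented implementation, each detection stage forwards only the records it still considers suspicious, in line with the description of model_XGB02 passing suspected anomalies on to the next stage. Non-detecting steps such as feature screening are left out, and the type alias and function name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A detection stage takes a record and returns True if it still looks suspicious.
Stage = Callable[[Dict[str, float]], bool]


def run_cascade(record: Dict[str, float], stages: List[Tuple[str, Stage]]) -> Tuple[bool, str]:
    """Pass one record through the stages in order, from coarse to fine.

    Returns (is_abnormal, stage_name): the stage that cleared the record as
    normal, or the last stage if every stage flagged it.
    """
    for name, stage in stages:
        if not stage(record):
            return False, name  # cleared here; later, costlier stages are skipped
    return True, stages[-1][0] if stages else ""
```

Placing cheap checks first means the expensive models only see the small remainder of records that survive the earlier stages.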
Based on the above embodiment, the detection module 42 includes a first detection submodule 421, where the first detection submodule 421 is specifically configured to:
determining a preset time window, and extracting the historical service cost data within the preset time window to obtain a historical service cost vector whose length equals the window length; calculating a first mean, a first median, a maximum and a minimum of the historical service cost vector; calculating a dynamic threshold based on the first mean, the first median, the maximum and the minimum; sequentially calculating the absolute differences between all adjacent values in the historical service cost vector; and constructing the dynamic anomaly detection model from the absolute differences and the dynamic threshold.
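The dynamic anomaly detection step can be sketched in Python as below. The text names the inputs of the dynamic threshold (mean, median, maximum, minimum) but not how they are combined, so the threshold rule here is HYPOTHETICAL and is passed in as a replaceable function; flagging an adjacent difference that exceeds the threshold is likewise an assumed reading of how the two quantities are compared.

```python
import statistics
from typing import Callable, List, Sequence, Tuple


def hypothetical_threshold(mean: float, median: float, maximum: float, minimum: float) -> float:
    """HYPOTHETICAL rule (the source does not give the formula):
    half the window range plus the gap between mean and median."""
    return (maximum - minimum) / 2 + abs(mean - median)


def dynamic_anomalies(
    history: Sequence[float],
    window: int,
    threshold_fn: Callable[[float, float, float, float], float] = hypothetical_threshold,
) -> Tuple[List[int], float]:
    """Flag jumps between adjacent values in the last `window` observations.

    Returns the indices (within the window) of values whose absolute change
    from the previous value exceeds the dynamic threshold, plus the threshold.
    """
    if window < 2 or len(history) < window:
        raise ValueError("need a window of at least 2 and enough history to fill it")
    vec = list(history[-window:])
    threshold = threshold_fn(statistics.fmean(vec), statistics.median(vec), max(vec), min(vec))
    diffs = [abs(b - a) for a, b in zip(vec, vec[1:])]
    flagged = [i + 1 for i, d in enumerate(diffs) if d > threshold]
    return flagged, threshold
```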
Based on any of the above embodiments, the detection module 42 includes a second detection sub-module 422, and the second detection sub-module 422 is specifically configured to:
acquiring the own time-series information of the historical service cost vector; converting the historical service cost vector into a supervised-learning format based on this time-series information to obtain feature data, and taking the current-period cost data as the label; and constructing the first XGBOOST anomaly detection model with the XGBOOST algorithm from the feature data and the label.
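Converting the cost vector into a supervised-learning format typically means using lagged values as features. The Python sketch below assumes $k$ lags, ordered $x_{t-1}, \ldots, x_{t-k}$ to match the lag block of the feature vector $B$, with the current-period cost $x_t$ as the label; $k$ and the function name are illustrative.

```python
def to_supervised(series, k):
    """Turn a cost series into (lag features, label) pairs.

    The row for period t uses (x_{t-1}, ..., x_{t-k}) as features and x_t as label.
    """
    if k < 1:
        raise ValueError("k must be positive")
    X, y = [], []
    for t in range(k, len(series)):
        X.append([series[t - lag] for lag in range(1, k + 1)])
        y.append(series[t])
    return X, y
```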
Based on any of the above embodiments, the detection module 42 includes a third detection submodule 423, where the third detection submodule 423 is specifically configured to:
determining external features of the historical service cost data, and converting the external features into external feature vectors; calculating a second mean, a second median and a standard deviation of each external feature vector; constructing a variable dispersion index based on the second mean, the second median and the standard deviation; and determining a first variable dispersion threshold, and constructing the non-time-series feature screening function model based on the variable dispersion index and the first variable dispersion threshold.
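The source does not give the formula that combines the second mean, median and standard deviation into the variable dispersion index, nor the direction of the threshold test. The Python sketch below therefore uses a HYPOTHETICAL index (standard deviation divided by the average magnitude of the mean and median) and keeps features whose index reaches $\lambda_1$, on the assumption that near-constant external features carry little information.

```python
import statistics


def hypothetical_dispersion_index(values):
    """HYPOTHETICAL: population std divided by the average magnitude of mean and median."""
    sd = statistics.pstdev(values)
    centre = (abs(statistics.fmean(values)) + abs(statistics.median(values))) / 2
    if centre == 0:
        return float("inf") if sd > 0 else 0.0
    return sd / centre


def screen_features(columns, lambda_1):
    """Keep external features whose dispersion index reaches lambda_1 (assumed direction)."""
    return [name for name, values in columns.items()
            if hypothetical_dispersion_index(values) >= lambda_1]
```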
Based on any of the above embodiments, the detection module 42 includes a fourth detection submodule 424, where the fourth detection submodule 424 is specifically configured to:
screening the external features based on business characteristics to obtain user-level features, product-level features and traffic-level features respectively; merging the user-level features, the product-level features and the traffic-level features into a derived feature vector; splicing the external feature vector, the derived feature vector and the feature data into a spliced vector; and constructing the second XGBOOST anomaly detection model with the XGBOOST algorithm from the spliced vector and the label.
Based on any of the above embodiments, the detection module 42 includes a fifth detection sub-module 425, and the fifth detection sub-module 425 is specifically configured to:
constructing categorical external features from the consistency of the external information over time, and constructing numerical external features from the growth rate of the external information over time; and merging the categorical and numerical external features into the derived feature vector, and constructing the LSTM index prediction model with an LSTM algorithm.
Based on any of the above embodiments, the detection module 42 includes a sixth detection submodule 426, and the sixth detection submodule 426 is specifically configured to:
acquiring the actual value and the predicted value of the current service cost, and calculating the error ratio between the actual value and the predicted value; and determining a second variable dispersion threshold, and constructing the proportion anomaly discrimination model based on the error ratio and the second variable dispersion threshold.
Fig. 5 illustrates the physical structure of an electronic device. As shown in Fig. 5, the electronic device may include a processor 510, a communication interface 520, a memory 530 and a communication bus 540, through which the processor 510, the communication interface 520 and the memory 530 communicate with one another. The processor 510 may invoke logic instructions in the memory 530 to perform a business data decision method, the method including: acquiring service cost data; and inputting the service cost data into a pre-trained multi-stage cascaded anomaly index detection model to obtain an anomaly detection result, wherein the multi-stage cascaded anomaly index detection model is constructed from historical service cost data by cascading, in sequence, a dynamic anomaly detection model, a first XGBOOST anomaly detection model constructed based on the cost data's own time-series information, a non-time-series feature screening function model, a second XGBOOST anomaly prediction model constructed based on the cost data's own time-series information and external non-time-series features, an LSTM index prediction model constructed based on the same time-series information and external non-time-series features, and a proportion anomaly discrimination model constructed based on business properties.
Furthermore, if the logic instructions in the memory 530 are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product that is stored in a storage medium and includes instructions causing a computer device (a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, the present invention further provides a computer program product, including a computer program stored on a non-transitory computer-readable storage medium; the computer program includes program instructions which, when executed by a computer, enable the computer to execute the business data decision method provided above, the method including: acquiring service cost data; and inputting the service cost data into a pre-trained multi-stage cascaded anomaly index detection model to obtain an anomaly detection result, wherein the multi-stage cascaded anomaly index detection model is constructed from historical service cost data by cascading, in sequence, a dynamic anomaly detection model, a first XGBOOST anomaly detection model constructed based on the cost data's own time-series information, a non-time-series feature screening function model, a second XGBOOST anomaly prediction model constructed based on the cost data's own time-series information and external non-time-series features, an LSTM index prediction model constructed based on the same time-series information and external non-time-series features, and a proportion anomaly discrimination model constructed based on business properties.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the business data decision method provided above, the method including: acquiring service cost data; and inputting the service cost data into a pre-trained multi-stage cascaded anomaly index detection model to obtain an anomaly detection result, wherein the multi-stage cascaded anomaly index detection model is constructed from historical service cost data by cascading, in sequence, a dynamic anomaly detection model, a first XGBOOST anomaly detection model constructed based on the cost data's own time-series information, a non-time-series feature screening function model, a second XGBOOST anomaly prediction model constructed based on the cost data's own time-series information and external non-time-series features, an LSTM index prediction model constructed based on the same time-series information and external non-time-series features, and a proportion anomaly discrimination model constructed based on business properties.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A business data decision method, characterized by comprising the following steps:
acquiring service cost data;
inputting the service cost data into a pre-trained multi-stage cascaded anomaly index detection model to obtain an anomaly detection result, wherein the multi-stage cascaded anomaly index detection model is constructed from historical service cost data by cascading, in sequence, a dynamic anomaly detection model, a first XGBOOST anomaly detection model constructed based on the cost data's own time-series information, a non-time-series feature screening function model, a second XGBOOST anomaly prediction model constructed based on the cost data's own time-series information and external non-time-series features, an LSTM index prediction model constructed based on the same time-series information and external non-time-series features, and a proportion anomaly discrimination model constructed based on business properties.
2. The business data decision method of claim 1, wherein the dynamic anomaly detection model is obtained by:
determining a preset time window, and extracting the historical service cost data within the preset time window to obtain a historical service cost vector whose length equals the window length;
respectively calculating a first mean value, a first median value, a maximum value and a minimum value of the historical service cost vector;
calculating a dynamic threshold value based on the first mean value, the first median value, the maximum value and the minimum value;
sequentially calculating the absolute differences between all adjacent values in the historical service cost vector;
and constructing the dynamic anomaly detection model from the absolute differences and the dynamic threshold.
3. The business data decision method of claim 2, wherein the first XGBOOST anomaly detection model is obtained by:
acquiring the own time-series information of the historical service cost vector;
converting the historical service cost vector into a supervised-learning format based on this time-series information to obtain feature data, and taking the current-period cost data as the label;
and constructing the first XGBOOST anomaly detection model with the XGBOOST algorithm from the feature data and the label.
4. The business data decision method according to claim 3, wherein the non-time-series feature screening function model is obtained by:
determining external features of the historical service cost data, and converting the external features into external feature vectors;
respectively calculating a second mean value, a second median value and a standard deviation of the external feature vector;
constructing a variable dispersion index based on the second mean value, the second median value and the standard deviation;
and determining a first variable dispersion threshold, and constructing the non-time-series feature screening function model based on the variable dispersion index and the first variable dispersion threshold.
5. The business data decision method of claim 4, wherein the second XGBOOST anomaly prediction model constructed based on the cost data's own time-series information and external non-time-series features is obtained by:
screening the external features based on business characteristics to obtain user-level features, product-level features and traffic-level features respectively;
merging the user-level features, the product-level features, and the traffic-level features into a derived feature vector;
splicing the external feature vector, the derived feature vector and the feature data to form a spliced vector;
and constructing the second XGBOOST anomaly detection model with the XGBOOST algorithm from the spliced vector and the label.
6. The business data decision method according to claim 5, wherein the LSTM index prediction model constructed based on the cost data's own time-series information and external non-time-series features is obtained by the following steps:
constructing categorical external features from the consistency of the external information over time, and constructing numerical external features from the growth rate of the external information over time;
and merging the categorical and numerical external features into the derived feature vector, and constructing the LSTM index prediction model with an LSTM algorithm.
7. The business data decision method according to claim 6, wherein the proportion anomaly discrimination model is obtained by:
acquiring the actual value and the predicted value of the current service cost, and calculating the error ratio between the actual value and the predicted value;
and determining a second variable dispersion threshold, and constructing the proportion anomaly discrimination model based on the error ratio and the second variable dispersion threshold.
8. A business data decision apparatus, comprising:
an acquisition module, configured to acquire service cost data;
a detection module, configured to input the service cost data into a pre-trained multi-stage cascaded anomaly index detection model to obtain an anomaly detection result, wherein the multi-stage cascaded anomaly index detection model is constructed from historical service cost data by cascading, in sequence, a dynamic anomaly detection model, a first XGBOOST anomaly detection model constructed based on the cost data's own time-series information, a non-time-series feature screening function model, a second XGBOOST anomaly prediction model constructed based on the cost data's own time-series information and external non-time-series features, an LSTM index prediction model constructed based on the same time-series information and external non-time-series features, and a proportion anomaly discrimination model constructed based on business properties.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements the steps of the business data decision method according to any one of claims 1 to 7.
10. A non-transitory computer readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the business data decision method according to any one of claims 1 to 7.
CN202110454446.1A 2021-04-26 2021-04-26 Business data decision method and device, electronic equipment and storage medium Pending CN115249077A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110454446.1A CN115249077A (en) 2021-04-26 2021-04-26 Business data decision method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110454446.1A CN115249077A (en) 2021-04-26 2021-04-26 Business data decision method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115249077A true CN115249077A (en) 2022-10-28

Family

ID=83696683

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110454446.1A Pending CN115249077A (en) 2021-04-26 2021-04-26 Business data decision method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115249077A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511106A (en) * 2022-11-15 2022-12-23 阿里云计算有限公司 Method, device and readable storage medium for generating training data based on time sequence data
CN115511106B (en) * 2022-11-15 2023-04-07 阿里云计算有限公司 Method, device and readable storage medium for generating training data based on time sequence data

Similar Documents

Publication Publication Date Title
CN112990972B (en) Recommendation method based on heterogeneous graph neural network
Harrison et al. Investment timing with incomplete information and multiple means of learning
CN110634060A (en) User credit risk assessment method, system, device and storage medium
CN113610331B (en) Cost accounting method and device based on information waterfall and storage medium
CN110619535B (en) Data processing method and device
CN111652661B (en) Mobile phone client user loss early warning processing method
CN108197795B (en) Malicious group account identification method, device, terminal and storage medium
CN113538154A (en) Risk object identification method and device, storage medium and electronic equipment
CN113761359A (en) Data packet recommendation method and device, electronic equipment and storage medium
CN112330441A (en) Method for evaluating business value credit loan of medium and small enterprises
CN115249077A (en) Business data decision method and device, electronic equipment and storage medium
CN110059126B (en) LKJ abnormal value data-based complex correlation network analysis method and system
CN108171570A (en) A kind of data screening method, apparatus and terminal
CN116501979A (en) Information recommendation method, information recommendation device, computer equipment and computer readable storage medium
CN114330720A (en) Knowledge graph construction method and device for cloud computing and storage medium
CN115904920A (en) Test case recommendation method and device, terminal and storage medium
CN109933579B (en) Local K neighbor missing value interpolation system and method
CN111581508A (en) Service monitoring method, device, equipment and storage medium
Chen et al. Research on Audit Simulation of Accounting Computerization Based on Internet Complex Discrete Dynamic Modeling Technology
CN112348583B (en) User preference generation method and generation system
CN110263802B (en) Credit data analysis method based on density clustering and related equipment
CN114238615B (en) Enterprise service result data processing method and system
CN115329192A (en) Feature generation method, information prediction model training method, and device
CN117291409A (en) House multisource information management method, system and storage medium
CN117669692A (en) Fair user modeling method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination