CN109977151B - Data analysis method and system - Google Patents

Data analysis method and system Download PDF

Info

Publication number
CN109977151B
CN109977151B (application CN201910245141.2A)
Authority
CN
China
Prior art keywords
features
derivative
variable
feature
basic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910245141.2A
Other languages
Chinese (zh)
Other versions
CN109977151A (en)
Inventor
张帆
路明奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nine Chapter Yunji Technology Co Ltd Beijing
Original Assignee
Nine Chapter Yunji Technology Co Ltd Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nine Chapter Yunji Technology Co Ltd Beijing filed Critical Nine Chapter Yunji Technology Co Ltd Beijing
Priority to CN201910245141.2A priority Critical patent/CN109977151B/en
Publication of CN109977151A publication Critical patent/CN109977151A/en
Application granted granted Critical
Publication of CN109977151B publication Critical patent/CN109977151B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a data analysis method and system, and relates to the field of data analysis. The data analysis method comprises the following steps: acquiring basic features based on the business data and/or business scenario to be analyzed, and determining a feature derivation mode; deriving the basic features according to the determined feature derivation mode to obtain derived features; and creating a business model according to the derived features to execute analysis processing operations. This scheme can improve how well the business model fits the business scenario and business requirements, and thereby improve the accuracy of data analysis.

Description

Data analysis method and system
Technical Field
The present invention relates to the field of data analysis, and in particular, to a data analysis method and system.
Background
To improve the accuracy of data analysis results, a conventional data analysis system must train a dedicated business model for each business scenario and requirement. However, existing business models fit their business scenarios and requirements poorly, so the accuracy of data analysis cannot be improved.
Disclosure of Invention
The embodiments of the invention provide a data analysis method and a data analysis system, which aim to solve the problem that existing business models fit business scenarios and business requirements poorly, so that the accuracy of data analysis cannot be improved.
In order to solve the above technical problem, an embodiment of the present invention provides a data analysis method, including:
acquiring basic features based on the business data and/or business scenario to be analyzed, and determining a feature derivation mode;
deriving the basic features according to the determined feature derivation mode to obtain derived features;
and creating a business model according to the derived features to execute analysis processing operations.
Specifically, the feature derivation mode comprises at least one of the following: derivation based on business objectives, derivation based on deep learning, derivation based on feature combination, derivation based on time variables, derivation based on decision tree models, and derivation based on numerical transformation.
Optionally, the deriving the basic features according to the determined feature derivation manner to obtain derived features includes:
acquiring service experience data corresponding to a service scene based on the service scene and a service target;
and according to the service experience data, constructing features related to the service targets through the basic features to obtain derived features.
Optionally, the deriving the basic features according to the determined feature derivation manner to obtain derived features includes:
and performing deep learning on the basic features through at least one of a sparse autoencoder (sparse self-coding) algorithm, a factorization machine algorithm, and a deep & cross network algorithm to obtain derived features.
Further, when the basic features are learned through the sparse autoencoder algorithm, obtaining the derived features comprises:
inputting the basic features into a sparse autoencoder network, acquiring the vector composed of the activation values of all hidden-layer units of the network, and taking that vector as the derived features.
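The hidden-layer step above can be sketched as follows. This is a minimal NumPy illustration rather than the patent's implementation: the encoder weights `W` and bias `b` are placeholder values that would in practice be learned by training the sparse autoencoder (with a sparsity penalty on the hidden activations).

```python
import numpy as np

def hidden_activations(X, W, b):
    """Sigmoid activations of the hidden layer: one derived-feature vector per row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))        # 5 samples, 8 basic features
W = rng.normal(size=(8, 3)) * 0.1  # encoder weights (would be trained with a sparsity penalty)
b = np.zeros(3)

derived = hidden_activations(X, W, b)  # shape (5, 3): 3 derived features per sample
```

Each row of `derived` is the hidden-layer activation vector that the patent takes as the derived features for that sample.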
Further, when the basic features are deeply learned through the factorization machine algorithm, the obtaining of the derivative features includes:
and acquiring cross features introduced by operating the factorization machine algorithm model, and determining the cross features as derivative features.
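A minimal sketch of where these cross features come from: the second-order term of a factorization machine weights every pair of features by the inner product of their latent vectors. The latent matrix `V` below is random for illustration; in practice it comes from the trained FM model.

```python
import numpy as np

def fm_cross_features(x, V):
    """Pairwise cross terms <v_i, v_j> * x_i * x_j introduced by a factorization machine.
    x: (n_features,) sample vector; V: (n_features, k) latent factor matrix."""
    n = len(x)
    crosses = {}
    for i in range(n):
        for j in range(i + 1, n):
            crosses[(i, j)] = float(V[i] @ V[j]) * x[i] * x[j]
    return crosses

rng = np.random.default_rng(1)
x = rng.normal(size=4)            # 4 basic features
V = rng.normal(size=(4, 2))       # latent vectors (would come from the fitted FM)
derived = fm_cross_features(x, V) # 6 cross features for 4 basic features
```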
Further, when the basic features are deeply learned through the deep cross neural network algorithm, the obtaining of the derived features includes:
inputting the basic features into a deep & cross network, obtaining a first output and a second output through the cross network and the deep network respectively, and combining the two outputs to obtain the final derived features.
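One cross layer of a deep & cross network computes x_{l+1} = x_0 (w · x_l) + b + x_l. The sketch below uses random (untrained) parameters purely to illustrate the structure: the cross-branch output and a toy one-layer deep-branch output are concatenated into the combined derived-feature vector.

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One layer of the cross network: x_{l+1} = x0 * (w . xl) + b + xl."""
    return x0 * (w @ xl) + b + xl

rng = np.random.default_rng(2)
x0 = rng.normal(size=6)                       # embedded basic features
x1 = cross_layer(x0, x0, rng.normal(size=6), rng.normal(size=6))  # cross-branch output

Wd = rng.normal(size=(4, 6))
deep_out = np.tanh(Wd @ x0)                   # toy one-layer "deep" branch

derived = np.concatenate([x1, deep_out])      # combined output used as derived features
```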
Optionally, the deriving the basic features according to the determined feature derivation manner to obtain derived features includes:
performing feature combination on the basic features to obtain derived features;
wherein the manner of feature combination comprises at least one of: a polynomial manner, a mathematical-operation manner, and an aggregation-function manner.
Further, when the manner of the feature combination includes a manner of using a polynomial, the deriving the derived feature includes:
and generating a new feature matrix according to the basic features and preset degrees, and taking each component element in the feature matrix as a derivative feature.
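For example, a degree-2 polynomial expansion of the basic features can be sketched as below; each column of the resulting matrix is one derived feature (`sklearn.preprocessing.PolynomialFeatures` generates an equivalent matrix, with an extra bias column).

```python
import itertools
import numpy as np

def poly2_features(X):
    """Degree-2 polynomial feature matrix: each column is one derived feature
    (squares and pairwise products of the basic features)."""
    n = X.shape[1]
    cols = [X[:, i] * X[:, j]
            for i, j in itertools.combinations_with_replacement(range(n), 2)]
    return np.column_stack(cols)

X = np.array([[1.0, 2.0], [3.0, 4.0]])  # 2 samples, 2 basic features
derived = poly2_features(X)             # columns: x1^2, x1*x2, x2^2
```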
Further, when the manner of combining the features includes a manner of using a mathematical operation, the obtaining of the derived features includes:
calculating the basic characteristics by using a data operation rule to obtain derivative characteristics;
wherein, the data operation rule comprises: at least one of an addition, a subtraction, a multiplication, and a division.
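A minimal sketch of mathematical-operation combinations on two hypothetical basic features (`income` and `debt` are illustrative names, not taken from the patent):

```python
import numpy as np

income = np.array([5000.0, 8000.0, 12000.0])   # basic feature 1
debt   = np.array([1000.0, 4000.0,  3000.0])   # basic feature 2

# Derived features from the four elementary operations on the two basic features
total_burden   = income + debt    # addition
disposable     = income - debt    # subtraction
interaction    = income * debt    # multiplication
debt_to_income = debt / income    # division; ratio features are often the most informative
```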
Further, when the manner of the feature combination includes a manner of employing an aggregation function, the obtaining of the derived feature includes:
applying the aggregation function to the continuous variable and the discrete variable to generate derivative features;
wherein the aggregation function comprises at least one of: count, mean, sum, minimum, maximum, standard deviation, median, mode, average time between consecutive events, number of distinct values of a categorical variable, percentage of values equal to a preset value, skewness, and kurtosis.
Further, the applying the aggregation function to the continuous variable and the discrete variable to generate the derived feature includes at least one of the following ways:
for a numerical variable, aggregating over the primary key with a first preset aggregation function to obtain derived features;
for a categorical variable, aggregating over the primary key with a second preset aggregation function to obtain derived features;
wherein the first preset aggregation function at least comprises: a count function, a mean function, a sum function, a minimum function, and a maximum function;
and the second preset aggregation function at least comprises: a count function, a mean function, and a sum function.
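Aggregating over a primary key with the first preset aggregation functions can be sketched in plain Python as below (in practice a call such as pandas' `df.groupby(key).agg(...)` does the same thing); the table and column names are illustrative.

```python
from collections import defaultdict

# Transactions keyed by customer id (the "primary key"): (customer_id, amount)
rows = [(1, 100.0), (1, 300.0), (2, 50.0), (2, 150.0), (2, 100.0)]

groups = defaultdict(list)
for cid, amount in rows:
    groups[cid].append(amount)

# First preset aggregation functions for a numerical variable: count, mean, sum, min, max
derived = {
    cid: {"count": len(v), "mean": sum(v) / len(v), "sum": sum(v),
          "min": min(v), "max": max(v)}
    for cid, v in groups.items()
}
```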
Optionally, the deriving the basic features according to the determined feature derivation manner to obtain derived features includes:
and constructing a characteristic derivative function, and deriving the basic characteristics by combining the determined characteristic derivative mode to obtain derivative characteristics.
Optionally, the deriving the basic features according to the determined feature derivation manner to obtain derived features, including at least one of:
deriving multi-layer depth variables for a basic feature based on the association relationships among the basic features to obtain derived features;
and generating a derivative feature by utilizing a depth feature synthesis mode based on the interest indexes of the basic features.
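Deep feature synthesis stacks aggregations across related tables (the open-source `featuretools` library implements this idea). A minimal hand-rolled sketch with hypothetical tables — orders belong to customers, customers belong to cities — where a layer-1 aggregation is itself aggregated one layer deeper:

```python
from collections import defaultdict

orders = [("c1", 10.0), ("c1", 30.0), ("c2", 20.0)]  # (customer_id, amount)
customers = {"c1": "beijing", "c2": "beijing"}       # customer_id -> city

# Layer 1: aggregate orders up to the customer: customer.SUM(orders.amount)
per_customer = defaultdict(float)
for cid, amount in orders:
    per_customer[cid] += amount

# Layer 2: aggregate the layer-1 feature up to the city:
# city.MEAN(customer.SUM(orders.amount)) -- a multi-layer depth variable
per_city = defaultdict(list)
for cid, total in per_customer.items():
    per_city[customers[cid]].append(total)
city_feature = {city: sum(v) / len(v) for city, v in per_city.items()}
```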
Optionally, the deriving the basic features according to the determined feature derivation manner to obtain derived features includes:
acquiring a timestamp variable in the basic features;
and generating a derivative feature according to the timestamp variable.
Further, the generating of the derived features from the timestamp variables comprises at least one of:
extracting different time dimensions of the timestamp variables to obtain derivative characteristics;
acquiring a sliding window derivative variable aiming at the timestamp variable, and dividing the sliding window derivative variable by an index value of the current time to obtain a ratio derivative characteristic;
acquiring a sliding window derivative variable aiming at the timestamp variable, and constructing a statistical class characteristic based on the sliding window derivative variable to obtain a statistical class derivative characteristic based on a sliding window;
acquiring a sliding window derivative variable aiming at the timestamp variable, constructing statistical class characteristics based on the sliding window derivative variable to obtain each statistical class derivative characteristic based on a sliding window, and dividing each statistical class derivative characteristic by an index value of the current time to obtain a derivative characteristic;
obtaining a difference characteristic based on a difference value between the current time and a first moment, and obtaining a sliding window derived variable aiming at the difference characteristic, wherein the current time is later than the first moment;
the method for acquiring the sliding window derived variable aiming at the timestamp variable comprises the following steps:
based on the current time in the timestamp variables, sliding according to a preset time window, and generating n sliding window derivative variables relative to the current time, wherein n is the length of the preset time window.
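The timestamp-based derivations above can be sketched as follows: extracting time dimensions from a timestamp variable, building a sliding-window statistic over the n values preceding the current time, and dividing it by the current index value to get a ratio derived feature. The index series is made-up sample data.

```python
from datetime import datetime

ts = datetime(2019, 3, 29, 14, 30)
# Derived features from different time dimensions of a timestamp variable
time_dims = {"year": ts.year, "month": ts.month, "day": ts.day,
             "weekday": ts.weekday(), "hour": ts.hour}

# Sliding-window derived variables: daily index values, oldest .. current day
daily_index = [10.0, 12.0, 11.0, 15.0, 20.0]
n = 3
window = daily_index[-1 - n:-1]        # the n values preceding the current day
current = daily_index[-1]              # index value at the current time

window_mean = sum(window) / n          # statistical-class derived feature over the window
ratio = window_mean / current          # ratio derived feature (window statistic / current value)
```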
Optionally, the deriving the features according to the basic features to obtain derived features includes:
constructing a gradient lifting decision tree model aiming at the basic characteristics according to a business target for data analysis processing;
and combining the basic features based on the gradient lifting decision tree model to obtain corresponding derivative features.
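The idea can be sketched as follows: each tree of a trained gradient-boosting model routes a sample to one leaf, and one-hot encoding the leaf indices yields derived features, where each leaf encodes one combination of basic-feature splits. The two hand-written "trees" below stand in for a fitted model (in practice the leaf indices would come from the trained model, e.g. scikit-learn's `gbdt.apply(X)`).

```python
import numpy as np

def tree1_leaf(x):  # toy trained tree 1: splits on feature 0
    return 0 if x[0] < 0.5 else 1

def tree2_leaf(x):  # toy trained tree 2: splits on feature 1, then feature 0
    if x[1] < 1.0:
        return 0
    return 1 if x[0] < 2.0 else 2

def gbdt_combine(X, trees, n_leaves):
    """One-hot encode the leaf each tree assigns to each sample; the concatenation
    is the derived feature vector (each leaf = one combination of basic features)."""
    out = []
    for x in X:
        v = []
        for tree, k in zip(trees, n_leaves):
            one_hot = [0] * k
            one_hot[tree(x)] = 1
            v.extend(one_hot)
        out.append(v)
    return np.array(out)

X = np.array([[0.2, 0.5], [1.5, 2.0]])
derived = gbdt_combine(X, [tree1_leaf, tree2_leaf], [2, 3])
```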
Optionally, the deriving the features according to the basic features to obtain derived features includes:
converting the class type features in the basic features into numerical type features, and taking the numerical type features as derivative features;
wherein the categorical features are those whose number of distinct category values is less than or equal to a preset value.
Further, the manner of converting a categorical feature into a numerical feature comprises at least one of: ordinal (serial-number) encoding, one-hot encoding, binary encoding, and contrast encoding.
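Three of these encodings can be sketched in plain Python (the category names are illustrative; libraries such as scikit-learn's `OrdinalEncoder`/`OneHotEncoder` provide production versions):

```python
def ordinal_encode(values, categories):
    """Ordinal (serial-number) encoding: each category -> its index."""
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] for v in values]

def one_hot_encode(values, categories):
    """One-hot encoding: each category -> a 0/1 indicator vector."""
    index = {c: i for i, c in enumerate(categories)}
    return [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]

def binary_encode(values, categories):
    """Binary encoding: the ordinal index written in fixed-width binary digits."""
    width = max(1, (len(categories) - 1).bit_length())
    return [[int(b) for b in format(i, f"0{width}b")]
            for i in ordinal_encode(values, categories)]

cats = ["low", "medium", "high"]
vals = ["high", "low"]
```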
Optionally, the creating a business model according to the derived features to perform analysis processing operations includes:
performing characteristic screening on the derived characteristics to obtain screened target characteristics;
and creating a business model according to the target characteristics so as to run the business model to execute analysis processing operation.
An embodiment of the present invention provides a data analysis system, including:
the determining module is used for acquiring basic characteristics based on the service data and/or the service scene to be analyzed and determining a characteristic derivation mode;
the acquisition module is used for deriving the basic features according to the determined feature derivation mode to obtain derived features;
and the execution module is used for creating a business model according to the derived characteristics so as to execute analysis processing operation.
Specifically, the feature derivation mode comprises at least one of the following: derivation based on business objectives, derivation based on deep learning, derivation based on feature combination, derivation based on time variables, derivation based on decision tree models, and derivation based on numerical transformation.
Optionally, the obtaining module includes:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring service experience data corresponding to a service scene based on the service scene and a service target;
and the second acquisition unit is used for constructing the characteristics related to the service target through the basic characteristics according to the service experience data to obtain derived characteristics.
Optionally, the obtaining module is configured to:
and performing deep learning on the basic features through at least one of a sparse autoencoder (sparse self-coding) algorithm, a factorization machine algorithm, and a deep & cross network algorithm to obtain derived features.
Further, when the obtaining module performs deep learning on the basic features through the sparse autoencoder algorithm, the obtaining module is configured to:
input the basic features into a sparse autoencoder network, acquire the vector composed of the activation values of all hidden-layer units of the network, and take that vector as the derived features.
Further, when the obtaining module performs deep learning on the basic features through a factorization machine algorithm, the obtaining module is configured to:
and acquiring cross features introduced by operating the factorization machine algorithm model, and determining the cross features as derivative features.
Further, when the obtaining module performs deep learning on the basic features through a deep cross neural network algorithm, the obtaining module is configured to:
input the basic features into a deep & cross network, obtain a first output and a second output through the cross network and the deep network respectively, and combine the two outputs to obtain the final derived features.
Optionally, the obtaining module is configured to:
performing feature combination on the basic features to obtain derived features;
wherein the manner of feature combination comprises at least one of: a polynomial manner, a mathematical-operation manner, and an aggregation-function manner.
Further, when the manner of the feature combination includes a manner of adopting a polynomial, the obtaining module is configured to:
and generating a new feature matrix according to the basic features and preset degrees, and taking each component element in the feature matrix as a derivative feature.
Further, when the manner of the feature combination includes a manner of using mathematical operations, the obtaining module is configured to:
calculating the basic characteristics by using a data operation rule to obtain derivative characteristics;
wherein, the data operation rule comprises: at least one of an addition, a subtraction, a multiplication, and a division.
Further, when the manner of the feature combination includes a manner of adopting an aggregation function, the obtaining module includes:
the third acquisition unit is used for applying the aggregation function to the continuous variable and the discrete variable to generate derivative characteristics;
wherein the aggregation function comprises at least one of: count, mean, sum, minimum, maximum, standard deviation, median, mode, average time between consecutive events, number of distinct values of a categorical variable, percentage of values equal to a preset value, skewness, and kurtosis.
Further, the third obtaining unit is configured to implement at least one of the following manners:
for a numerical variable, aggregating over the primary key with a first preset aggregation function to obtain derived features;
for a categorical variable, aggregating over the primary key with a second preset aggregation function to obtain derived features;
wherein the first preset aggregation function at least comprises: a count function, a mean function, a sum function, a minimum function, and a maximum function;
and the second preset aggregation function at least comprises: a count function, a mean function, and a sum function.
Optionally, the obtaining module is configured to:
and constructing a characteristic derivative function, and deriving the basic characteristics by combining the determined characteristic derivative mode to obtain derivative characteristics.
Optionally, the obtaining module is configured to implement at least one of the following manners:
deriving multi-layer depth variables for a basic feature based on the association relationships among the basic features to obtain derived features;
and generating a derivative feature by utilizing a depth feature synthesis mode based on the interest indexes of the basic features.
Optionally, the obtaining module includes:
a fourth obtaining unit, configured to obtain a timestamp variable in the basic feature;
and the generating unit is used for generating a derivative characteristic according to the timestamp variable.
Further, the generating unit is configured to implement at least one of the following manners:
extracting different time dimensions of the timestamp variables to obtain derivative characteristics;
acquiring a sliding window derivative variable aiming at the timestamp variable, and dividing the sliding window derivative variable by an index value of the current time to obtain a ratio derivative characteristic;
acquiring a sliding window derivative variable aiming at the timestamp variable, and constructing a statistical class characteristic based on the sliding window derivative variable to obtain a statistical class derivative characteristic based on a sliding window;
acquiring a sliding window derivative variable aiming at the timestamp variable, constructing statistical class characteristics based on the sliding window derivative variable to obtain each statistical class derivative characteristic based on a sliding window, and dividing each statistical class derivative characteristic by an index value of the current time to obtain a derivative characteristic;
obtaining a difference characteristic based on a difference value between the current time and a first moment, and obtaining a sliding window derived variable aiming at the difference characteristic, wherein the current time is later than the first moment;
the method for acquiring the sliding window derived variable aiming at the timestamp variable comprises the following steps:
based on the current time in the timestamp variables, sliding according to a preset time window, and generating n sliding window derivative variables relative to the current time, wherein n is the length of the preset time window.
Optionally, the obtaining module includes:
the construction unit is used for constructing a gradient lifting decision tree model aiming at the basic characteristics according to a business target for data analysis processing;
and the fifth obtaining unit is used for combining the basic features based on the gradient lifting decision tree model to obtain corresponding derivative features.
Optionally, the obtaining module is configured to:
converting the class type features in the basic features into numerical type features, and taking the numerical type features as derivative features;
wherein the categorical features are those whose number of distinct category values is less than or equal to a preset value.
Further, the manner of converting a categorical feature into a numerical feature comprises at least one of: ordinal (serial-number) encoding, one-hot encoding, binary encoding, and contrast encoding.
Optionally, the execution module includes:
the screening unit is used for carrying out characteristic screening on the derived characteristics to obtain screened target characteristics;
and the execution unit is used for creating a business model according to the target characteristics so as to run the business model to execute analysis processing operation.
The embodiment of the invention provides a data analysis system, which comprises a memory, a processor and a computer program, wherein the computer program is stored on the memory and can run on the processor; wherein the processor implements the steps in the data analysis method described above when executing the computer program.
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in the data analysis method described above.
The invention has the beneficial effects that:
according to the scheme, the basic characteristics are obtained based on the service data and/or the service scene to be analyzed, the characteristic derivative mode is determined, the basic characteristics are derived according to the determined characteristic derivative mode to obtain the derivative characteristics, then the service model is created based on the derivative characteristics to execute analysis processing operation, the degree of fit of the service model with the service scene and service requirements can be improved, and the accuracy of data analysis is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
FIG. 1 is a flow chart illustrating a method for data analysis according to an embodiment of the present invention;
FIG. 2 shows a model schematic of a deep crossover neural network;
FIG. 3 is a diagram of a gradient boosting decision tree for a click prediction problem;
fig. 4 is a block diagram of a data analysis system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a data analysis method according to an embodiment of the present invention, the data analysis method is applied to a data analysis system, and includes:
step 11, acquiring basic characteristics based on service data and/or service scenes to be analyzed, and determining a characteristic derivation mode;
step 12, deriving the basic characteristics according to the determined characteristic derivation mode to obtain derived characteristics;
and step 13, creating a business model according to the derived features so as to execute analysis processing operation.
Derived features are new features obtained by feature learning on the raw data. They generally arise for two reasons: changes in the data cause many new raw features to appear in it; and during feature learning, an algorithm generates derived features from certain relationships among the existing features, and these derived features can sometimes reflect the relationships among the data features better than the originals.
Feature derivation can be implemented with a preset module (a common analysis module) created in APS (a workflow-based data analysis system). For example, a feature derivation module is preset; when a business model is trained using a workflow, feature derivation within feature engineering can be performed by this module to generate more strongly correlated features.
The existing model-training process generally comprises data processing → feature engineering → algorithm selection → parameter tuning → algorithm evaluation, where feature engineering comprises feature derivation and feature screening. How the derived features are obtained is described below from the perspective of the different feature derivation modes.
It should be noted that the characteristic derivation manner includes at least one of the following: business objective-based derivation, deep learning-based derivation, feature combination-based derivation, time-variant-based derivation, decision tree model-based derivation, and numerical transformation-based derivation.
Method based on business target derivation
Specifically, in this case, the specific implementation manner of step 12 is:
acquiring service experience data corresponding to a service scene based on the service scene and a service target;
and according to the service experience data, constructing features related to the service targets through the basic features to obtain derived features.
It should be noted that a feature related to the business objective means a feature strongly correlated with it, i.e., a feature whose degree of correlation with the business objective reaches a preset value.
In this mode, for a specific domain and a specific business, features strongly correlated with the business objective of the data analysis are generally constructed from experience in the relevant domain, which can significantly improve the effect (accuracy) of the business model.
Specifically, business features are quantified on the basis of business experience data so as to influence the final result; that is, for the business-objective problem to be solved, it is determined which new features (derived features), built from the basic features and strongly correlated with that problem, help to solve it.
Relationship between basic features and derived features: derived features are custom features generated to solve the business-objective problem, i.e., features derived according to business meaning; basic features are data information that already exists and requires little processing.
The business target for data analysis processing can be customer churn prediction, bank abnormal transaction early warning, credit default prediction, financial product intelligent recommendation, insurance claim amount prediction, credit card overdue customer prediction and the like.
1. Feature derivation based on customer churn prediction
Bank customer churn prediction finds the churn probability of customers who are likely to leave. By focusing on the needs of customers with a high churn probability, the relationship with them is continuously maintained, so that they are retained and the churn problem is alleviated. The customer group of concern is high-asset customers, where assets include deposits, wealth-management products, and the like.
Derived features affecting customer churn generally need to be constructed from domain knowledge; that is, they are customized from empirical data and the target problem, without building a derivation model. For example, the final derived features may include:
Product-holding features: the amount of wealth-management products maturing in the next 3 months; the cumulative number and amount of fixed-term purchases; the cumulative number and amount of loans; the cumulative number and amount of wealth-management purchases; and the number and amount of wealth-management and fixed-term products purchased in the last 3 months.
Customer transaction behaviors: the number of transactions in the current month; the amount of the last transaction; the transaction amounts in the last 3 months, months 4 to 6, months 7 to 9, and months 10 to 12; the average transaction amounts over those same four periods; and the number and amount of online-banking, mobile-banking, and third-party transactions in the last 3 months, the last 6 months, and the last year, among others.
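Features of the "last 3 months" kind above can be computed from a customer's transaction log; a minimal sketch with made-up data and an illustrative month-difference rule (not the patent's exact windowing logic):

```python
from datetime import date

today = date(2019, 3, 29)
# (transaction_date, amount) records for one customer -- sample data
txns = [(date(2019, 3, 10), 500.0), (date(2019, 1, 5), 200.0), (date(2018, 6, 1), 900.0)]

def months_ago(d):
    """Whole calendar months between d and today (illustrative windowing rule)."""
    return (today.year - d.year) * 12 + (today.month - d.month)

last3 = [amt for d, amt in txns if months_ago(d) < 3]
derived = {
    "txn_count_last_3m": len(last3),
    "txn_amount_last_3m": sum(last3),
    "avg_txn_amount_last_3m": sum(last3) / len(last3) if last3 else 0.0,
}
```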
2. Feature derivation based on bank abnormal transaction early warning
For example, different scenarios require different rules to determine whether a transaction is abnormal, based on the context of the transaction. (Suppose a rule states that any account which, within the past 24 hours of real-time transactions, was both opened and closed and transferred more than 100,000 is an illegal account; then looking at a single transaction is meaningless, and the transaction flow within those 24 hours must be observed for that account — this is the context of the transaction.) During transaction peaks such as the "Double Eleven" shopping festival, the system automatically adjusts the rule and raises the decision threshold: a transaction frequency and amount that would be judged abnormal in a regular period is not judged abnormal in a peak period. Here the rules are adjusted automatically through machine learning — for example, based on dynamic changes in the data set, how many transactions within one minute count as abnormal differs across time periods.
3. Feature derivation based on credit default prediction
In the credit default scenario, whether a customer will default on the current loan is predicted from the customer's application information, previous application information at the current lending institution, previous application information at other lending institutions, installment information, credit card transaction information, and cash loan transaction information; these can be regarded as basic features determined from the credit default scenario.
For example, in this case, the derived features obtained from the above basic features by the determined feature derivation means are: CREDIT_INCOME_PERCENT (ratio of credit amount to customer income), ANNUITY_INCOME_PERCENT (ratio of monthly installment amount to customer income), CREDIT_TERM (ratio of monthly installment amount to credit amount), DAYS_EMPLOYED_PERCENT (ratio of the customer's days worked to the customer's age), INCOME_PER_PERSON (average monthly income per family member), APP_CREDIT_PERC (ratio of application amount to actual credit amount), and PAYMENT_PERC (ratio of the customer's actual payment to the amount due in the period).
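Such ratio-style derived features can be sketched with pandas; the column names and values below are invented for illustration and are not taken from the source:

```python
import pandas as pd

# Hypothetical application records; all columns and values are illustrative.
apps = pd.DataFrame({
    "credit_amount":   [200000.0, 50000.0],
    "monthly_annuity": [8000.0, 2500.0],
    "annual_income":   [120000.0, 60000.0],
    "days_employed":   [2000, 5000],
    "age_days":        [12000, 18000],
    "family_members":  [3, 2],
})

# Ratio-style derived features analogous to those listed above.
apps["credit_income_percent"] = apps["credit_amount"] / apps["annual_income"]
apps["annuity_income_percent"] = apps["monthly_annuity"] / apps["annual_income"]
apps["credit_term"] = apps["monthly_annuity"] / apps["credit_amount"]
apps["days_employed_percent"] = apps["days_employed"] / apps["age_days"]
apps["income_per_person"] = apps["annual_income"] / apps["family_members"]
```

Each new column is a derived feature computed row-wise from two basic features.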
4. Credit card overdue customer prediction
In this scenario, the basic features include, for example, identity features, cross-border transaction statistics, consumption stability, etc., and the way to calculate and derive the above basic features is as follows:
example 1, the calculation and derivation logic to derive the "marriage" or not "derivation features in the identity feature class is as follows:
the marital state is judged based on the consumption of the cardholder in the maternal and infant class and the consumption in the marriage and love industry, the marital state is classified into marriage and possible marriage, and the judgment is as follows:
married: the consumption number of the children's clothes and the maternal and infant merchants in the consumption year is more than 5 times; the coaching mechanisms of primary schools and children consume too much; the waste is consumed in the business places of middle school and children;
possibly marrying: wedding agents, wedding websites, wedding dress, wedding celebration, diamond merchant have consumed, and the single stroke is more than 1000 yuan.
Example 2: the calculation and derivation logic for the "number of transaction currencies in the last month" derived feature in the cross-border transaction statistics class is as follows:
The types of foreign currency (currencies other than RMB) traded by the cardholder, and the number of such currency types.
Example 3: the calculation and derivation logic for the "longest transaction interval in days" derived feature in the consumption stability class is as follows:
After filtering to transactions with a single amount above 50 yuan and a monthly transaction count above 2, count the longest number of days between two consecutive consumptions by the cardholder within about the last 12 months.
The calculation and derivation logic for the "most common transaction channel type" feature in the consumption stability class is as follows: the most common transaction initiation method, such as computer, mobile phone, POS machine, or ATM.
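A minimal pandas sketch of the "longest transaction interval in days" logic above (the data is invented, and the monthly-frequency filter is omitted for brevity):

```python
import pandas as pd

# Illustrative card transactions over ~12 months (amounts in yuan).
tx = pd.DataFrame({
    "date":   pd.to_datetime(["2023-01-05", "2023-01-20", "2023-03-02",
                              "2023-03-15", "2023-07-01", "2023-07-02"]),
    "amount": [60.0, 40.0, 120.0, 80.0, 300.0, 55.0],
})

# Keep only single transactions above 50 yuan, per the filtering rule above.
kept = tx[tx["amount"] > 50].sort_values("date")

# Longest gap in days between two consecutive kept transactions.
longest_gap_days = kept["date"].diff().dt.days.max()
```

With the invented data, the longest gap is the stretch between the mid-March and early-July transactions.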
Secondly, a mode of derivation based on deep learning
Specifically, in this case, the specific implementation manner of step 12 is:
and performing deep learning on the basic features through at least one of a sparse self-coding algorithm, a factorization machine algorithm and a deep cross neural network algorithm to obtain derived features.
Since different algorithms are used in different implementations, the following describes a specific implementation of step 12 from the perspective of different algorithms.
1. Deep learning of basic features through sparse self-coding algorithm
In this case, the specific implementation manner of step 12 is:
inputting the basic features into a sparse self-coding neural network, acquiring a vector consisting of activation values of all units of a hidden layer of the sparse self-coding neural network, and taking the vector as a derived feature.
It should be noted that, in this method, when a new sample x is input into the trained sparse self-coding neural network, the vector a formed by the activation values of the hidden layer units can represent x (because, by the sparse self-coding property, x can be recovered from a); that is, a is the value of x under the new features (i.e., the derived features).
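A numpy sketch of taking hidden-layer activations as the derived features; the weights below are random stand-ins for a trained sparse autoencoder's encoder (training with the sparsity penalty is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-ins for the encoder weights of a trained sparse autoencoder; in
# practice W1 and b1 come from training with a sparsity (e.g. KL-divergence)
# penalty on the average hidden activation.
n_basic, n_hidden = 6, 3
W1 = rng.normal(scale=0.1, size=(n_hidden, n_basic))
b1 = np.zeros(n_hidden)

x = rng.normal(size=n_basic)   # one sample of basic features
a = sigmoid(W1 @ x + b1)       # hidden activations = the derived features
```

The vector a is the representation of x under the learned features, as described above.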
2. Deep learning of basic features by factoring machine algorithm
In this case, the specific implementation manner of step 12 is:
and acquiring cross features introduced by operating the factorization machine algorithm model, and determining the cross features as derivative features.
It should be noted that the factorization machine solves the problem of how to combine features in the case of sparse data. Taking advertisement click prediction as an example, some categorical variable features (country/date/advertisement type) have very large dimension, and One-Hot encoding easily causes dimension explosion. The factorization machine introduces cross terms on the basis of an LR (logistic regression) model, can efficiently learn correlations among features, and has good expressive power on sparse data.
Specifically, the algorithm is shown in formula one.
Formula one:

$$y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \langle V_i, V_j \rangle x_i x_j$$
where y is the predicted business objective; w_0 is the bias term; w_i is the weight of the i-th feature; x_i is the i-th feature; n is the number of features; and \langle V_i, V_j \rangle is the cross-term weight.
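As an illustration of formula one, a direct numpy evaluation of the FM prediction, together with the equivalent O(nk) reformulation commonly used in FM implementations (all weights and inputs are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 5, 3                  # n features, k-dimensional latent vectors

w0 = 0.5                     # bias term w_0
w = rng.normal(size=n)       # first-order weights w_i
V = rng.normal(size=(n, k))  # V[i] is the latent vector of feature i
x = rng.normal(size=n)       # one sample of features x_i

# Direct evaluation of formula one: bias + linear part + pairwise cross terms.
cross = sum((V[i] @ V[j]) * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))
y = w0 + w @ x + cross

# Equivalent O(nk) reformulation of the cross-term sum.
xv = x @ V
cross_fast = 0.5 * float(xv @ xv - ((x ** 2) @ (V ** 2)).sum())
```

The two cross-term computations agree; the second avoids the explicit pairwise loop.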
3. Deep learning of basic features through the deep cross neural network (Deep & Cross Network) algorithm
In this case, the specific implementation manner of step 12 is:
inputting the basic features into a deep cross neural network, obtaining a first input result and a second input result through the cross network and the deep network respectively, and combining the first input result and the second input result to obtain final derivative features.
In this method, the input first passes through an embedding layer (Embedding Layer) and a stacking layer (Stacking Layer), then through a cross network (Cross Network) and a parallel deep network (Deep Network), and finally through a combination output layer (Combination Output Layer) that combines the outputs of the two networks. The feature-combination idea of the DCN (deep cross neural network) is embodied in the cross network: as can be seen from fig. 2, the structure of the cross network causes the degree of the cross features to increase with depth. However, the limited parameters of the cross network limit the capacity of the model, so in order to capture highly nonlinear interactions, the model introduces a parallel deep network.
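A minimal numpy sketch of the cross-network part only, using the DCN cross-layer formula x_{l+1} = x0 (x_l . w_l) + b_l + x_l (dimensions and weights are invented; the parallel deep network and combination layer are not shown):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4                        # dimension after the embedding & stacking layers
x0 = rng.normal(size=d)      # output of the embedding/stacking layers

def cross_layer(x0, xl, w, b):
    # DCN cross layer: x_{l+1} = x0 * (x_l . w) + b + x_l
    return x0 * (xl @ w) + b + xl

x = x0
for _ in range(2):           # stack two cross layers
    w = rng.normal(size=d)
    b = np.zeros(d)
    x = cross_layer(x0, x, w, b)
```

Each added layer raises the degree of the feature crosses by one, matching the description above.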
Thirdly, based on the characteristic combination derivation mode
Specifically, in this case, the specific implementation manner of step 12 is:
performing feature combination on the basic features to obtain derived features;
wherein, the mode of the characteristic combination comprises the following steps: at least one of a polynomial manner, a mathematical operation manner, and an aggregation function manner.
The specific implementation of step 12 is different depending on the different feature combinations, and the specific implementation of step 12 will be described below in terms of different feature combinations.
1. The manner of combining features includes the manner of using polynomials
In this case, the specific implementation manner of step 12 is:
and generating a new feature matrix according to the basic features and preset degrees, and taking each component element in the feature matrix as a derivative feature.
It should be noted that constructing polynomial features usually means generating a new feature matrix according to a specified degree (i.e., the above-mentioned preset degree, for example degree 2), where the feature matrix consists of all polynomial combinations of the features. For example, if an input sample is two-dimensional, of the form [a, b], then its degree-2 polynomial feature matrix is [1, a, b, a^2, b^2, ab]. Features such as ab, formed by combining features, are called interaction features (interactions) because they can capture interactions between variables. In some scenarios, an individual feature may have little influence on the target variable by itself, while an interaction feature formed from individual features can be strongly correlated with the target variable.
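With scikit-learn (assuming it is available), the polynomial feature matrix for a sample [a, b] at degree 2 can be generated as follows; note that scikit-learn orders the columns [1, a, b, a^2, ab, b^2]:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])            # one sample of the form [a, b]

poly = PolynomialFeatures(degree=2)   # degree 2 = the "preset degree" above
X_derived = poly.fit_transform(X)     # columns: [1, a, b, a^2, ab, b^2]
```

Each column of X_derived is one derived feature; the ab column is the interaction feature discussed above.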
Pairwise combination of first-order discrete features: to improve the ability to fit complex relationships, first-order discrete features are often combined pairwise in feature engineering to form higher-order combined features. Taking the advertisement click prediction problem as an example, as shown in table 1 below, the original data has two discrete features, language (Chinese/English) and genre (movie/TV series); to improve fitting capability, language and genre can form a second-order feature, for example "language = Chinese and genre = movie" can be used as a single feature. However, in recommendation problems, the numbers of users and items can reach tens of millions, and combining user ID and item ID pairwise produces dimension explosion; users and items must each be represented by k-dimensional low-dimensional vectors before being combined.
TABLE 1 derived features in advertisement click prediction questions
[Table 1: image not reproduced]
2. The means for combining features includes means for using mathematical operations
In this case, the specific implementation manner of step 12 is:
calculating the basic characteristics by using a data operation rule to obtain derivative characteristics;
specifically, the data operation rule includes: at least one of an addition, a subtraction, a multiplication, and a division.
This approach refers to mathematical operations on the basic features using addition, subtraction, multiplication, and division to obtain new derived features.
3. The manner of combining features includes employing aggregation functions
In this case, the specific implementation manner of step 12 is:
applying the aggregation function to the continuous variable and the discrete variable to generate derivative features;
wherein the aggregation function comprises: at least one of a statistical number, a mean, a sum, a minimum, a maximum, a standard deviation, a median, a mode, an average time between consecutive events, a number of different values of the categorical variable, a percentage of a preset value (e.g., a percentage of True), a skewness, and a kurtosis.
In general, using an aggregation function to quickly generate a large number of derived features for continuous variables and discrete variables based on primary keys (e.g., customer ID, product ID, etc.) is a common method of feature engineering; the primary key is an identifier that uniquely identifies a row of data in a data set or a combination of fields that can uniquely represent each record in a data table.
Further, the applying the aggregation function to the continuous variable and the discrete variable to generate the derived feature includes at least one of the following ways:
A11, for numerical variables, aggregating over the primary key using a first preset aggregation function to obtain derived features;
specifically, the first preset aggregation function at least includes: a statistical number function, a mean function, a summation function, a minimum function, and a maximum function.
A12, for categorical variables, aggregating over the primary key using a second preset aggregation function to obtain derived features;
specifically, the second preset aggregation function at least includes: a statistical number function, a mean function, and a summation function.
Fourthly, a mode of derivation based on time variable
Specifically, in this case, the specific implementation manner of step 12 is:
acquiring a timestamp variable in the basic features;
and generating a derivative feature according to the timestamp variable.
It should be noted that in the time series problem, due to the sparsity of the feature quantity of the data itself, a large number of time-class derived features need to be constructed according to the time series variables, so as to improve the expression capability of the model, introduce nonlinearity, and enhance the fitting capability. Taking a single-index anomaly detection scene as an example, the original data only has four variables, namely an index name, a timestamp, an index value and a label (whether the original data is abnormal or not), and a large number of derivative features need to be constructed on the basis of the timestamp.
Further, the generating of the derived features from the timestamp variables comprises at least one of:
b11, extracting different time dimensions of the timestamp variables to obtain derivative features;
for example, the derived features obtained are: year, month, day, hour, minute, second, day of the week, day of the year, week of the year, whether weekend, etc.
B12, acquiring a sliding window derivative variable aiming at the timestamp variable, and dividing the sliding window derivative variable by an index value of the current time to obtain a ratio derivative characteristic;
it should be noted that, the manner of obtaining the sliding window derived variable for the timestamp variable is as follows:
based on the current time in the timestamp variables, sliding according to a preset time window, and generating n sliding window derivative variables relative to the current time, wherein n is the length of the preset time window.
It should be noted that the number of derived variables of the finally obtained sliding window is related to the timing unit; for example, if the length of the preset time window is 2 minutes, the current time is slid forward according to the length of the time window of 2 minutes, and 2 sliding window derived features for the timestamp variable are obtained; for example, if the length of the preset time window is 120 seconds, the current time is slid forward according to the length of the time window of 120 seconds, and 120 sliding window derivative features for the timestamp variable are obtained.
B13, acquiring a sliding window derivative variable aiming at the timestamp variable, and constructing a statistical class characteristic based on the sliding window derivative variable to obtain a statistical class derivative characteristic based on a sliding window;
it should be noted that the ways of performing statistics on the features include, but are not limited to: mean, median, standard deviation, sum, maximum minus minimum, skewness, kurtosis, exponentially weighted moving average mean, exponentially weighted moving average variance, number greater than mean, number less than mean, location where maximum occurs for the first time, location where minimum occurs for the first time, whether there is a duplicate for maximum, whether there is a duplicate for minimum, maximum continuous length greater than mean, maximum continuous length less than mean, mean of adjacent value absolute errors, mean of adjacent value errors, whether variance is greater than standard deviation, sum of squares, sum of adjacent value absolute errors, and the like.
B14, acquiring a sliding window derivative variable aiming at the timestamp variable, constructing statistical class features based on the sliding window derivative variable to obtain each statistical class derivative feature based on a sliding window, and dividing each statistical class derivative feature by an index value of the current time to obtain a derivative feature;
b15, obtaining a difference feature based on the difference between the current time and the first moment, and obtaining a sliding window derived variable aiming at the difference feature;
it should be noted that, the current time is later than the first time, that is, the first time is a time before the current time, and the sliding window derivative variable for the differential feature is determined as the sliding window derivative feature for the differential feature;
specifically, the manner of obtaining the sliding window derived variable for the differential feature is as follows:
and according to the current time in the difference characteristics, sliding according to a preset time window to generate m sliding window derivative variables relative to the current time, wherein m is the length of the preset time window. It should be noted that the obtaining manner of the sliding window derived variable for the difference feature is similar to the obtaining manner of the sliding window derived variable for the timestamp variable, and is not described herein again.
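The B11-B15-style time features can be sketched with pandas (the minute-level data and the window length are invented for illustration):

```python
import pandas as pd

# Minute-level index values for a single metric, as in the single-index
# anomaly detection scenario (values and window length are invented).
s = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 30.0],
              index=pd.date_range("2023-01-01 00:00", periods=6, freq="min"))

# B11: extract time dimensions from the timestamp.
dims = pd.DataFrame({
    "year": s.index.year,
    "hour": s.index.hour,
    "day_of_week": s.index.dayofweek,        # Monday = 0
    "is_weekend": s.index.dayofweek >= 5,
}, index=s.index)

# B13: statistical features over a 3-point sliding window.
win = s.rolling(window=3)
feats = pd.DataFrame({"mean": win.mean(), "std": win.std(),
                      "max": win.max(), "median": win.median()})

# B12/B14: divide the window statistics by the index value at the current time.
feats["mean_over_current"] = feats["mean"] / s

# B15: difference against the previous point, then window statistics on it.
diff_mean = s.diff().rolling(window=3).mean()
```

The first window-length rows are NaN, since the window is not yet full there.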
Fifth, mode of derivation based on decision tree model
Specifically, in this case, the specific implementation manner of step 12 is:
constructing a gradient lifting decision tree model aiming at the basic characteristics according to a business target for data analysis processing;
and combining the basic features based on the gradient lifting decision tree model to obtain corresponding derivative features.
It should be noted that in practical problems one often faces many high-dimensional features, and simply combining features pairwise easily leads to problems such as too many parameters and overfitting, so feature combinations are often found by means of gradient boosting decision trees. Taking the click prediction problem as an example, as shown in fig. 3, assume the original input features contain information about 4 aspects: age, gender, user type, and item type; a decision tree is constructed from the original input and the labels. Each path from the root node to a leaf node can then be viewed as a feature combination; for example, as can be seen from fig. 3, "user type = paid" and "age = 40" together form a combined feature (i.e., a derived feature). If both conditions are satisfied in a user's original input features, the variable is set to 1, otherwise 0.
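The path-to-feature idea can be sketched with scikit-learn (assuming it is available; the dataset is synthetic): each sample's leaf index in each tree identifies one root-to-leaf path, and one-hot encoding those indices yields the combined derived features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the click-prediction data.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=10, max_depth=3, random_state=0)
gbdt.fit(X, y)

# apply() gives each sample's leaf index in every tree; each root-to-leaf
# path is one feature combination, so one-hot encoding the leaf indices
# yields the combined (derived) features.
leaves = gbdt.apply(X).reshape(X.shape[0], -1)
X_derived = OneHotEncoder().fit_transform(leaves).toarray()
```

Each row of X_derived has exactly one 1 per tree, marking the path that sample follows.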
Sixthly, deriving mode based on numerical value conversion
Specifically, in this case, the specific implementation manner of step 12 is:
converting the class type features in the basic features into numerical type features, and taking the numerical type features as derivative features;
wherein the categorical features include: features whose number of category values is less than or equal to a preset value.
It should be noted that categorical features mainly refer to features that take values from a limited set of options, such as gender (male/female) and blood type (A/B/AB/O). The original input of a categorical feature is usually in the form of a character string. Apart from a few models such as decision trees, which can process string-form input directly, models such as logistic regression and support vector machines require the categorical features to be processed and converted into numerical features in order to work correctly.
Specifically, the manner of converting the class feature into a numerical feature includes: at least one of a serial number encoding, a one-hot encoding, a binary encoding, and a contrast encoding.
Sequence number encoding (Ordinal Encoding) is commonly used to process data having a size relationship between categories. For example, body weight can be divided into three categories, lower, standard, and higher, with the ranking relationship "higher > standard > lower". Sequence number encoding assigns a numerical ID to each category according to the size relationship, e.g., 3 for higher, 2 for standard, and 1 for lower.
One-hot encoding (One-hot Encoding) is commonly used to handle features without a size relationship between categories. For example, the Chinese zodiac has 12 values in total; one-hot encoding turns it into a 12-dimensional vector, with the zodiac rat represented as (1,0,0,0,0,0,0,0,0,0,0,0). However, when a category has many values, one-hot encoding should be used with sparse vectors to save space, together with feature selection to reduce dimensionality.
Binary encoding (Binary Encoding) proceeds in two steps: each category is first given a category ID by sequence number encoding, and then the binary representation of the category ID is taken as the result. Taking the three weight categories above as an example: lower has ID 1, binary 001; standard has ID 2, binary 010; higher has ID 3, binary 011. Binary encoding essentially hash-maps the ID through binary digits, finally yielding 0/1 feature vectors with dimension smaller than one-hot encoding, thus saving storage space.
For example, the numerical characteristics obtained by converting the blood type information by the above three encoding methods are shown in table 2.
Table 2: numerical features of blood type information under the three encoding modes

Blood type | Sequence number encoding | One-hot encoding | Binary encoding
A          | 1                        | (1, 0, 0, 0)     | 001
B          | 2                        | (0, 1, 0, 0)     | 010
AB         | 3                        | (0, 0, 1, 0)     | 011
O          | 4                        | (0, 0, 0, 1)     | 100
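A pandas sketch of the three encodings applied to the blood-type feature (the ID ordering A < B < AB < O is an assumption used only to assign IDs):

```python
import pandas as pd

# Blood type as a categorical feature.
blood = pd.Series(["A", "B", "AB", "O"])

# Sequence number encoding: IDs assigned by an assumed ordering.
order = {"A": 1, "B": 2, "AB": 3, "O": 4}
ordinal = blood.map(order)

# One-hot encoding: one 0/1 column per category value.
one_hot = pd.get_dummies(blood)

# Binary encoding: the ordinal ID written as fixed-width binary digits.
binary = ordinal.map(lambda i: format(i, "03b"))
```

The three results correspond column-for-column to the three encodings compared in Table 2.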
It is further noted that the Helmert contrast code compares each level of a categorical variable with the mean of the subsequent levels. Thus, the first contrast compares the mean of the dependent variable at level 1 with the mean over all subsequent levels (levels 2, 3, ...); the second contrast compares level 2 with all subsequent levels (levels 3, 4, ...); and the third contrast compares level 3 with all subsequent levels (levels 4, ...).
Seventhly, a mode of derivation based on a characteristic derivative function. Specifically, in this case, the implementation manner of step 12 is as follows:
and constructing a characteristic derivative function, and deriving the basic characteristics by combining the determined characteristic derivative mode to obtain derivative characteristics.
It should be noted here that, in this implementation, the above-mentioned feature derivation method is used, and a feature derivation function is combined to derive the basic feature, so as to obtain a derived feature; the characteristic derivation function is a different concept from the first preset aggregation function and the second preset aggregation function mentioned above.
It should be noted that, the automatic feature engineering adopted in this way can generate deep-level derivative features by setting some basic parameters.
Optionally, it should be further noted that the implementation manner of step 12 includes at least one of the following manners:
C11, deriving multilayer depth variables for a basic feature based on the association relations among the basic features, to obtain derived features;
and C12, generating a derivative feature by utilizing a depth feature synthesis mode based on the interest indexes of the basic features.
It should be noted that the implementation of C11 and C12 combines several of the above-mentioned feature derivation methods to obtain the depth-derived features.
For example, when customer churn in a financial scenario is predicted, several methods of the seven feature derivation methods can be selected to combine into a derivation strategy to generate related derived features based on the service characteristics of customer churn prediction. The service characteristics comprise service characteristics and attribute characteristics related to a scene, such as time, transaction behaviors (transaction data, times and time), product behaviors, product holding information, customer circle relationship depiction, RFM attributes and the like. The number of derivative methods selected is determined based on the scene and the data.
For example, based on a derivation strategy, feature derivation based on the business target of the data analysis processing is adopted: after the business target is acquired, relevant derived features such as the average transaction amount in the last three months and the number of days between the latest transaction time and the current day are constructed. Based on the derivation strategy, high-order and interaction features are generated in a feature-combination manner for features strongly related to the prediction target; and a specified aggregation function (such as count/mean/sum) is selected to generate a large number of derived features, in the manner of applying aggregation functions to numerical and categorical variables (i.e., semi-automatic feature engineering). Based on the derivation strategy, for time-type variables, the feature derivation mode based on the timestamp variable is adopted to calculate the number of days between two time points, and statistical derived features such as kurtosis and skewness are generated through a sliding window. Based on the derivation strategy, for categorical variables such as zodiac sign, occupation, and marital status, features are generated by discretizing the variables in a feature-encoding manner, such as one-hot encoding or binary encoding.
The feature derivation modes can be adaptively combined according to service scenes/service data to generate more derived features from different dimensions, dig more feature information, introduce nonlinearity, and enhance the expression capability of the model (the training effect of the model, such as accuracy).
Further, a specific implementation manner of step 13 in the embodiment of the present invention is as follows:
performing characteristic screening on the derived characteristics to obtain screened target characteristics;
and creating a business model according to the target characteristics so as to run the business model to execute analysis processing operation.
It should be noted that, the method for screening the derived features to obtain the screened target features may adopt one or more of the following implementation manners:
mode A, Filter — correlation coefficient method, using which the correlation coefficient of each feature to the target value and the P value of the correlation coefficient are calculated first;
mode B, Filter-variance selection method, using which the variance of each feature is calculated first, and then according to a threshold, the feature whose variance is greater than the threshold is selected;
mode C, Filter - chi-square test; the classical chi-square test tests the correlation of a qualitative independent variable with a qualitative dependent variable. Assuming the independent variable has N values and the dependent variable has M values, a statistic is constructed from the difference between the observed sample frequency of (independent variable = i, dependent variable = j) and its expectation;
The method D, Wapper-recursive feature elimination method, wherein the recursive feature elimination method uses a base model to perform multiple rounds of training, after each round of training, the features of a plurality of weight coefficients are eliminated, and then the next round of training is performed based on a new feature set;
mode E, Embedded - feature selection method based on a penalty term; the feature selection process and the learner training process are integrated into one and completed in the same optimization process, i.e., feature selection is performed automatically during learner training. Typical algorithms are ridge regression and LASSO regression (Least Absolute Shrinkage and Selection Operator). Using a base model with a penalty term both screens out features and reduces dimensionality. The principle of L1 penalty-term dimensionality reduction is to retain only one of several features equally relevant to the target value, so unselected features are not necessarily unimportant; optimization can therefore be combined with an L2 penalty term;
mode F, Embedded-a tree model-based feature selection method, which mainly selects features based on GBDT, random forest and the like in a tree model as a base model;
mode H, PCA - principal component analysis (dimensionality reduction); PCA is an unsupervised learning method, more like a preprocessing method, that can project the original data into a lower dimension while maximizing the variance of the reduced-dimension data. Variance is interesting: sometimes we seek to reduce it (for example, the variance-bias trade-off considered when training a model), and sometimes we try to increase it;
mode I, LDA - linear discriminant analysis; LDA is a supervised learning method, whose principle is to project labeled data (points) into a lower-dimensional space so that the projected points cluster by class, with points of the same class closer together in the projected space.
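A few of the screening modes above can be sketched with scikit-learn (assuming it is available; the dataset is synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, chi2
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=6, random_state=0)

# Mode B: keep features whose variance exceeds a threshold.
X_var = VarianceThreshold(threshold=0.5).fit_transform(X)

# Mode C: the chi-square test requires non-negative features, so shift first.
X_chi = SelectKBest(chi2, k=3).fit_transform(X - X.min(axis=0), y)

# Mode D: recursive feature elimination with a base model.
X_rfe = RFE(LogisticRegression(max_iter=1000),
            n_features_to_select=3).fit_transform(X, y)
```

Each transformer returns the sample matrix restricted to the retained feature columns.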
The above implementations are described below with examples from practical applications.
1. Take customer churn prediction as an example
Customer characteristics are screened from the customer parameters based on the correlation between the customer parameters and the customer churn probability, computing resources, cost, and the like, and are input into the computing model. The correlation between customer parameters and churn probability is obtained by analyzing the customer parameters; coarse screening is performed first, followed by further screening; feature importance is subsequently calculated through the model, and the input parameters are adjusted again based on the calculated feature importance.
The specific method of parameter screening includes chi-square test, Pearson correlation coefficient method, extreme tree feature selection method, recursive feature elimination method, etc. to select important parameters related to the loss possibility of customers in customer parameters and remove redundant parameters, so as to input a prediction model for prediction.
The chi-square test is a feature selection method: the chi-square statistic between each independent variable and the target variable is calculated, and variables with relatively large chi-square values are retained. In addition, the values of the feature variables must be non-negative. For example, if the correlation between a field and churn is close to 0, we consider that the field has no predictive power and do not put it into the model.
The Pearson correlation coefficient, also called the Pearson product-moment correlation coefficient, is a linear correlation coefficient: a statistic used to reflect the degree of linear correlation between two variables. For example, if the correlation between two fields is close to 1, we consider the two fields to be the same and put only one of them into the model.
The extreme tree feature selection method belongs to a class of feature selection methods (embedded-class methods). Based on a trained machine learning model, variables are screened according to the model's feature importances.
Recursive Feature Elimination is a feature selection method: based on the variable coefficients or feature importances output by the algorithm, the variables with the smallest importance are deleted, then fitting and deletion are performed again, and these steps are repeated.
The chi-square test and the Pearson correlation coefficient method serve as coarse screening; the extreme tree and recursive feature elimination methods then refine the result. First, the chi-square test measures the correlation between each customer parameter and the churn probability, and parameters with high correlation are retained. Next, the Pearson correlation coefficient is calculated among the customer parameters, and parameters strongly correlated with other parameters are eliminated. Finally, the parameters are further screened according to the feature importances obtained by the extreme tree feature selection method and the recursive feature elimination method.
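The refined screening stage can be sketched as below. This is an illustrative sketch only; scikit-learn, the synthetic dataset, and the logistic-regression estimator inside RFE are assumptions not specified by the patent.

```python
# Illustrative sketch: refined screening after the coarse stage, using
# extremely randomized trees (embedded method) and recursive feature
# elimination (wrapper method).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

# Embedded method: rank parameters by tree-based feature importance.
et = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = et.feature_importances_

# Wrapper method: recursively drop the least important features and refit.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)
selected = rfe.get_support(indices=True)
```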
After parameter screening, the final parameters input into the model (i.e., the customer features) number M, where M is a positive integer, for example: length of time as a bank customer, gender, age, total asset amount, number of deposit products held, amount of wealth-management products maturing in the current month, transaction amount in the current month, number of term products purchased in the last 3 months, average transaction amount in the last 3 months, total transaction amount in the last 3 months, and the like.
2. Take product purchase prediction as an example
Customer features are screened from the customer information based on the correlation between that information and the probability of a customer purchase, as well as on computing resources, cost, and the like. Coarse screening is performed first, followed by refined screening. Coarse screening is carried out through at least one of mutual information, the chi-square test, and the F test, together with a feature check that aggregates the features selected by the different methods, for example by taking their intersection, union, or an optimal set. Refined feature selection is then performed through recursive feature elimination, model-based feature elimination, and the like; these are chosen based on different processing mechanisms and algorithm models, specifically extreme trees, random forests, Bayesian models, and so on.
Furthermore, dimensionality reduction of the features is performed, including feature orthogonalization, principal component analysis, matrix decomposition, and the like.
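The principal component analysis step mentioned above can be sketched as follows; scikit-learn and the random data are illustrative assumptions.

```python
# Illustrative sketch: dimensionality reduction of the screened features
# via principal component analysis (orthogonal components).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))   # 100 customers, 8 screened features

pca = PCA(n_components=3).fit(X)
X_reduced = pca.transform(X)                 # 100 x 3 component matrix
explained = pca.explained_variance_ratio_    # variance captured per component
```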
Subsequently, feature importance is calculated through the model, and the input features are adjusted again based on the calculated feature importance.
The customer information includes the following:
transaction behavior features: per month/quarter and the like, the customer's accumulated transaction count, bank-loan deposit count, incoming transfer count, payroll income count, cash deposit count, loan repayment count, outgoing transfer count, cash withdrawal count, consumption payment count, living-expense payment count, and last transaction amount;
RFM behavior features (RFM: Recency, Frequency, Monetary): time of the most recent wealth-management product purchase, number and amount of wealth-management product purchases in the last 3 months, time of the most recent fund product purchase, number of fund product purchases in the last 3 months, number of days since the most recent card consumption, number and amount of consumptions in the last 3 months, top-1 consumption type in the last 3 months, the most recent account-activity date across all accounts under the customer's name, and the number of days since that date;
customer attribute features: gender, age, home address, industry, job title, education level, marital status, family size, mobile phone number, affiliated institution, customer manager, and length of the relationship with the bank (length of time as a bank customer);
asset and liability features: maximum asset concentration; point-in-time balances of deposits, wealth management, funds, and national debt; whether national debt, trust, precious metal, third-party depository, or insurance services are signed; and monthly, quarterly, and annual daily-average balances of deposits, wealth management, funds, and national debt;
credit attribute features: core customer number, current customer grade, customer credit grade, current five-category loan classification, comprehensive credit line, previous customer grade, grade change time, previous customer credit grade, credit grade change time, previous five-category loan classification, loan classification change date, social insurance balance, and monthly housing fund payment;
product-holding features: total asset amount, deposit balance, number of products held, number of deposit products, number of loan products, number of current debit card types, number of current credit card types, number of current debit cards, number of current credit cards, number of wealth-management products, number of fund products, number of products by signed service type, number of products by signed channel type, accumulated term-deposit purchase count and amount, number of term deposit products held, accumulated loan count, accumulated loan application count, accumulated wealth-management purchase count, accumulated fund purchase count, total customer assets in the previous quarter, whether a term product is held, maximum asset balance, and maximum asset category;
relationship circle features: transfer usage, whether transfers are inter-bank, number of transfers in the current month, amount transferred in the current month, and the counterparty's bank;
the above counts are accumulated by month, quarter, and the like.
Customer features are screened from the customer information based on the correlation between the customer parameters and the probability of a product purchase, as well as on computing resources, cost, and the like, and are then input into the computing model. Specifically, the correlation between the customer information and the purchase probability is obtained by analyzing the customer information; primary screening is performed, followed by algorithmic selection; feature importance is then calculated through the model, and the input information is adjusted again based on the calculated feature importance.
Before coarse screening, features (fields correlated with the prediction target) can be added according to business knowledge (i.e., based on business requirements). Since the features available before coarse screening are conventional ones, user-defined features (features designed by users based on business understanding and business requirements, with strong interpretability for the prediction target) need to be added. For example, the following fields are added for a term-deposit product.
Newly added fields: whether the customer was a term-deposit target customer within the past year, whether a large-deposit target customer within the past year, whether a newly added term-deposit customer within the past year, whether a frequent term-deposit customer within the past year, whether a cancelled customer within the past year, whether a lost term-deposit customer within the past year, the fixed interest rates of the two products, the one-day notice deposit product rate, the seven-day notice deposit product rate, and the one-month, three-month, six-month, nine-month, one-year, two-year, three-year, and five-year product interest rates.
For the term-deposit product, the finally selected features include: the customer's total assets in the previous quarter, whether the customer holds term products, the customer's last transaction amount, the monthly daily-average deposit change rate, the annual daily-average deposit balance, the accumulated term-deposit purchase amount, the number of term deposit products held, the maximum asset balance, the maximum asset category, and the like.
It should be noted that, in the data analysis process for massive data, several feature derivation manners are provided, and a selected combination of them (chosen automatically by the system or by the user) is arranged and applied. For example, in a product intelligent recommendation scenario, after the pattern of products purchased in each order is obtained, the pattern of products purchased by each customer across different orders is obtained. This generates as many derived features as possible, digs out more feature information, introduces nonlinearity, and enhances the expressive capability (training effect) of the model, thereby improving the fit between the business model and the business scenario and requirements, and improving the accuracy of data analysis.
Referring to fig. 4, fig. 4 is a block diagram of a data analysis system according to an embodiment of the present invention. As shown in fig. 4, the data analysis system 40 includes:
a determining module 41, configured to obtain a basic feature based on service data and/or a service scenario to be analyzed, and determine a feature derivation manner;
an obtaining module 42, configured to derive the basic features according to the determined feature derivation manner, so as to obtain derived features;
and the execution module 43 is used for creating a business model according to the derived characteristics so as to execute analysis processing operation.
Specifically, the feature derivation manner comprises at least one of the following: business target-based derivation, deep learning-based derivation, feature combination-based derivation, time variable-based derivation, decision tree model-based derivation, and numerical transformation-based derivation.
Optionally, the obtaining module 42 includes:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring service experience data corresponding to a service scene based on the service scene and a service target;
and the second acquisition unit is used for constructing the characteristics related to the service target through the basic characteristics according to the service experience data to obtain derived characteristics.
Optionally, the obtaining module 42 is configured to:
and performing deep learning on the basic features through at least one of a sparse self-coding algorithm, a factorization machine algorithm and a deep cross neural network algorithm to obtain derived features.
Further, when the obtaining module deeply learns the basic features through a sparse self-coding algorithm, the obtaining module 42 is configured to:
inputting the basic features into a sparse self-coding neural network, acquiring a vector consisting of activation values of all units of a hidden layer of the sparse self-coding neural network, and taking the vector as a derived feature.
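The hidden-layer activation vector described above can be sketched as follows. This is an illustrative stand-in, not the patent's implementation: scikit-learn's `MLPRegressor` is trained as a plain autoencoder (target equals input), and the sparsity (KL-divergence) penalty of a true sparse self-coding network is omitted for brevity.

```python
# Illustrative sketch: hidden-layer activations of an autoencoder used as
# derived features. A true sparse autoencoder would add a sparsity penalty
# on the hidden activations; that term is omitted here.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 12))   # 200 samples of 12 basic features

ae = MLPRegressor(hidden_layer_sizes=(5,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X, X)   # autoencoding: reconstruct the input

# The vector of hidden-unit activations is the derived feature vector.
H = np.maximum(0.0, X @ ae.coefs_[0] + ae.intercepts_[0])
```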
Further, when the obtaining module 42 performs deep learning on the basic features through a factorization machine algorithm, the obtaining module 42 is configured to:
and acquiring cross features introduced by operating the factorization machine algorithm model, and determining the cross features as derivative features.
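The cross features introduced by a factorization machine come from its second-order term. In the standard formulation (not quoted from the patent), the model is:

```latex
\hat{y}(\mathbf{x}) \;=\; w_0 \;+\; \sum_{i=1}^{n} w_i x_i
  \;+\; \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

where each feature $i$ has a latent vector $\mathbf{v}_i$; the pairwise products $x_i x_j$, weighted by $\langle \mathbf{v}_i, \mathbf{v}_j \rangle$, are the cross features taken as derived features.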
Further, when the obtaining module 42 performs deep learning on the basic features through a deep cross neural network algorithm, the obtaining module 42 is configured to:
inputting the basic features into a deep cross neural network, obtaining a first input result and a second input result through the cross network and the deep network respectively, and combining the first input result and the second input result to obtain final derivative features.
Optionally, the obtaining module 42 is configured to:
performing feature combination on the basic features to obtain derived features;
wherein, the mode of the characteristic combination comprises the following steps: at least one of a polynomial manner, a mathematical operation manner, and an aggregation function manner.
Further, when the manner of the feature combination includes a manner of adopting a polynomial, the obtaining module is configured to:
and generating a new feature matrix according to the basic features and preset degrees, and taking each component element in the feature matrix as a derivative feature.
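The polynomial combination described above can be sketched as follows; scikit-learn's `PolynomialFeatures` is an illustrative choice, not named in the patent.

```python
# Illustrative sketch: a degree-2 polynomial expansion of two basic
# features; each column of the new matrix is a derived feature.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

poly = PolynomialFeatures(degree=2)   # the preset degree
X_poly = poly.fit_transform(X)
# Columns: 1, x1, x2, x1^2, x1*x2, x2^2
```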
Further, when the manner of the feature combination includes a manner of using a mathematical operation, the obtaining module 42 is configured to:
calculating the basic characteristics by using a data operation rule to obtain derivative characteristics;
wherein, the data operation rule comprises: at least one of an addition, a subtraction, a multiplication, and a division.
Further, when the manner of the feature combination includes a manner of adopting an aggregation function, the obtaining module includes:
the third acquisition unit is used for applying the aggregation function to the continuous variable and the discrete variable to generate derivative characteristics;
wherein the aggregation function comprises: count, mean, sum, minimum, maximum, standard deviation, median, mode, average time between consecutive events, number of distinct values of a categorical variable, percentage of values equal to a preset value, skewness, and kurtosis.
Further, the third obtaining unit is configured to implement at least one of the following manners:
for a numerical variable, aggregating over the primary key with a first preset aggregation function to obtain derived features;
for a categorical variable, aggregating over the primary key with a second preset aggregation function to obtain derived features;
wherein the first preset aggregation function at least comprises: a count function, a mean function, a summation function, a minimum function, and a maximum function;
the second preset aggregation function at least comprises: a count function, a mean function, and a summation function.
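The primary-key aggregation described above can be sketched with pandas (an illustrative choice; the column names are hypothetical):

```python
# Illustrative sketch: aggregating over a primary key (customer_id) with
# different aggregation functions for numerical and categorical columns.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [10.0, 20.0, 5.0, 5.0, 15.0],        # numerical variable
    "channel": ["web", "atm", "web", "web", "atm"]  # categorical variable
})

numeric_derived = df.groupby("customer_id")["amount"].agg(
    ["count", "mean", "sum", "min", "max"])   # first preset functions
categorical_derived = df.groupby("customer_id")["channel"].agg(
    ["count", "nunique"])                     # count and distinct values
```

Each aggregated column becomes one derived feature per primary-key value.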
Optionally, the obtaining module 42 is configured to:
and constructing a characteristic derivative function, and deriving the basic characteristics by combining the determined characteristic derivative mode to obtain derivative characteristics.
Optionally, the obtaining module 42 is configured to implement at least one of the following manners:
deriving a multilayer depth variable for one basic feature based on the incidence relation among the basic features to obtain derived features;
and generating a derivative feature by utilizing a depth feature synthesis mode based on the interest indexes of the basic features.
Optionally, the obtaining module 42 includes:
a fourth obtaining unit, configured to obtain a timestamp variable in the basic feature;
and the generating unit is used for generating a derivative characteristic according to the timestamp variable.
Further, the generating unit is configured to implement at least one of the following manners:
extracting different time dimensions of the timestamp variables to obtain derivative characteristics;
acquiring a sliding window derivative variable aiming at the timestamp variable, and dividing the sliding window derivative variable by an index value of the current time to obtain a ratio derivative characteristic;
acquiring a sliding window derivative variable aiming at the timestamp variable, and constructing a statistical class characteristic based on the sliding window derivative variable to obtain a statistical class derivative characteristic based on a sliding window;
acquiring a sliding window derivative variable aiming at the timestamp variable, constructing statistical class characteristics based on the sliding window derivative variable to obtain each statistical class derivative characteristic based on a sliding window, and dividing each statistical class derivative characteristic by an index value of the current time to obtain a derivative characteristic;
obtaining a difference characteristic based on a difference value between the current time and a first moment, and obtaining a sliding window derived variable aiming at the difference characteristic, wherein the current time is later than the first moment;
the method for acquiring the sliding window derived variable aiming at the timestamp variable comprises the following steps:
based on the current time in the timestamp variables, sliding according to a preset time window, and generating n sliding window derivative variables relative to the current time, wherein n is the length of the preset time window.
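The sliding-window derivations listed above can be sketched with pandas rolling windows (an illustrative choice; the series values are hypothetical):

```python
# Illustrative sketch: sliding-window derived variables over a timestamped
# series, plus the ratio of a window statistic to the current-time value.
import pandas as pd

s = pd.Series([10.0, 12.0, 9.0, 15.0, 14.0],
              index=pd.date_range("2019-01-01", periods=5, freq="D"))

win_mean = s.rolling(window=3).mean()   # statistical-class feature per window
win_max = s.rolling(window=3).max()
ratio = win_mean / s                    # window statistic / current index value
```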
Optionally, the obtaining module 42 includes:
the construction unit is used for constructing a gradient lifting decision tree model aiming at the basic characteristics according to a business target for data analysis processing;
and the fifth obtaining unit is used for combining the basic features based on the gradient lifting decision tree model to obtain corresponding derivative features.
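The gradient boosting decision tree combination can be sketched as follows. This is an illustrative sketch: scikit-learn and the use of leaf indices as the combined features (a common GBDT-derived-feature construction) are assumptions about one reasonable realization, not the patent's stated implementation.

```python
# Illustrative sketch: combining basic features through a gradient boosting
# decision tree; each sample's per-tree leaf index acts as a derived
# combination feature (often one-hot encoded for a downstream model).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3,
                                  random_state=0).fit(X, y)
leaves = gbdt.apply(X)   # leaf index per sample, per tree
leaf_features = leaves.reshape(leaves.shape[0], -1)
```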
Optionally, the obtaining module 42 is configured to:
converting categorical features in the basic features into numerical features, and taking the numerical features as derived features;
wherein the categorical features comprise: categorical features whose number of distinct values is less than or equal to a preset value.
Further, the manner of converting a categorical feature into a numerical feature comprises at least one of: ordinal (serial-number) encoding, one-hot encoding, binary encoding, and contrast encoding.
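Two of these encodings can be sketched as follows; scikit-learn and the category values are illustrative assumptions.

```python
# Illustrative sketch: ordinal (serial-number) and one-hot encoding of a
# categorical feature with a small number of distinct values.
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

X = np.array([["gold"], ["silver"], ["gold"], ["bronze"]])

ordinal = OrdinalEncoder().fit_transform(X)            # one integer per category
onehot = OneHotEncoder().fit_transform(X).toarray()    # one column per category
```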
Optionally, the executing module 43 includes:
the screening unit is used for carrying out characteristic screening on the derived characteristics to obtain screened target characteristics;
and the execution unit is used for creating a business model according to the target characteristics so as to run the business model to execute analysis processing operation.
An embodiment of the invention further provides a data analysis system, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; wherein the processor implements the steps of the data analysis method described above when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the processes of the data analysis method embodiment and achieves the same technical effects; to avoid repetition, details are omitted here. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (38)

1. A method of data analysis, comprising:
acquiring basic characteristics based on service data and/or service scenes to be analyzed, and determining a characteristic derivation mode;
deriving the basic characteristics according to the determined characteristic derivation mode to obtain derived characteristics;
creating a business model according to the derived features to perform analysis processing operations;
wherein the characteristic derivation comprises at least one of: business target-based derivation, deep learning-based derivation, feature combination-based derivation, time variable-based derivation, decision tree model-based derivation, and numerical transformation-based derivation;
the deriving the basic features according to the determined feature deriving mode to obtain derived features, which comprises the following steps:
deriving a multilayer depth variable for one basic feature based on the incidence relation among the basic features to obtain derived features; or generating a derivative feature by utilizing a depth feature synthesis mode based on the interest indexes of the basic features;
deriving a multilayer depth variable for one basic feature based on the association relationship between the basic features to obtain derived features, wherein the deriving includes: deriving a multilayer depth variable for one basic feature by using multiple feature derivation modes in a combined manner based on the incidence relation among the basic features to obtain derived features;
the generating of the derivative features by the interest indicator based on the basic features in a depth feature synthesis mode specifically includes: generating derived features by combining and using a plurality of feature derivation modes based on the interest indexes of the basic features in a depth feature synthesis mode;
the number of the multiple feature derivation modes used in combination is determined based on the service data and the service scene;
the features of the service scene comprise behavior features and attribute features.
2. The data analysis method of claim 1, wherein the deriving the base features according to the determined feature derivation manner to obtain derived features comprises:
acquiring service experience data corresponding to a service scene based on the service scene and a service target;
and according to the service experience data, constructing features related to the service targets through the basic features to obtain derived features.
3. The data analysis method of claim 1, wherein the deriving the base features according to the determined feature derivation manner to obtain derived features comprises:
and performing deep learning on the basic features through at least one of a sparse self-coding algorithm, a factorization machine algorithm and a deep cross neural network algorithm to obtain derived features.
4. The data analysis method of claim 3, wherein when the base features are deeply learned by the sparse self-coding algorithm, the obtaining derived features comprises:
inputting the basic features into a sparse self-coding neural network, acquiring a vector consisting of activation values of all units of a hidden layer of the sparse self-coding neural network, and taking the vector as a derived feature.
5. The data analysis method of claim 3, wherein when the base features are deeply learned by the factorizer algorithm, the obtaining derived features comprises:
and acquiring cross features introduced by operating the factorization machine algorithm model, and determining the cross features as derivative features.
6. The data analysis method of claim 3, wherein when the basic features are deeply learned by the deep cross neural network algorithm, the obtaining derived features comprises:
inputting the basic features into a deep cross neural network, obtaining a first input result and a second input result through the cross network and the deep network respectively, and combining the first input result and the second input result to obtain derivative features.
7. The data analysis method of claim 1, wherein the deriving the base features according to the determined feature derivation manner to obtain derived features comprises:
performing feature combination on the basic features to obtain derived features;
wherein, the mode of the characteristic combination comprises the following steps: at least one of a polynomial manner, a mathematical operation manner, and an aggregation function manner.
8. The method of claim 7, wherein when the manner of combining the features comprises employing a polynomial, the deriving the derived features comprises:
and generating a new feature matrix according to the basic features and preset degrees, and taking each component element in the feature matrix as a derivative feature.
9. The method of claim 7, wherein when the manner of combining the features comprises employing a mathematical operation, the deriving the derived features comprises:
calculating the basic characteristics by using a data operation rule to obtain derivative characteristics;
wherein, the data operation rule comprises: at least one of an addition, a subtraction, a multiplication, and a division.
10. The method of claim 7, wherein when the manner of combining the features comprises employing an aggregation function, the deriving the derived features comprises:
applying the aggregation function to the continuous variable and the discrete variable to generate derivative features;
wherein the aggregation function comprises: count, mean, sum, minimum, maximum, standard deviation, median, mode, average time between consecutive events, number of distinct values of a categorical variable, percentage of values equal to a preset value, skewness, and kurtosis.
11. The method of claim 10, wherein the applying the aggregation function to the continuous variable and the discrete variable to generate the derived features comprises at least one of:
for a numerical variable, aggregating over the primary key with a first preset aggregation function to obtain derived features;
for a categorical variable, aggregating over the primary key with a second preset aggregation function to obtain derived features;
wherein the first preset aggregation function at least comprises: a count function, a mean function, a summation function, a minimum function, and a maximum function;
the second preset aggregation function at least comprises: a count function, a mean function, and a summation function.
12. The data analysis method of claim 1, wherein the deriving the base features according to the determined feature derivation manner to obtain derived features comprises:
and constructing a characteristic derivative function, and deriving the basic characteristics by combining the determined characteristic derivative mode to obtain derivative characteristics.
13. The data analysis method of claim 1, wherein the deriving the base features according to the determined feature derivation manner to obtain derived features comprises:
acquiring a timestamp variable in the basic features;
and generating a derivative feature according to the timestamp variable.
14. The data analysis method of claim 13, wherein generating derived features from the timestamp variables comprises at least one of:
extracting different time dimensions of the timestamp variables to obtain derivative characteristics;
acquiring a sliding window derivative variable aiming at the timestamp variable, and dividing the sliding window derivative variable by an index value of the current time to obtain a ratio derivative characteristic;
acquiring a sliding window derivative variable aiming at the timestamp variable, and constructing a statistical class characteristic based on the sliding window derivative variable to obtain a statistical class derivative characteristic based on a sliding window;
acquiring a sliding window derivative variable aiming at the timestamp variable, constructing statistical class characteristics based on the sliding window derivative variable to obtain each statistical class derivative characteristic based on a sliding window, and dividing each statistical class derivative characteristic by an index value of the current time to obtain a derivative characteristic;
obtaining a difference characteristic based on a difference value between the current time and a first moment, and obtaining a sliding window derived variable aiming at the difference characteristic, wherein the current time is later than the first moment;
the method for acquiring the sliding window derived variable aiming at the timestamp variable comprises the following steps:
based on the current time in the timestamp variables, sliding according to a preset time window, and generating n sliding window derivative variables relative to the current time, wherein n is the length of the preset time window.
15. The data analysis method of claim 1, wherein the deriving features from the basic features to obtain derived features comprises:
constructing a gradient lifting decision tree model aiming at the basic characteristics according to a business target for data analysis processing;
and combining the basic features based on the gradient lifting decision tree model to obtain corresponding derivative features.
16. The data analysis method of claim 1, wherein the deriving features from the basic features to obtain derived features comprises:
converting the class type features in the basic features into numerical type features, and taking the numerical type features as derivative features;
wherein the categorical features comprise: categorical features whose number of distinct values is less than or equal to a preset value.
17. The data analysis method of claim 16, wherein converting the categorical features into numerical features comprises at least one of: ordinal (serial-number) encoding, one-hot encoding, binary encoding, and contrast encoding.
18. The data analysis method of claim 1, wherein the creating a business model from the derived features to perform analysis processing operations comprises:
performing characteristic screening on the derived characteristics to obtain screened target characteristics;
and creating a business model according to the target characteristics so as to run the business model to execute analysis processing operation.
19. A data analysis system, comprising:
a determining module, configured to obtain basic features based on service data and/or a service scene to be analyzed, and determine a feature derivation mode;
an acquisition module, configured to derive the basic features according to the determined feature derivation mode to obtain derived features;
an execution module, configured to create a business model according to the derived features so as to execute an analysis processing operation;
wherein the feature derivation mode comprises at least one of: derivation based on a business target, derivation based on deep learning, derivation based on feature combination, derivation based on a time variable, derivation based on a decision tree model, and derivation based on numerical transformation;
the acquisition module is configured to implement:
deriving a multi-layer depth variable for one basic feature based on the association relationship among the basic features to obtain derived features; or generating derived features in a deep feature synthesis manner based on interest indicators of the basic features;
wherein deriving a multi-layer depth variable for one basic feature based on the association relationship among the basic features to obtain derived features comprises: deriving a multi-layer depth variable for one basic feature by using a plurality of feature derivation modes in combination, based on the association relationship among the basic features, to obtain derived features;
generating derived features in a deep feature synthesis manner based on interest indicators of the basic features specifically comprises: generating derived features by using a plurality of feature derivation modes in combination, in a deep feature synthesis manner, based on the interest indicators of the basic features;
the number of feature derivation modes used in combination is determined based on the service data and the service scene;
and the features of the service scene comprise business features and attribute features.
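As an illustrative, non-limiting sketch of the deep feature synthesis idea referenced in this claim (stacking aggregations across related tables to obtain multi-layer derived variables), the following toy example uses hypothetical customer/order/item data and is not the patented implementation:

```python
from statistics import mean

# Toy relational data (hypothetical): customers -> orders -> order items.
orders = {"o1": "c1", "o2": "c1", "o3": "c2"}          # order -> customer
items = [("o1", 2), ("o1", 3), ("o2", 1), ("o3", 4)]   # (order, quantity)

# Layer 1: aggregate items up to orders (SUM of quantity per order).
order_qty = {}
for order, qty in items:
    order_qty[order] = order_qty.get(order, 0) + qty

# Layer 2: aggregate orders up to customers (MEAN of the layer-1 feature),
# yielding the stacked derived feature MEAN(orders.SUM(items.quantity)).
customer_feature = {}
for order, cust in orders.items():
    customer_feature.setdefault(cust, []).append(order_qty[order])
customer_feature = {c: mean(v) for c, v in customer_feature.items()}
```

Each extra layer of aggregation corresponds to one more "depth" level of the derived variable.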
20. The data analysis system of claim 19, wherein the acquisition module comprises:
a first obtaining unit, configured to obtain service experience data corresponding to the service scene based on the service scene and a business target;
and a second obtaining unit, configured to construct, according to the service experience data, features related to the business target from the basic features, to obtain derived features.
21. The data analysis system of claim 19, wherein the acquisition module is configured to:
perform deep learning on the basic features through at least one of a sparse autoencoder algorithm, a factorization machine algorithm, and a deep cross neural network algorithm, to obtain derived features.
22. The data analysis system of claim 21, wherein, when the acquisition module performs deep learning on the basic features through the sparse autoencoder algorithm, the acquisition module is configured to:
input the basic features into a sparse autoencoder neural network, obtain the vector composed of the activation values of all units of the hidden layer of the sparse autoencoder neural network, and take the vector as a derived feature.
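A minimal sketch of this step, assuming a toy untrained encoder with random placeholder weights (a real sparse autoencoder would be trained with a sparsity penalty on the hidden activations):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy encoder pass: 4 basic features -> 2 hidden units.
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(2, 4))  # encoder weights (hypothetical)
b1 = np.zeros(2)                         # encoder bias

x = np.array([0.5, 1.0, -0.3, 0.8])      # one sample of basic features
hidden_activations = sigmoid(W1 @ x + b1)

# The vector of hidden-unit activations is taken as the derived feature.
derived_feature = hidden_activations
```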
23. The data analysis system of claim 21, wherein, when the acquisition module performs deep learning on the basic features through the factorization machine algorithm, the acquisition module is configured to:
obtain the cross features introduced by running the factorization machine algorithm model, and determine the cross features as derived features.
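The cross features in question come from the second-order term of a factorization machine, which scores each feature pair (i, j) by the inner product of learned latent vectors. A sketch with hypothetical latent factors, also verifying the standard O(kn) identity for the pairwise sum:

```python
import numpy as np

# Second-order (cross) term of a factorization machine:
#   y_cross = sum_{i<j} <v_i, v_j> * x_i * x_j
rng = np.random.default_rng(1)
n_features, k = 3, 2
V = rng.normal(size=(n_features, k))  # latent factor vectors (hypothetical)
x = np.array([1.0, 2.0, 0.5])

# Per-pair cross features, usable as derived features.
cross_features = {}
for i in range(n_features):
    for j in range(i + 1, n_features):
        cross_features[(i, j)] = float(V[i] @ V[j]) * x[i] * x[j]

# The FM reformulation computes the same total in O(k * n).
sum_pairs = sum(cross_features.values())
fm_term = 0.5 * float(((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2)).sum())
```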
24. The data analysis system of claim 21, wherein, when the acquisition module performs deep learning on the basic features through the deep cross neural network algorithm, the acquisition module is configured to:
input the basic features into a deep cross neural network, obtain a first output and a second output through the cross network and the deep network respectively, and combine the first output and the second output to obtain the final derived features.
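A toy forward pass in this spirit, with random placeholder weights and an assumed two-layer cross branch (a real deep cross network would learn all parameters jointly): the cross branch applies x_{l+1} = x0 * (w_l . x_l) + b_l + x_l, the deep branch is a small MLP, and the two outputs are concatenated.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(2)
x0 = np.array([0.2, -0.5, 1.0])  # basic features

# Cross network branch ("first output").
x = x0
for _ in range(2):  # two cross layers (hypothetical depth)
    w = rng.normal(size=3)
    b = rng.normal(size=3)
    x = x0 * (w @ x) + b + x
cross_out = x

# Deep network branch ("second output"): a small MLP on the same input.
W = rng.normal(size=(4, 3))
b2 = rng.normal(size=4)
deep_out = relu(W @ x0 + b2)

# Combine both outputs into the final derived feature vector.
derived = np.concatenate([cross_out, deep_out])
```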
25. The data analysis system of claim 19, wherein the acquisition module is configured to:
perform feature combination on the basic features to obtain derived features;
wherein the feature combination manner comprises at least one of: a polynomial manner, a mathematical operation manner, and an aggregation function manner.
26. The data analysis system of claim 25, wherein, when the feature combination manner comprises the polynomial manner, the acquisition module is configured to:
generate a new feature matrix according to the basic features and a preset degree, and take each component element of the feature matrix as a derived feature.
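A stdlib-only sketch of polynomial feature generation (the same expansion that, e.g., scikit-learn's `PolynomialFeatures` performs): for basic features [a, b] and degree 2, the generated components are [1, a, b, a^2, ab, b^2].

```python
from itertools import combinations_with_replacement

def polynomial_features(x, degree):
    """All monomials of the basic features up to `degree` (bias included)."""
    out = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(range(len(x)), d):
            val = 1.0
            for i in combo:
                val *= x[i]
            out.append(val)
    return out

derived = polynomial_features([2.0, 3.0], degree=2)
```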
27. The data analysis system of claim 25, wherein, when the feature combination manner comprises the mathematical operation manner, the acquisition module is configured to:
operate on the basic features by using data operation rules to obtain derived features;
wherein the data operation rules comprise at least one of: addition, subtraction, multiplication, and division.
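A minimal sketch of this rule, assuming two numeric basic features and a zero guard on division (the feature names are illustrative only):

```python
def arithmetic_derivatives(a, b):
    """Derive features from two basic features via +, -, *, / (guarding zero)."""
    return {
        "sum": a + b,
        "diff": a - b,
        "product": a * b,
        "ratio": a / b if b != 0 else None,
    }

derived = arithmetic_derivatives(10.0, 4.0)
```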
28. The data analysis system of claim 25, wherein, when the feature combination manner comprises the aggregation function manner, the acquisition module comprises:
a third obtaining unit, configured to apply aggregation functions to continuous variables and discrete variables to generate derived features;
wherein the aggregation functions comprise: count, mean, sum, minimum, maximum, standard deviation, median, mode, average time between consecutive events, number of distinct values of a categorical variable, percentage of values equal to a preset value, skewness, and kurtosis.
29. The data analysis system of claim 28, wherein the third obtaining unit is configured to perform at least one of:
for a numerical variable, aggregating over the primary key by using a first preset aggregation function to obtain derived features;
for a categorical variable, aggregating over the primary key by using a second preset aggregation function to obtain derived features;
wherein the first preset aggregation function comprises at least: a count function, a mean function, a sum function, a minimum function, and a maximum function;
and the second preset aggregation function comprises at least: a count function, a mean function, and a sum function.
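A stdlib-only sketch of the first preset aggregation set (count/mean/sum/min/max) applied over a primary key; the records are hypothetical (e.g. a user id paired with a purchase amount):

```python
from collections import defaultdict
from statistics import mean

# (primary key, numeric value) records, e.g. user_id -> purchase amount.
rows = [("u1", 10.0), ("u1", 20.0), ("u2", 5.0)]

groups = defaultdict(list)
for key, value in rows:
    groups[key].append(value)

# One derived feature per aggregation function, per primary-key value.
derived = {
    key: {"count": len(v), "mean": mean(v), "sum": sum(v),
          "min": min(v), "max": max(v)}
    for key, v in groups.items()
}
```

In practice this is the `groupby(primary_key).agg([...])` pattern of a dataframe library.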
30. The data analysis system of claim 19, wherein the acquisition module is configured to:
construct a feature derivation function, and derive the basic features in combination with the determined feature derivation mode, to obtain derived features.
31. The data analysis system of claim 19, wherein the acquisition module comprises:
a fourth obtaining unit, configured to obtain a timestamp variable in the basic features;
and a generating unit, configured to generate derived features according to the timestamp variable.
32. The data analysis system of claim 31, wherein the generating unit is configured to perform at least one of:
extracting different time dimensions of the timestamp variable to obtain derived features;
obtaining sliding-window derived variables for the timestamp variable, and dividing the sliding-window derived variables by the index value at the current time to obtain ratio-type derived features;
obtaining sliding-window derived variables for the timestamp variable, and constructing statistical features based on the sliding-window derived variables to obtain sliding-window-based statistical derived features;
obtaining sliding-window derived variables for the timestamp variable, constructing statistical features based on the sliding-window derived variables to obtain each sliding-window-based statistical derived feature, and dividing each statistical derived feature by the index value at the current time to obtain derived features;
obtaining a difference feature based on the difference between the current time and a first moment, and obtaining sliding-window derived variables for the difference feature, wherein the current time is later than the first moment;
wherein obtaining sliding-window derived variables for the timestamp variable comprises:
sliding according to a preset time window based on the current time in the timestamp variable, and generating n sliding-window derived variables relative to the current time, wherein n is the length of the preset time window.
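A stdlib-only sketch covering three of the branches above on a hypothetical daily metric: (a) time-dimension extraction, (b) an n = 3 sliding window of values before the current time, and (c) a window statistic divided by the index value at the current time:

```python
from datetime import datetime
from statistics import mean

# Daily metric values keyed by date; the last entry is "the current time".
series = [("2019-03-25", 8.0), ("2019-03-26", 10.0),
          ("2019-03-27", 12.0), ("2019-03-28", 15.0)]

# (a) Time-dimension features from the timestamp variable.
ts = datetime.strptime(series[-1][0], "%Y-%m-%d")
time_dims = {"year": ts.year, "month": ts.month, "weekday": ts.weekday()}

# (b) Sliding-window derived variables: the n = 3 values before the current time.
n = 3
window = [v for _, v in series[-1 - n:-1]]

# (c) A window statistic and its ratio to the current index value.
current = series[-1][1]
window_mean = mean(window)
ratio_feature = window_mean / current
```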
33. The data analysis system of claim 19, wherein the acquisition module comprises:
a construction unit, configured to construct a gradient boosting decision tree model for the basic features according to the business target of the data analysis processing;
and a fifth obtaining unit, configured to combine the basic features based on the gradient boosting decision tree model to obtain corresponding derived features.
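One common reading of this combination step is the GBDT leaf-index scheme: each tree routes a sample to a leaf, and the one-hot leaf indicators become derived features. A sketch with two hypothetical hand-built stumps standing in for trained trees:

```python
# Each "tree" maps a sample to a leaf index; the derived feature is the
# one-hot encoding of the leaf each tree routes the sample to.

def tree_one(x):  # stump on feature 0 (hypothetical threshold)
    return 0 if x[0] < 5.0 else 1

def tree_two(x):  # stump on feature 1 (hypothetical threshold)
    return 0 if x[1] < 2.0 else 1

def gbdt_combine(x, trees, n_leaves=2):
    derived = []
    for tree in trees:
        leaf = tree(x)
        derived += [1 if i == leaf else 0 for i in range(n_leaves)]
    return derived

derived = gbdt_combine([7.0, 1.0], [tree_one, tree_two])
```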
34. The data analysis system of claim 19, wherein the acquisition module is configured to:
convert categorical features in the basic features into numerical features, and take the numerical features as derived features;
wherein the categorical features comprise: features whose number of category values is less than or equal to a preset value.
35. The data analysis system of claim 34, wherein the manner of converting the categorical features into numerical features comprises at least one of: ordinal encoding, one-hot encoding, binary encoding, and contrast encoding.
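A minimal sketch of the first two encodings on a hypothetical three-value category (within the preset size limit):

```python
categories = ["low", "medium", "high"]  # small value range (hypothetical)

# Ordinal encoding: each category value maps to an integer.
ordinal = {c: i for i, c in enumerate(categories)}

# One-hot encoding: each category value maps to an indicator vector.
def one_hot(value):
    return [1 if c == value else 0 for c in categories]

ordinal_feature = ordinal["medium"]
one_hot_feature = one_hot("high")
```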
36. The data analysis system of claim 19, wherein the execution module comprises:
a screening unit, configured to perform feature screening on the derived features to obtain screened target features;
and an execution unit, configured to create a business model according to the target features, so as to run the business model to execute the analysis processing operation.
37. A data analysis system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; wherein the processor, when executing the computer program, implements the steps of the data analysis method according to any one of claims 1 to 18.
38. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data analysis method according to any one of claims 1 to 18.
CN201910245141.2A 2019-03-28 2019-03-28 Data analysis method and system Active CN109977151B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910245141.2A CN109977151B (en) 2019-03-28 2019-03-28 Data analysis method and system


Publications (2)

Publication Number Publication Date
CN109977151A CN109977151A (en) 2019-07-05
CN109977151B true CN109977151B (en) 2020-02-07

Family

ID=67081387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910245141.2A Active CN109977151B (en) 2019-03-28 2019-03-28 Data analysis method and system

Country Status (1)

Country Link
CN (1) CN109977151B (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110569428B (en) * 2019-08-08 2023-10-20 创新先进技术有限公司 Recommendation model construction method, device and equipment
CN112398779B (en) * 2019-08-12 2022-11-01 中国科学院国家空间科学中心 Network traffic data analysis method and system
CN110598845B (en) * 2019-08-13 2023-04-07 中国平安人寿保险股份有限公司 Data processing method, data processing device, computer equipment and storage medium
CN110647943B (en) * 2019-09-26 2023-06-30 西北工业大学 Cutting tool wear monitoring method based on evolution data cluster analysis
CN110717182A (en) * 2019-10-14 2020-01-21 杭州安恒信息技术股份有限公司 Webpage Trojan horse detection method, device and equipment and readable storage medium
CN110781174A (en) * 2019-10-15 2020-02-11 支付宝(杭州)信息技术有限公司 Feature engineering modeling method and system using pca and feature intersection
CN110868404B (en) * 2019-11-05 2020-11-24 北京航空航天大学 Industrial control equipment automatic identification method based on TCP/IP fingerprint
CN113340874B (en) * 2020-03-02 2023-07-18 中国科学院沈阳自动化研究所 Quantitative analysis method based on combination ridge regression and recursive feature elimination
CN111325405A (en) * 2020-03-02 2020-06-23 贵州电网有限责任公司 Complaint sensitivity modeling prediction method
CN111553759A (en) * 2020-03-25 2020-08-18 平安科技(深圳)有限公司 Product information pushing method, device, equipment and storage medium
CN111539532A (en) * 2020-04-01 2020-08-14 深圳市魔数智擎人工智能有限公司 Model construction-oriented automatic feature derivation method
CN111460323B (en) * 2020-06-17 2020-09-25 腾讯科技(深圳)有限公司 Focus user mining method and device based on artificial intelligence
CN111738331A (en) * 2020-06-19 2020-10-02 北京同邦卓益科技有限公司 User classification method and device, computer-readable storage medium and electronic device
CN111752903B (en) * 2020-06-23 2024-06-07 深圳前海微众银行股份有限公司 Prediction method for usable time of data storage space
CN111784040B (en) * 2020-06-28 2023-04-25 平安医疗健康管理股份有限公司 Optimization method and device for policy simulation analysis and computer equipment
CN112037013A (en) * 2020-08-25 2020-12-04 成都榕慧科技有限公司 Pedestrian credit variable derivation method and device
CN111930756B (en) * 2020-09-18 2021-02-12 同盾控股有限公司 Feature construction method and device for source data, electronic equipment and medium
CN112488871A (en) * 2020-10-23 2021-03-12 广西电网有限责任公司电力科学研究院 Method and system for eliminating redundant data of original input features of power grid
CN112380215B (en) * 2020-11-17 2023-07-28 北京融七牛信息技术有限公司 Automatic feature generation method based on cross aggregation
CN112541711A (en) * 2020-12-29 2021-03-23 广东电网有限责任公司广州供电局 Model construction method and device, computer equipment and storage medium
CN113065028A (en) * 2021-03-22 2021-07-02 北京顶象技术有限公司 Time series data feature derivation method and device and electronic equipment
CN113052678A (en) * 2021-04-06 2021-06-29 北京明略昭辉科技有限公司 Automatic derivation method and system of trend variable, storage medium and electronic equipment
CN113568947A (en) * 2021-07-21 2021-10-29 众安在线财产保险股份有限公司 Data processing method, system and computer storage medium
CN113792800B (en) * 2021-09-16 2023-12-19 创新奇智(重庆)科技有限公司 Feature generation method and device, electronic equipment and storage medium
CN115438101B (en) * 2022-10-13 2023-06-06 中国兵器工业计算机应用技术研究所 Data feature construction system and method based on feature morphology and data relationship
CN115458162A (en) * 2022-11-10 2022-12-09 四川京炜数字科技有限公司 Bone-related disease treatment plan prediction system and method based on machine learning
CN116167285A (en) * 2023-02-27 2023-05-26 北京市生态环境保护科学研究院 Organic pollutant migration prediction method and device and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302911A (en) * 2015-11-10 2016-02-03 珠海多玩信息技术有限公司 Data screening engine establishing method and data screening engine
CN107609147A (en) * 2017-09-20 2018-01-19 珠海金山网络游戏科技有限公司 A kind of method and system that feature is automatically extracted from log stream

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8972414B2 (en) * 2011-07-29 2015-03-03 Linkedin Corporation Methods and systems for identifying similar people via a business networking service
CN108765004A (en) * 2018-05-28 2018-11-06 贵州黔驰信息股份有限公司 A method of user's electricity stealing is identified based on data mining

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105302911A (en) * 2015-11-10 2016-02-03 珠海多玩信息技术有限公司 Data screening engine establishing method and data screening engine
CN107609147A (en) * 2017-09-20 2018-01-19 珠海金山网络游戏科技有限公司 A kind of method and system that feature is automatically extracted from log stream

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"特征工程:特征抽象、特征衍生";无;《布布扣-http://www.bubuko/com/infodetail-2964388.html》;20190222;文章第1-6页 *


Similar Documents

Publication Publication Date Title
CN109977151B (en) Data analysis method and system
Sun et al. Predicting credit card delinquencies: An application of deep neural networks
Šušteršič et al. Consumer credit scoring models with limited data
Jiang et al. A prediction-driven mixture cure model and its application in credit scoring
De Andrés et al. Bankruptcy forecasting: A hybrid approach using Fuzzy c-means clustering and Multivariate Adaptive Regression Splines (MARS)
CN110599336B (en) Financial product purchase prediction method and system
Van Thiel et al. Artificial intelligence credit risk prediction: An empirical study of analytical artificial intelligence tools for credit risk prediction in a digital era
Fitzpatrick et al. How can lenders prosper? Comparing machine learning approaches to identify profitable peer-to-peer loan investments
Eddy et al. Credit scoring models: Techniques and issues
Davis et al. Explainable machine learning models of consumer credit risk
Cheng et al. A Seasonal Time‐Series Model Based on Gene Expression Programming for Predicting Financial Distress
Byanjankar et al. Data‐driven optimization of peer‐to‐peer lending portfolios based on the expected value framework
Mirtalaei et al. A trust-based bio-inspired approach for credit lending decisions
Wu et al. Customer churn prediction for commercial banks using customer-value-weighted machine learning models
CN111292118A (en) Investor portrait construction method and device based on deep learning
Chengeta et al. Peer to peer social lending default prediction with convolutional neural networks
Hou et al. A Trial of Student Self‐Sponsored Peer‐to‐Peer Lending Based on Credit Evaluation Using Big Data Analysis
Davis et al. Explainable Machine Learning Models of Consumer Credit Risk.
Zupan et al. Journal entry anomaly detection model
Moepya Enhancing the detection of financial statement fraud through the use of missing value estimation, multivariate filter feature selection and cost-sensitive classification
YESHAMBEL A LOAN DEFAULT PREDICTION MODEL FOR ACSI: A DATA MINING APPROACH
Salihu et al. Data Mining Based Classifiers for Credit Risk Analysis
Ertuğrul Customer Transaction Predictive Modeling via Machine Learning Algorithms
Núñez Mora et al. Loan Default Prediction: A Complete Revision of LendingClub
Sastry Business analytics and business intelligence machine learning model to predict bank loan defaults

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant