CN113159449A - Structured data-based prediction method - Google Patents

Structured data-based prediction method

Info

Publication number
CN113159449A
CN113159449A (application CN202110521123.XA)
Authority
CN
China
Prior art keywords
feature
vector
attention
prediction
exponential
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110521123.XA
Other languages
Chinese (zh)
Inventor
蔡少峰
郑凯平
陈刚
张美慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110521123.XA priority Critical patent/CN113159449A/en
Publication of CN113159449A publication Critical patent/CN113159449A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a prediction method based on structured data, belonging to the technical field of artificial-intelligence learning and prediction, and comprising the steps of: obtaining a structured data tuple x = <x_1, x_2, ..., x_j, ..., x_m>; converting each attribute value x_j into an embedded vector representation e_j; modeling the feature interactions of x based on the embedding vectors using a plurality of exponential neurons; aggregating all of the feature interactions to construct a feature vector for x; and performing classification prediction based on the feature vector. By modeling cross features with exponential neurons, the invention overcomes the limitation that the input of a logarithmic neuron must be positive, improves the flexibility and the applicable scenarios of the neuron, and improves the effectiveness of cross-feature modeling. The multi-head gated attention mechanism can dynamically and selectively model cross features of any order according to the input data, improving the accuracy and efficiency of feature modeling and, in turn, of target prediction. Dynamically capturing the interaction terms of input samples through the gating mechanism provides model-decision interpretability and new insights.

Description

Structured data-based prediction method
Technical Field
The invention relates to prediction, and in particular to a prediction method based on structured data, belonging to the technical field of artificial-intelligence learning and prediction.
Background
To date, most enterprises rely on structured data for data storage and predictive analysis. Relational database management systems (RDBMSs) have become the mainstream database systems employed in industry, and relational databases have become the de facto standard for storing and querying structured data, which is critical to the operation of most businesses. Structured data often contains a large amount of information that can be used to make data-driven decisions or to identify risks and opportunities. Extracting insights from such data for decision making requires advanced analysis, especially deep learning, which is far more complex than statistical aggregation.
Formally, structured data refers to the type of data that can be represented in a table. It can be seen as a logical table consisting of n rows (tuples/samples) and m columns (attributes/features), extracted from a relational database by core relational operations such as selection, projection and join. Predictive modeling is the learning of the functional dependence (prediction function) of the dependent attribute y on the decision attributes x, i.e., f: x → y, where x is commonly referred to as the feature vector and y is the prediction target. The main challenge in prediction over structured data is, in fact, how to model the dependencies and correlations between these attributes through cross features, the so-called feature interactions. These cross features create new features by capturing the interactions of the original input features. In particular, a cross feature may be defined as

$$\text{cross}(x) = \prod_{i=1}^{m} x_i^{w_i},$$

i.e., the product of the input features, each raised to its corresponding interaction weight. The weight w_i represents the contribution of the i-th feature to the cross feature; in the feature interaction, w_i = 0 corresponds to feature x_i being deactivated, and the interaction order of a cross feature means the number of its non-zero interaction weights w_i. This cross feature for relational modeling is the core of structured data learning, as it enables the learning model to represent more complex functions than a simple linear aggregation of the input features for predictive analysis.
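As an illustrative example (ours, not part of the original disclosure): with m = 3 input features and interaction weights w = (1, 0, 1), the cross feature reduces to a second-order term in which the second feature is deactivated:

$$x_1^{1} \cdot x_2^{0} \cdot x_3^{1} = x_1 x_3.$$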
The existing methods for relational modeling of data and for target prediction mainly fall into two classes: implicit modeling and explicit modeling. Typical implicit modeling methods are deep neural networks (DNNs), such as CNNs and LSTMs. DNNs are only suitable for certain specific data types, for example, CNNs for image applications and LSTMs for sequence data. However, applying DNNs to the structured data of relational tables may not produce meaningful results. In particular, there are inherent dependencies and correlations between the attribute values of structured data, and the interaction relationships between such attributes are essential for predictive analysis. Although in theory a DNN can approximate any objective function given sufficient data and capacity, conventional DNN network layers are additive in capturing interactions; modeling such multiplicative interactions therefore requires excessively large and increasingly hard-to-understand models, typically built from multiple layers with nonlinear activation functions between them. Previous studies also suggest that implicitly modeling such cross features with DNNs may require a large number of hidden units, which greatly increases the computational cost and also makes DNNs harder to interpret, as described in: Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. 2014. Learning Polynomials with Neural Networks. In Proceedings of the 31st International Conference on Machine Learning, ICML.
In relational analysis, a preferred alternative to DNNs is to explicitly model feature interactions to achieve better performance and interpretability in feature attribution. However, the number of possible feature interactions is combinatorially large. Thus, the core problem of explicit cross-feature modeling is how to identify the correct feature sets while determining the corresponding interaction weights. Most existing studies are limited to capturing cross features within a predefined range of interaction orders. However, as the maximum order increases, the number of cross features still grows nearly exponentially. AFN (Weiyu Cheng, Yanyan Shen, and Linpeng Huang. 2020. Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions. In 34th AAAI Conference on Artificial Intelligence.) goes further and models cross features using logarithmic neurons (J. Wesley Hines. 1996. A logarithmic neural network architecture for unbounded non-linear function approximation. In Proceedings of International Conference on Neural Networks (ICNN'96). IEEE, 1245-1250.). Each logarithmic neuron converts the features into logarithmic space, thereby turning the powers of the features into learnable coefficients; specifically, each logarithmic neuron computes

$$y = \exp\Big(\sum_{j=1}^{m} w_j \ln e_j\Big) = \prod_{j=1}^{m} e_j^{w_j}.$$

In this way, each logarithmic neuron can capture a specific feature interaction term of arbitrary order. But AFN has its inherent limitations: due to the use of the logarithmic transform, the input features of each interaction term are limited to positive values. In addition, the interaction order of each interaction term is unconstrained and remains static after training.
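The following minimal sketch (our own illustration in PyTorch, assuming a single scalar weight per feature) contrasts the two neuron types and shows why the logarithmic neuron requires positive inputs while an exponential neuron, as introduced below, does not:

```python
import torch

def log_neuron(e, w):
    # AFN-style logarithmic neuron: prod_j e_j^{w_j} = exp(sum_j w_j * ln e_j).
    # ln(e_j) is undefined for e_j <= 0, so inputs must be positive.
    return torch.exp((w * torch.log(e)).sum(dim=-1))

def exp_neuron(e, w):
    # Exponential neuron: prod_j exp(e_j)^{w_j} = exp(sum_j w_j * e_j).
    # exp(e_j) > 0 for any real e_j, so inputs may take any sign.
    return torch.exp((w * e).sum(dim=-1))

e = torch.tensor([0.5, -1.2, 2.0])   # embeddings can be negative
w = torch.tensor([1.0, 0.0, 1.0])    # w_j = 0 deactivates feature j
print(exp_neuron(e, w))              # well-defined: exp(0.5 + 2.0)
print(log_neuron(e.abs(), w))        # log neuron only works on positive inputs
```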
We believe that cross-features should only consider certain input features and that feature interactions should dynamically model a single input. The rationale is that not all input features are constructive to cross terms, and modeling with uncorrelated features may introduce noise, thereby reducing effectiveness and interpretability. In particular, the deployment of the learning model in practical applications not only emphasizes accuracy, but also emphasizes efficiency and interpretability. It is noteworthy that understanding the general behavior and overall logic of the learning model (global interpretability), and providing reasons for the particular decisions made (local interpretability), is crucial for critical decision making in high-risk applications, such as the healthcare or financial industry. Although many black-box models (e.g., DNNs) have strong predictive capabilities, they model the input in an implicit way that is confusing and sometimes may learn some unexpected patterns. In this regard, explicitly adaptively modeling feature relationships with a minimal component feature set provides reasonable a priori knowledge in terms of effectiveness, efficiency, and interpretability.
Disclosure of Invention
The present invention is directed to a prediction method based on structured data, which includes the following steps:
obtaining a structured data tuple x = <x_1, x_2, ..., x_j, ..., x_m>, where x_j represents the j-th attribute value and m represents the number of structured data attributes;

converting each attribute value x_j into an embedded vector representation e_j, j ∈ {1, 2, ..., m};

modeling the feature interactions of x based on the embedding vectors using a plurality of exponential neurons;

aggregating all of the feature interactions to construct a feature vector for x;

and performing classification prediction based on the feature vector.
Preferably, the process of converting the attribute value x_j into the embedded vector representation e_j is as follows: when x_j is numerical, its value is first scaled into the (0, 1] interval according to the attribute's value range and then multiplied by the pre-learned embedding vector; when x_j is categorical, the corresponding pre-learned embedding vector is directly indexed by its value.
Preferably, the interaction order is not fixed when modeling the feature interactions of x.
Preferably, the number of exponential neurons is K × o, where K denotes the number of attention heads, o denotes the number of exponential neurons per attention head, and K and o are both natural numbers; all exponential neurons of each attention head share the weight matrix W_att of their bilinear attention function.

The i-th exponential neuron y_i of each attention head is expressed as follows:

$$y_i = \exp\Big(\sum_{j=1}^{m} w_{ij}\, e_j\Big) = \prod_{j=1}^{m} \exp(e_j)^{w_{ij}}$$

$$\frac{\partial y_i}{\partial e_j} = w_{ij}\,\mathrm{diag}(y_i), \qquad \frac{\partial y_i}{\partial w_{ij}} = y_i \odot e_j$$

where ⊙ represents the Hadamard product, the exp(·) function and the corresponding exponent w_ij are applied element-wise, e_j ∈ R^{n_e} represents the embedding vector corresponding to the j-th attribute value of the structured data, i, j, m, n_e are natural numbers with 1 ≤ i ≤ o and 1 ≤ j ≤ m, m represents the number of structured data attributes, n_e represents the embedding size, ∂y_i/∂e_j represents the derivative of y_i with respect to e_j, ∂y_i/∂w_ij represents the derivative of y_i with respect to w_ij, and diag(·) is a diagonal matrix function;

w_i ∈ R^m represents the dynamic feature-interaction weight of y_i, which is obtained by the following formula:

$$w_i = z_i \odot v_i$$

where v_i ∈ R^m represents a learnable attention weight vector, and the gate z_i represents the attention recalibration weights, dynamically generated from the bilinear attention alignment scores as follows:

$$z_i = \alpha\text{-entmax}(s_i), \qquad s_{ij} = q_i^{\mathsf{T}}\, W_{att}\, e_j$$

where q_i ∈ R^{n_e} represents the learnable attention query vector, T denotes the transpose operation, W_att ∈ R^{n_e × n_e} represents the weight matrix of the bilinear attention function, and α-entmax(·) represents a sparse softmax whose sparsity increases with increasing α; α ≥ 1 is a hyper-parameter for controlling sparsity.
preferably, the aggregation is vector stitching.
Preferably, before performing classification prediction based on the feature vector, the nonlinear feature interactions of the elements are captured through a multi-layer perceptron (MLP) to obtain a vector representation h ∈ R^{n_h} encoding the relations:

$$h = \phi_{MLP}\big([y_1; y_2; \ldots; y_{K \cdot o}]\big)$$

where n_h denotes the size of the nonlinear feature interaction and is a natural number.
Preferably, the classification prediction is performed by the following formula:

$$\hat{y} = W h + b$$

where W ∈ R^{n_p × n_h} and b ∈ R^{n_p} represent the weight and the bias, respectively, and n_p represents the number of prediction targets.
Preferably, the target prediction is performed by combining the prediction method with a DNN.
Preferably, the v_i of the plurality of exponential neurons are summed and averaged, and the ranked result serves as the importance ranking of each attribute in the structured data for target prediction.

Preferably, the w_i of the plurality of exponential neurons are summed and averaged, and the ranked result serves as the importance ranking of each attribute value in the current tuple for the current target prediction result.
In another aspect, the present invention further provides an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the structured-data-based prediction method described above.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the structured-data-based prediction method described above.

In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the aforementioned structured-data-based prediction method.
Advantageous effects
Compared with the prior art, the prediction method based on the structured data provided by the invention has the following characteristics:
1. Modeling cross features with exponential neurons overcomes the limitation that the input of a logarithmic neuron must be positive, broadening the applicable scenarios of the neuron;

2. The proposed exponential neuron can model cross features of any order, improving the effectiveness of cross-feature modeling;

3. Through the exponential neurons and the multi-head gated attention mechanism, cross features of any order can be modeled dynamically and selectively according to the input data, improving the accuracy and efficiency of feature modeling;

4. The cross-feature modeling method follows a white-box design, and the modeling process is more transparent, making the method more interpretable in relational analysis;

5. Through the gating mechanism of attention recalibration weights, the interaction terms corresponding to an input sample can be captured dynamically, which provides model-decision interpretability, earns user trust, offers new insights, and advances understanding in some fields;

6. Summing, averaging and ranking the global weights v_i of all exponential neurons deepens the understanding of the factors influencing decisions and their degrees of importance;

7. Summing, averaging and ranking the dynamic feature-interaction weights w_i of all exponential neurons deepens the understanding of the factors influencing the decision for the current input and their degrees of importance.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a preferred embodiment of the method according to the first embodiment of the present invention;
FIG. 3 shows the global feature attribution of Lime, Shap, and the method of the present invention on the datasets Frappe and Diabetes130, respectively;

FIG. 4 shows the ARM-Net (left) local feature attribution and the local feature importance weights given by Lime (top right) and Shap (bottom right) for a representative input instance on the Frappe dataset;

FIG. 5 shows the ARM-Net (left) local feature attribution and the local feature importance weights given by Lime (top right) and Shap (bottom right) for a representative input instance on the Diabetes130 dataset.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
For convenience of the following description, the structured data is represented as a logical table T comprising n rows and m columns, and each row may be expressed as a tuple (x, y) = (<x_1, x_2, ..., x_j, ..., x_m>, y), where y is the dependent attribute (prediction target), x = <x_1, x_2, ..., x_j, ..., x_m> is the decision attribute (feature vector), and x_j represents the j-th attribute value.
The embodiment of the invention realizes the prediction method based on structured data, which specifically comprises the following contents:

S1, obtaining the structured data tuple x = <x_1, x_2, ..., x_j, ..., x_m>, where x_j represents the j-th attribute value and m represents the number of structured data attributes;

for example, when a company wants to predict monthly sales, the provided x may include the attribute fields (month, regionID, storeID, productID); then m = 4, and the four attributes are the month, region ID, store ID, and product ID, respectively.

S2, converting each attribute value x_j into an embedded vector representation e_j, j ∈ {1, 2, ..., m};

any existing method may be used here to convert each attribute value of the current tuple into an embedding vector, such as an FM method, a bi-directional embedding method, etc.

Preferably, the numerical attributes and the categorical attributes of the structured data are processed separately. For the x above, the four attributes are all categorical, so the embedding vectors corresponding to each category value can be obtained through training, e.g., embedding vectors for months 1-12; when executing a prediction task, if month = 3, the embedding vector corresponding to March can be used directly.
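A minimal embedding sketch (our own illustration in PyTorch; class and parameter names are assumptions) covering both cases described above, i.e., direct lookup for categorical attributes and range scaling plus rescaling of a learned vector for numerical attributes:

```python
import torch
import torch.nn as nn

class FieldEmbedding(nn.Module):
    def __init__(self, cat_sizes, num_ranges, n_e):
        super().__init__()
        # one embedding table per categorical field, e.g. 12 entries for month
        self.cat_tables = nn.ModuleList(nn.Embedding(s, n_e) for s in cat_sizes)
        # one learnable vector per numerical field
        self.num_vecs = nn.Parameter(torch.randn(len(num_ranges), n_e))
        self.register_buffer("num_max", torch.tensor(num_ranges))

    def forward(self, x_cat, x_num):
        # categorical: directly index the pre-learned embedding by value
        e_cat = [tab(x_cat[:, j]) for j, tab in enumerate(self.cat_tables)]
        # numerical: scale into (0, 1] by the value range, then rescale
        scaled = x_num / self.num_max                 # (batch, m_num)
        e_num = scaled.unsqueeze(-1) * self.num_vecs  # (batch, m_num, n_e)
        return torch.cat([torch.stack(e_cat, dim=1), e_num], dim=1)

emb = FieldEmbedding(cat_sizes=[12, 50, 200, 1000], num_ranges=[31.0], n_e=16)
e = emb(torch.tensor([[2, 7, 42, 305]]), torch.tensor([[15.0]]))  # (1, 5, 16)
```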
S3, modeling the feature interaction of x based on the embedded vector using a plurality of exponential neurons;
exponential neurons, unlike logarithmic neurons, do not require that the input be positive, thereby relaxing the requirements on the input data, and one exponential neuron models one feature interaction, i.e., one cross feature.

Furthermore, the interaction order is not limited during modeling but is determined adaptively according to the current data, so the accuracy and efficiency of the captured feature interactions can be improved.
Further, the number of exponential neurons is set to K × o, where K represents the number of attention heads, o represents the number of exponential neurons per attention head, and K and o are natural numbers; all exponential neurons of each attention head share the weight matrix W_att of their bilinear attention function φ_att.

The i-th exponential neuron y_i of each attention head is expressed as follows:

$$y_i = \exp\Big(\sum_{j=1}^{m} w_{ij}\, e_j\Big) = \prod_{j=1}^{m} \exp(e_j)^{w_{ij}} \quad (1)$$

$$\frac{\partial y_i}{\partial e_j} = w_{ij}\,\mathrm{diag}(y_i), \qquad \frac{\partial y_i}{\partial w_{ij}} = y_i \odot e_j \quad (2)$$

where ⊙ represents the Hadamard product, the exp(·) function and the corresponding exponent w_ij are applied element-wise, e_j ∈ R^{n_e} represents the embedding vector corresponding to the j-th attribute value of the structured data, i, j, m, n_e are natural numbers with 1 ≤ i ≤ o and 1 ≤ j ≤ m, m represents the number of structured data attributes, n_e represents the embedding size, ∂y_i/∂e_j represents the derivative of y_i with respect to e_j, ∂y_i/∂w_ij represents the derivative of y_i with respect to w_ij, and diag(·) is a diagonal matrix function;

w_i ∈ R^m represents the dynamic feature-interaction weight of y_i, which is obtained by the following formula:

$$w_i = z_i \odot v_i \quad (3)$$

where v_i ∈ R^m represents a learnable attention value (weight) vector, and the gate z_i represents the attention recalibration weights, dynamically generated from the bilinear attention alignment scores as follows:

$$z_i = \alpha\text{-entmax}(s_i), \qquad s_{ij} = q_i^{\mathsf{T}}\, W_{att}\, e_j \quad (4)$$

where q_i ∈ R^{n_e} represents the learnable attention query vector, T denotes the transpose operation, W_att ∈ R^{n_e × n_e} represents the weight matrix of the bilinear attention function, and α-entmax(·) represents a sparse softmax whose sparsity increases with increasing α; α ≥ 1 is a hyper-parameter for controlling sparsity.
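A minimal PyTorch sketch (our own illustration; names and initializations are assumptions) of one attention head with o exponential neurons implementing equations (1), (3) and (4); a true α-entmax (e.g., from the third-party entmax package) should replace the softmax placeholder below to obtain genuinely sparse gates:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpNeuronHead(nn.Module):
    def __init__(self, m, n_e, o):
        super().__init__()
        self.W_att = nn.Parameter(torch.randn(n_e, n_e) * 0.01)  # shared per head
        self.q = nn.Parameter(torch.randn(o, n_e) * 0.01)        # query q_i per neuron
        self.v = nn.Parameter(torch.ones(o, m))                  # value weights v_i

    def forward(self, e):                        # e: (batch, m, n_e)
        # eq. (4): bilinear alignment scores s_ij = q_i^T W_att e_j
        s = torch.einsum("ok,kl,bml->bom", self.q, self.W_att, e)
        z = F.softmax(s, dim=-1)                 # placeholder for alpha-entmax
        w = z * self.v                           # eq. (3): w_i = z_i * v_i
        # eq. (1): y_i = exp(sum_j w_ij e_j), exp applied element-wise
        y = torch.exp(torch.einsum("bom,bml->bol", w, e))
        return y, w                              # y: (batch, o, n_e)

head = ExpNeuronHead(m=4, n_e=16, o=8)
y, w = head(torch.randn(2, 4, 16))               # y: (2, 8, 16), w: (2, 8, 4)
```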
s4, aggregating all the feature interactions to construct a feature vector of the x;
the polymerization can be carried out by various methods, such as addition and averaging, additionWeights, etc. the embodiment adopts a splicing method, namely, the feature interaction vectors output by all the exponential neurons are spliced to obtain a large vector, and for the exponential neurons, the obtained feature vector dimension is K.o.ne. The vector is too large, the nonlinear feature interaction of the vector can be further captured, and the vector dimension is reduced, for example, the vector representation h of the coding relation is obtained by using the nonlinear feature interaction of the multi-layer perceptron MLP capture element:
Figure BDA0003064014910000081
wherein n ishThe feature embedding size representing the nonlinear feature interaction is a natural number.
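A sketch of this aggregation step (our own illustration, building on the ExpNeuronHead sketch above with the same imports; layer sizes are assumptions), concatenating the K·o interaction vectors and compressing them with an MLP as in equation (5):

```python
class ARMAggregate(nn.Module):
    def __init__(self, m, K, o, n_e, n_h):
        super().__init__()
        self.heads = nn.ModuleList(ExpNeuronHead(m, n_e, o) for _ in range(K))
        self.mlp = nn.Sequential(                 # phi_MLP of eq. (5)
            nn.Linear(K * o * n_e, n_h), nn.ReLU(), nn.Linear(n_h, n_h))

    def forward(self, e):                         # e: (batch, m, n_e)
        ys = [head(e)[0] for head in self.heads]  # each (batch, o, n_e)
        cat = torch.cat(ys, dim=1).flatten(1)     # (batch, K*o*n_e)
        return self.mlp(cat)                      # h: (batch, n_h)
```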
And S5, performing classified prediction based on the feature vectors.
The classification prediction can be performed by the following formula:

$$\hat{y} = W h + b \quad (6)$$

where W ∈ R^{n_p × n_h} and b ∈ R^{n_p} represent the weight and the bias, respectively, and n_p represents the number of prediction targets. For the monthly-sales prediction task, the prediction target (total sales) can be set as a multi-class problem, for example by dividing the specific sales amount into 5 intervals. For other application scenarios, such as cancer prediction, the classification can be set as binary. That is, the number of classes (prediction targets) is set according to the specific application scenario. Taking a binary classification task as an example, the corresponding objective function is the binary cross entropy:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log \sigma(\hat{y}_i) + (1 - y_i)\log\big(1 - \sigma(\hat{y}_i)\big)\Big] \quad (7)$$

where ŷ_i and y_i are the predicted label and the true label, respectively, N is the number of training instances, i.e., training tuples, and σ(·) is the sigmoid function. With the objective function specified, popular gradient-based optimizers (e.g., SGD, or Adam (D. P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In 3rd International Conference on Learning Representations, ICLR.)) can be used to efficiently train the network of the present invention, such as the network shown in FIG. 2, and the trained network is then used to predict on input data tuples (instances).
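A minimal training sketch (our own illustration on synthetic data; the linear layer stands in for the full embedding, ARM module and prediction network described above): the binary cross entropy of equation (7) optimized with Adam:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)                     # stand-in for the full network
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()             # sigmoid + binary cross entropy, eq. (7)

h = torch.randn(4096, 16)                    # stand-in feature vectors h
y = torch.randint(0, 2, (4096,)).float()     # binary labels
for step in range(100):
    loss = loss_fn(model(h).squeeze(-1), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```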
Furthermore, when the method is used in a given scenario and the network has been trained with the corresponding structured training data, the global weights v_i of all exponential neurons take definite values. The v_i of all exponential neurons are summed and averaged, and the resulting m elements are ranked; this ranking reflects the importance of each attribute for target prediction, i.e., global interpretability. Similarly, the attribute combinations involved in the feature interactions of all exponential neurons, i.e., the attribute combinations corresponding to the non-zero elements of z_i, are counted by occurrence frequency and output in order; this yields the high-frequency interaction terms (interacting attributes, frequency, and order) for the target-prediction dataset. The interacting attributes reflect attribute combinations with closely related influence, the frequency reflects the degree of influence of the corresponding high-frequency interaction term on target prediction, and the order reflects that attributes largely irrelevant to the interaction are automatically filtered out as noise, which effectively improves the efficiency of exponential-neuron interaction modeling.
Further, when the trained network is used for prediction, the gating mechanism filters noise from the input data, and the attributes attended to by each interaction (the attributes corresponding to the non-zero elements of z_i) and their proportional weights (the element values of the corresponding attributes in w_i) can be obtained. The w_i of all exponential neurons are summed and averaged, and the resulting m elements are ranked; this ranking reflects the degree of influence of each attribute value of the current input tuple on the current target prediction, i.e., local interpretability.
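The following sketch (our own illustration; tensor shapes are assumptions) shows the two aggregations just described, averaging the learned value vectors v_i for global importance and the dynamic weights w_i of one forward pass for local, per-tuple importance:

```python
import torch

def global_importance(v):            # v: (K*o, m) learned value vectors v_i
    return v.mean(dim=0)             # average over neurons -> rank attributes

def local_importance(w):             # w: (batch, K*o, m) dynamic weights w_i
    return w.mean(dim=1)             # average over neurons -> rank per tuple

imp = global_importance(torch.randn(32, 10))
print(imp.argsort(descending=True))  # global attribute importance ranking
```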
Furthermore, a deep neural network (DNN) with enough hidden units is a universal approximator and has a strong capability for capturing nonlinear feature interactions, so the method of the invention (ARM-Net for short) can be combined with a DNN for more effective prediction. In this case, the prediction result ŷ is:

$$\hat{y} = w_1\, \hat{y}_{ARM} + w_2\, \hat{y}_{DNN} + b \quad (8)$$

where w_1 and w_2 are the ensemble weights of ARM-Net and the DNN, respectively, b ∈ R^{n_p} is a bias, and n_p is the number of prediction targets of the learning task. The entire ensemble model can then easily be trained end-to-end by optimizing the objective function (e.g., equation (7) above). We denote the ensemble model of ARM-Net and DNN as ARM-Net+.
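A sketch of the ensemble of equation (8) (our own illustration; the learnable scalar ensemble weights and the two sub-networks are assumptions):

```python
import torch
import torch.nn as nn

class ARMNetPlus(nn.Module):
    def __init__(self, arm_net, dnn, n_p):
        super().__init__()
        self.arm_net, self.dnn = arm_net, dnn
        self.w1 = nn.Parameter(torch.ones(1))   # ensemble weight of ARM-Net
        self.w2 = nn.Parameter(torch.ones(1))   # ensemble weight of the DNN
        self.b = nn.Parameter(torch.zeros(n_p))

    def forward(self, x):
        # eq. (8): y = w1 * y_ARM + w2 * y_DNN + b, trainable end-to-end
        return self.w1 * self.arm_net(x) + self.w2 * self.dnn(x) + self.b
```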
The effectiveness, interpretability and efficiency of the structured data relation modeling are improved by the prediction method provided by the invention:
1. effectiveness of
Most existing feature-interaction modeling studies either statically capture the possible cross features up to a predefined maximum interaction order, or model the cross features in an implicit manner. However, for different input instances, different relationships should have different constituent attributes: some relationships are informative, while others may be nothing but noise. Therefore, modeling cross features in a static manner is not only parameter- and computation-inefficient, but may also be ineffective. In particular, the output of each exponential neuron, y_i = Π_j exp(e_j)^{w_ij}, captures one particular cross feature of arbitrary order and can represent any combination of interacting features by deactivating the others. By utilizing the proposed exponential neurons and the multi-head gated attention mechanism, the invention can model feature interactions adaptively, thereby obtaining better prediction performance.
2. Interpretability
Interpretability measures how well the decisions made by a model can be understood by humans, engendering user trust and providing new insights. There exist post-hoc interpretation methods to explain how a black-box model works, including perturbation-based methods, gradient-based methods, and attention-based methods. However, an interpretation given by another model is often unreliable and may be misleading. In contrast, the present invention follows a white-box design, and its modeling process is more transparent and thus more interpretable in relational analysis.
In particular, the interaction weight w_i of each feature-interaction term y_i is derived from an attention value vector v_i ∈ R^m that is globally shared across instances and is dynamically recalibrated by attention alignment for each instance. Thus, the shared attention value vectors encode the global interaction weights over the instance population, aligned per attribute field. Consequently, we can aggregate the value vectors v_i of all exponential neurons to obtain global interpretability: e.g., after the v_i of all exponential neurons are summed and averaged, the result indicates the general attention the invention pays to each attribute field over the population, i.e., the feature importance of the attribute field, and the ranking of this result indicates the importance ranking of the different attributes for the prediction target. At the same time, the proposed gated attention mechanism also provides local interpretability, i.e., feature attribution on a per-input basis. Notably, each exponential neuron specifies, dynamically through attention alignment, a sparse set of attribute fields to use. Thus, we can identify the dynamically captured cross features, and for each instance (i.e., one tuple of structured data) a relative feature-importance table can be obtained by aggregating the interaction weights of all exponential neurons. To understand the internal modeling process, a global/local analysis of the captured cross-feature terms can also be performed.
3. Efficiency
In addition to effectiveness and interpretability, model complexity is another important criterion for model deployment in practical applications. To simplify the analysis and reduce the number of hyper-parameters, we set the size of all embedding and attention vectors to n_e and denote the parameter scale of all MLPs in the ARM network by n_w. Recall that m, K, and o denote the number of attribute fields, of attention heads, and of exponential neurons per attention head, respectively. The embedding layer has O(M n_e) feature-embedding parameters, where M is the number of distinct features across all attribute fields, yet each instance is embedded using only its m attribute fields. Since m is usually small and vector embedding consists of simple embedding lookups and rescaling, its complexity is negligible.

For the ARM module, the K·o exponential neurons can be computed with complexity O(K o m n_e); the parameter size of the value/query vectors is O(K o n_e), and the computation of the bilinear attention alignment over all m input embeddings has complexity O(K o m n_e). For the prediction module, the complexity is O(n_w), contributed mainly by the nonlinear feature-interaction function φ_MLP of equation (5). Thus, the overall parameter size and the computational complexity for processing each input are O(m n_e + n_w) and O(K o m n_e + n_w), respectively. This is linear in the number of attribute fields and is therefore efficient and scalable.
Test results
The methods of the invention (ARM-Net, ARM-Net+) were compared with five classes of existing feature-interaction modeling methods using five real datasets: app recommendation (Frappe), movie recommendation (MovieLens), click-through-rate prediction (Avazu, Criteo), and healthcare (Diabetes130).
The statistics of the five datasets and the optimal hyper-parameters found for the ARM network of the method are shown in Table 1 (Dataset statistics and best ARM-Net configurations); the table gives the number of tuples (instances), attribute fields (Fields), and distinct features (Features) of the different datasets, together with the optimal hyper-parameters (ARM-Net hyper-parameters) of the network of the invention for each dataset.
Table 1: Dataset statistics and best ARM-Net configurations.
The five classes of feature-interaction modeling methods are as follows:

(1) Linear Regression (LR), which linearly aggregates the input attributes with their respective importance weights without considering feature interactions;

(2) methods modeling second-order feature interactions, i.e., FM and AFM;

(3) methods capturing higher-order feature interactions, i.e., HOFM, DCN, CIN and AFN;

(4) neural-network-based methods, i.e., DNN and the graph neural networks GCN and GAT;

(5) models integrating explicit cross-feature modeling with implicit feature-interaction modeling by DNNs, i.e., Wide & Deep, KPNN, NFM, DeepFM, DCN+, xDeepFM and AFN+.
AUC (area under the ROC curve; larger is better) and Logloss (cross entropy; smaller is better) are used as evaluation metrics. For AUC and Logloss, an improvement at the 0.001 level is considered significant on the benchmark datasets used. We split each dataset 8:1:1 for training, validation and testing, respectively, report the average of the evaluation metrics over five independent runs, and adopt early stopping on the validation set.
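A small sketch (our own illustration with toy values) of how the two metrics can be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

y_true = np.array([0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.8, 0.65, 0.3, 0.9])     # sigmoid outputs of the model
print("AUC:    ", roc_auc_score(y_true, y_prob))  # larger is better
print("Logloss:", log_loss(y_true, y_prob))       # smaller is better
```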
In the tests, the Adam optimizer is adopted, the learning-rate search range is 0.1 to 1e-3, and the batch size for all models is 4096. In particular, we use a batch size of 1024 for the smaller dataset Diabetes130 and evaluate every 1000 training steps for the larger dataset Avazu. The experiments were performed on a server with a Xeon(R) Silver 4114 CPU @ 2.2 GHz (10 cores), 256 GB of memory, and a GeForce RTX 2080Ti. The models were implemented in PyTorch 1.6.0 with CUDA 10.2.
The results of the comparison are shown in Table 2 (Overall prediction performance with the same training settings).
As can be seen from Table 2:
1. Explicit interaction modeling with a single model.

ARM-Net is compared with baseline models of a single structure that explicitly capture first-, second- and higher-order cross features. Based on the results in Table 2, we make the following observations:

First, ARM-Net consistently outperforms the baseline models with explicit interaction modeling in AUC. The better prediction performance demonstrates the effectiveness of ARM-Net across datasets and domains, including app recommendation (Frappe), movie-tag recommendation (MovieLens), click-through-rate prediction (Avazu and Criteo), and medical readmission prediction (Diabetes130).

Second, higher-order models (e.g., HOFM and CIN) generally achieve better prediction performance than lower-order models (e.g., LR and FM), which confirms the importance of higher-order cross features for prediction; the absence of higher-order cross features can greatly reduce a model's modeling capability.

Third, both AFN and ARM-Net are significantly superior to the fixed-order baseline models, which confirms the effectiveness of modeling arbitrary-order feature interactions in an adaptive, data-driven manner.

Finally, the AUC of ARM-Net is significantly higher than that of AFN, the generally best-performing baseline model.
Table 2: Overall prediction performance with the same training settings.
The good performance of the ARM network is mainly attributed to the exponential neurons and the gated attention mechanism. Specifically, the positive-input restriction of the logarithmic transformation in AFN limits its representation, whereas ARM-Net avoids this problem by modeling feature interactions in the exponential space. Furthermore, the multi-head gated attention of ARM-Net does not model interactions statically as AFN does, but selectively filters noise features and dynamically generates interaction weights reflecting the characteristics of each input instance. ARM-Net can thus capture more effective cross features on a per-input basis to achieve better prediction performance, and thanks to this runtime flexibility its parameters are used more efficiently. As shown in Table 1, the best ARM-Net requires only tens to hundreds of exponential neurons for datasets of different sizes, while the best AFN typically requires more than a thousand neurons to achieve its best results; e.g., on the large dataset Avazu, the ARM network and AFN require 32 and 1600 neurons, respectively.
2. Neural-network-based models and ensemble models.

Based on the results in Table 2, we make the following observations:

(1) Although they do not explicitly model feature interactions, the best neural-network-based models generally have stronger prediction performance than the other single-structure baseline models. In particular, the attention-based graph network GAT obtains a significantly higher AUC on Avazu and Diabetes130 than the other single-structure models. However, its performance is not as stable as ARM-Net's and varies widely across datasets; e.g., GAT performs much worse than DNN and ARM-Net on Frappe and MovieLens.

(2) Ensembling with DNNs significantly improves the respective models' prediction performance. This can be observed consistently across the baseline models, e.g., DCN+, xDeepFM and AFN+, suggesting that the nonlinear interactions captured by DNNs are complementary to the explicitly captured interactions.

(3) ARM-Net achieves performance comparable to DNN, and ARM-Net+ further improves on it, achieving the best overall performance on all benchmark datasets.

Taken together, these results further demonstrate the effectiveness of ARM-Net in selectively and dynamically modeling arbitrary-order feature interactions.
Results of the interpretability tests
The present invention demonstrates the interpretability results of ARM-Net through app-usage prediction on Frappe and readmission prediction for diabetic patients on Diabetes130, two representative areas. Specifically, the learning task on Frappe is to predict the usage state of an application given a usage context. The context comprises 10 attribute fields, {user_id, item_id, daytime, weekday, weekend, location, is_free, weather, country, city}, which mainly describe the usage patterns of mobile users. For Diabetes130, the learning task is to predict the likelihood of readmission by analyzing factors and other information related to diabetic patients' readmission. There are 43 attribute fields for prediction, of which we show the 10 most important for illustration. The interpretations of the attribute fields of both datasets are public (Linas Baltrunas, Karen Church, Alexandros Karatzoglou, and Nuria Oliver. 2015. Frappe: Understanding the Usage and Perception of Mobile App Recommendations In-The-Wild. arXiv preprint arXiv:1505.03014 (2015); Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John Clore. 2014. Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records. BioMed Research International 2014 (2014)).

For both datasets, the global feature importance of the various attribute fields, obtained by aggregating the value vectors of the exponential neurons, is demonstrated first, and the global feature attribution of ARM-Net is compared with two widely adopted interpretation methods, Lime (Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD. ACM, 1135-1144.) and Shap (Scott M. Lundberg and Su-In Lee. 2017. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30. 4765-4774.). These two methods identify the feature importance of the model to be interpreted via input perturbation, based on linear regression and game theory, respectively. Specifically, the interpretation results of Lime and Shap on the Frappe and Diabetes130 datasets are based on the best-performing single-structure baseline models, DNN and GAT (Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2018. Graph Attention Networks. In 6th International Conference on Learning Representations, ICLR.), respectively, and the global feature importance given by the two methods is obtained by aggregating the local feature attributions over all instances of the test dataset. We then display the top-level interaction terms (Interaction Term) captured by ARM-Net with their corresponding frequency (Frequency) and order (Orders), which represent the average number of occurrences per instance and the number of features captured by each interaction term, respectively. We also illustrate local interpretation by showing the feature-interaction weights assigned by the ARM module, and again compare the ARM-Net local feature attribution results with Lime and Shap.
Global interpretability. We illustrate the global feature attribution in FIG. 3 and summarize the high-frequency interaction terms captured by ARM-Net on the two datasets in Table 3 and Table 4, respectively.
Table 3: Top Global Interaction Terms for Frappe.
Table 4: Top Global Interaction Terms for Diabetes130.
From FIG. 3, it can be seen that the most important features identified by ARM-Net on the Frappe dataset are {user_id, item_id, is_free}. The global attention to these attributes is justified, because user_id and item_id identify the user and the item, the two main features used in learning tasks such as collaborative filtering, and is_free indicates whether the user paid for the application, which is highly related to the user's preference for it. Similarly, on the Diabetes130 dataset, the most important features determined by ARM-Net include {emergency score, hospitalization score, number of diagnoses}, which is consistent with the attribute-field coefficients estimated for the logistic regression model in the literature (Beata Strack, Jonathan P. DeShazo, Chris Gennings, Juan L. Olmo, Sebastian Ventura, Krzysztof J. Cios, and John Clore. 2014. Impact of HbA1c Measurement on Hospital Readmission Rates: Analysis of 70,000 Clinical Database Patient Records. BioMed Research International 2014 (2014)). We also note that the global feature importance provided by ARM-Net is consistent with the two common interpretation methods (i.e., Lime and Shap). At the same time, the global feature importance provided by ARM-Net is relatively more reliable, because ARM-Net inherently supports global feature attribution and its modeling process is more transparent, whereas Lime and Shap are generally used as a medium to interpret other "black-box" models by approximation.
From the top global interaction terms on the Frappe dataset in Table 3, it can be found that: First, the attribute fields most frequently modeled in the interaction terms include user_id, item_id, and is_free, which is consistent with the global feature importance in FIG. 3. Second, these interaction terms occur repeatedly in the interaction modeling; e.g., the frequencies of the interaction terms (weekday, location, is_free), (item_id, is_free, city) and (user_id, is_free) are 3.71, 3.36 and 2.88, respectively, indicating that these cross features are used multiple times (with different interaction weights) in each instance (note that the inference for each instance involves K·o interaction terms). Third, the orders of the interaction terms are mostly 2 and 3, which suggests that it is necessary to identify a suitable set of attributes for interaction modeling; capturing cross features by enumerating all possible feature combinations is extremely inefficient and ineffective and may introduce noise.
From the top-level global interaction terms listed in Table 4 for the Diabetes130 dataset, it can be observed that the attribute fields most commonly modeled in the interaction terms are quite diverse, indicating that different exponential neurons indeed capture different cross features, which is more parameter-efficient when modeling feature interactions. Furthermore, the order of the top-level interaction terms is less than 3, and there are many first-order terms, which indicates that for some datasets, such as Diabetes130, it may not be necessary to model high-order cross features.
Local interpretability. FIG. 4 shows the local feature attribution of ARM-Net for one representative input instance on the Frappe dataset, where the interaction weights of three representative exponential neurons and the average weights over all neurons are shown. We note that different exponential neurons selectively capture different cross features in a sparse manner. For example, Neuron3 captures the feature interaction term (item_id, weekend, country), which indicates that Neuron3 responds to these three attributes for this particular instance. In addition, the aggregated interaction weights show that item_id, is_free and user_id are the three most discriminative attributes of this instance, consistent with the global interpretation results in FIG. 3. We also show the local feature attribution given by Lime and Shap. While both Lime and Shap agree with ARM-Net in taking item_id, user_id and city as the three most important features, Lime also assigns large importance weights to other features, such as is_free and country. This indicates that external interpretation methods may be neither consistent nor necessarily reliable, as they are only approximations of the model to be interpreted.
FIG. 5 shows similar local feature attribution results on the Diabetes130 dataset. We can see that different exponential neurons focus on different cross features. Specifically, Neuron1 and Neuron2 focus more on emergency_score and diag_1_category, respectively, and Neuron3 focuses more on num_diagnoses. Additionally, for this particular diabetic patient, the last five features, namely emergency_score, inpatient_score, diag_1_category, num_diagnoses, and diabetes_med, are the most useful attributes for readmission prediction. With this local interpretation, ARM-Net can support more personalized analysis and management.
As machine learning models play increasingly important roles in fields such as healthcare, financial investment, and recommender systems, the demands on model transparency and interpretability grow ever higher; this helps in debugging learning models and also in their verification and improvement. Furthermore, an interpretable model can facilitate understanding in certain areas, so that trust can be placed in the analysis results.
A simple and effective approach to either global or local interpretability is feature attribution, which determines the feature importance for an input instance based on the weights and magnitudes of the features used. Notably, based on a game-theoretic model, the Shapley value assesses the importance of each feature in the prediction, and LIME locally approximates the model with a linear model via input perturbation, thereby providing a local interpretation that is not limited to a specific model. Grad-CAM provides, for CNN-based models, a visual interpretation based on gradient-weighted class-activation mapping to highlight local regions.
Meanwhile, model interpretation methods for specific fields have been proposed in combination with domain expertise. For example, in the fields of medical analysis and finance, deep models are increasingly employed to achieve high prediction performance; however, these critical and high-risk applications underscore the need for interpretability. In particular, attention mechanisms are widely employed to facilitate the interpretability of deep models by visualizing the attention weights. By integrating attention mechanisms into the model design, many studies have successfully achieved interpretable medical analysis. In particular, Dipole supports visit-level interpretation in diagnosis prediction with three attention mechanisms, and RETAIN and TRACER support interpretation at the visit level and the feature level. However, one inherent limitation of most existing methods is that their interpretability is based on single input features, ignoring the feature interactions necessary for relational analysis.
Feature interaction modeling. Cross features explicitly model feature interactions between attribute fields by multiplying the corresponding constituent features, which is important for predictive analysis in different applications, such as app recommendation and click prediction. Many existing efforts use DNNs to implicitly capture cross features. However, implicitly modeling multiplicative feature interactions with DNNs requires a large number of hidden units, which makes the modeling process inefficient and difficult to interpret in practice.
Many models have been proposed to explicitly capture cross features, which generally leads to better prediction performance. Among these studies, some models capture second-order feature interactions, and others model higher-order feature interactions within a predefined maximum order. The recent work AFN proposes modeling arbitrary-order cross features with logarithmic neurons, but it suffers from the input limitation of the logarithmic transformation and from limited runtime flexibility. The ARM-Net of the invention provides a method for adaptively modeling feature interactions with exponential neurons based on a gated multi-head attention mechanism; the model is accurate, efficient, and highly interpretable. The core idea is to selectively and dynamically model attribute dependencies and correlations through cross features. The input features are first converted into an exponential space, and then the interaction weight and interaction order of each cross feature are determined adaptively. To dynamically model arbitrary-order cross features and selectively filter noise features, we propose a new sparse attention mechanism to generate the interaction weights for a given input tuple. Therefore, ARM-Net can identify the most informative cross features in an input-aware manner, obtaining more accurate predictions and better interpretability during inference. Extensive experimental studies on real datasets confirm that, compared with existing models, ARM-Net consistently delivers superior prediction performance, global interpretability, and local interpretability for individual instances.
The units described in the embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. Where the name of an element does not constitute a limitation on the element itself.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A prediction method based on structured data, characterized by comprising:

obtaining a structured data tuple x = <x_1, x_2, ..., x_j, ..., x_m>, where x_j represents the j-th attribute value and m represents the number of structured data attributes;

converting each attribute value x_j into an embedded vector representation e_j, j ∈ {1, 2, ..., m};

modeling feature interactions of x based on the embedding vectors using a plurality of exponential neurons;

aggregating all of the feature interactions to construct a feature vector for x;

and performing classification prediction based on the feature vector.
2. The method of claim 1, wherein the process of converting the attribute value x_j into the embedded vector representation e_j is as follows: when x_j is numerical, its value is first scaled into the (0, 1] interval according to the attribute's value range and then multiplied by the pre-learned embedding vector; when x_j is categorical, the corresponding pre-learned embedding vector is directly indexed by its value.
3. The method of claim 1 or 2, wherein the order is not fixed when modeling the feature interactions of x.
4. The method of claim 3, wherein the number of exponential neurons is K × o, where K denotes the number of attention heads, o denotes the number of exponential neurons per attention head, and K and o are both natural numbers; all exponential neurons of each attention head share the weight matrix $W_{att}$ of their bilinear attention function.
The i-th exponential neuron $y_i$ of each attention head is expressed as follows:
$y_i = \exp\Big(\sum_{j=1}^{m} w_{ij}\, e_j\Big) = \bigodot_{j=1}^{m} \exp(e_j)^{w_{ij}}$
wherein $\odot$ denotes the Hadamard product, the $\exp(\cdot)$ function and the corresponding exponent $w_{ij}$ are applied element-wise, $e_j \in \mathbb{R}^{n_e}$ denotes the embedding vector corresponding to the j-th attribute value of the structured data, i, j, m and $n_e$ are natural numbers with 1 ≤ i ≤ o and 1 ≤ j ≤ m, m denotes the number of structured data attributes, and $n_e$ denotes the embedding size; the derivatives of $y_i$ with respect to $e_j$ and with respect to $w_{ij}$ are
$\frac{\partial y_i}{\partial e_j} = w_{ij}\,\mathrm{diag}(y_i), \qquad \frac{\partial y_i}{\partial w_{ij}} = y_i \odot e_j,$
where $\mathrm{diag}(\cdot)$ is a diagonal matrix function;
$w_i \in \mathbb{R}^{m}$ denotes the exponent vector of $y_i$ and is obtained by the following formula:
$w_i = z_i \odot v_i$
wherein $v_i \in \mathbb{R}^{m}$ denotes a learnable attention weight vector, and $z_i$, acting as a gate, denotes the attention re-alignment weights, dynamically generated from the bilinear attention alignment scores as follows:
$s_{ij} = q_i^{\top} W_{att}\, e_j, \qquad z_i = \alpha\text{-entmax}(s_i),$
wherein $q_i \in \mathbb{R}^{n_e}$ denotes a learnable attention query vector, $\top$ denotes the transpose operation, $W_{att} \in \mathbb{R}^{n_e \times n_e}$ denotes the weight matrix of the bilinear attention function, and $\alpha\text{-entmax}(\cdot)$ denotes a sparse softmax whose sparsity increases with increasing α, α ∈ [1, +∞) being a hyper-parameter for controlling the sparsity.
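A sketch of a single attention head of this claim, assuming PyTorch and hypothetical dimensions; ordinary softmax stands in for the claimed α-entmax (the sparse softmax variant), which would be swapped in to obtain true sparsity:

```python
import torch
import torch.nn as nn

class ExponentialAttentionHead(nn.Module):
    """One head with o exponential neurons over m attribute embeddings."""
    def __init__(self, m, n_e, o):
        super().__init__()
        self.q = nn.Parameter(torch.randn(o, n_e))        # query vectors q_i
        self.W_att = nn.Parameter(torch.randn(n_e, n_e))  # shared bilinear matrix
        self.v = nn.Parameter(torch.randn(o, m))          # value weight vectors v_i

    def forward(self, e):                 # e: (m, n_e) embedded tuple
        s = self.q @ self.W_att @ e.T     # (o, m) scores  s_ij = q_i^T W_att e_j
        z = torch.softmax(s, dim=-1)      # gate; the claim uses alpha-entmax here
        w = z * self.v                    # (o, m) exponents  w_i = z_i ⊙ v_i
        return torch.exp(w @ e)           # (o, n_e) neurons  y_i = exp(Σ_j w_ij e_j)
```

Swapping `torch.softmax` for an α-entmax implementation is the only change needed to recover the sparsity-controlled gating of the claim.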
5. The method of claim 4, wherein the aggregation is vector concatenation.
6. The method of claim 5, wherein, before the classification prediction is performed based on the feature vector, the nonlinear feature interactions of the tuple are captured by a multi-layer perceptron MLP, obtaining a vector representation h that encodes these relations:
$h = \mathrm{MLP}\big([y_1; y_2; \ldots; y_{K \times o}]\big), \qquad h \in \mathbb{R}^{n_h},$
wherein $n_h$ denotes the size of the nonlinear feature interaction and is a natural number.
7. The method of claim 6, wherein the classification prediction is performed by:
$\hat{y} = W h + b,$
wherein $W \in \mathbb{R}^{n_p \times n_h}$ and $b \in \mathbb{R}^{n_p}$ denote the weight and the bias, respectively, and $n_p$ denotes the number of prediction targets.
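Claims 6 and 7 together form the prediction head: the K·o neuron outputs are concatenated, encoded by an MLP into $h \in \mathbb{R}^{n_h}$, and mapped linearly to $n_p$ targets. A sketch under assumed sizes (all values hypothetical):

```python
import torch
import torch.nn as nn

K, o, n_e, n_h, n_p = 4, 8, 16, 64, 2   # hypothetical sizes
mlp = nn.Sequential(nn.Linear(K * o * n_e, n_h), nn.ReLU())  # claim 6
head = nn.Linear(n_h, n_p)                                   # W, b of claim 7

ys = torch.randn(K * o, n_e)    # stacked exponential-neuron outputs
h = mlp(ys.flatten())           # vector representation h of size n_h
logits = head(h)                # prediction over n_p targets
```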
8. The method of claim 7, wherein the method is combined with a DNN for target prediction.
9. The method of any one of claims 3-8, wherein the v_i of the plurality of exponential neurons are summed and averaged, and the resulting values are ranked as the degree of influence of each attribute in the structured data on the target prediction.
10. The method of any one of claims 3-8, wherein the w_i of the plurality of exponential neurons are summed and averaged, and the resulting values are ranked as the degree of influence of each attribute value in the current tuple on the target prediction result.
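A sketch of the interpretability readouts of claims 9 and 10: averaging the learnable value vectors v_i over all neurons gives a global, input-independent attribute ranking, while averaging the gated exponents w_i computed for one tuple gives a local ranking for that instance (shapes and names hypothetical):

```python
import torch

# v: (K*o, m) learnable value vectors; w: (K*o, m) gated exponents for one tuple
v = torch.randn(32, 10)
w = torch.randn(32, 10)

global_importance = v.mean(dim=0)   # claim 9: per-attribute average over neurons
local_importance = w.mean(dim=0)    # claim 10: per attribute value, this tuple only

print(torch.argsort(global_importance, descending=True))  # global attribute rank
print(torch.argsort(local_importance, descending=True))   # local rank for the tuple
```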
CN202110521123.XA 2021-05-13 2021-05-13 Structured data-based prediction method Pending CN113159449A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110521123.XA CN113159449A (en) 2021-05-13 2021-05-13 Structured data-based prediction method

Publications (1)

Publication Number Publication Date
CN113159449A (en) 2021-07-23

Family

ID=76874739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110521123.XA Pending CN113159449A (en) 2021-05-13 2021-05-13 Structured data-based prediction method

Country Status (1)

Country Link
CN (1) CN113159449A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115203471A (en) * 2022-09-15 2022-10-18 山东宝盛鑫信息科技有限公司 Attention mechanism-based multimode fusion video recommendation method
CN115203471B (en) * 2022-09-15 2022-11-18 山东宝盛鑫信息科技有限公司 Attention mechanism-based multimode fusion video recommendation method
CN117555049A (en) * 2024-01-09 2024-02-13 成都师范学院 Lightning proximity forecasting method and device based on space-time attention gate control fusion network
CN117555049B (en) * 2024-01-09 2024-03-29 成都师范学院 Lightning proximity forecasting method and device based on space-time attention gate control fusion network

Similar Documents

Publication Publication Date Title
Divakaran et al. Temporal link prediction: A survey
Bacchi et al. Machine learning in the prediction of medical inpatient length of stay
Liu et al. Costco: A neural tensor completion model for sparse tensors
Cai et al. Arm-net: Adaptive relation modeling network for structured data
Buddhakulsomsiri et al. Association rule-generation algorithm for mining automotive warranty data
US11989667B2 (en) Interpretation of machine leaning results using feature analysis
US20040049473A1 (en) Information analytics systems and methods
CN113159450A (en) Prediction system based on structured data
US10956825B1 (en) Distributable event prediction and machine learning recognition system
CN112598111B (en) Abnormal data identification method and device
CA3080840A1 (en) System and method for diachronic machine learning architecture
CN113159449A (en) Structured data-based prediction method
Ye et al. Bug report classification using LSTM architecture for more accurate software defect locating
EP4437702A1 (en) System and methods for monitoring related metrics
Shi et al. Learned index benefits: Machine learning based index performance estimation
CN115080587B (en) Electronic component replacement method, device and medium based on knowledge graph
Zhao et al. AMEIR: Automatic behavior modeling, interaction exploration and MLP investigation in the recommender system
Strickland Data analytics using open-source tools
CN113191441A (en) Adaptive relation modeling method for structured data
CN110740111B (en) Data leakage prevention method and device and computer readable storage medium
Hanif Applications of data mining techniques for churn prediction and cross-selling in the telecommunications industry
Singh Learn PySpark: Build Python-based Machine Learning and Deep Learning Models
Alshara [Retracted] Multilayer Graph‐Based Deep Learning Approach for Stock Price Prediction
Xu et al. Dr. right!: Embedding-based adaptively-weighted mixture multi-classification model for finding right doctors with healthcare experience data
Sayeed et al. Smartic: A smart tool for Big Data analytics and IoT

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination