CN113610107A - Feature optimization method and device - Google Patents

Feature optimization method and device Download PDF

Info

Publication number
CN113610107A
CN113610107A (application CN202110751068.3A)
Authority
CN
China
Prior art keywords
feature
mapping
optimized
dimension
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110751068.3A
Other languages
Chinese (zh)
Inventor
宋万鹏
陈冬雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongdun Technology Co ltd
Tongdun Holdings Co Ltd
Original Assignee
Tongdun Technology Co ltd
Tongdun Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongdun Technology Co ltd, Tongdun Holdings Co Ltd filed Critical Tongdun Technology Co ltd
Priority to CN202110751068.3A priority Critical patent/CN113610107A/en
Publication of CN113610107A publication Critical patent/CN113610107A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Error Detection And Correction (AREA)

Abstract

The present disclosure provides a feature optimization method and apparatus. The method includes inputting a feature to be optimized into a pre-constructed feature optimization model that comprises an encoder and a decoder; performing dimensionality reduction mapping on the feature to be optimized through the encoder to map it into a first intermediate feature; performing dimension-ascending mapping on the first intermediate feature through the decoder to map it into a second intermediate feature with the same dimension as the feature to be optimized; and optimizing the feature to be optimized based on the feature difference between the second intermediate feature and the feature to be optimized to determine an optimized feature. The feature optimization method can efficiently acquire features having cross relations among the features to be optimized, reduce feature dimensions and improve calculation efficiency.

Description

Feature optimization method and device
Technical Field
The disclosure relates to the technical field of risk control, and in particular to a feature optimization method and device.
Background
The multi-head loan feature is commonly used in the field of credit risk control modeling. In existing feature processing, it is mainly obtained by processing a user's loan application behavior data at multiple financial institutions; the features are then derived, according to the categories of the financial institutions and different time slices, through operators such as addition, subtraction, multiplication, division, statistics and aggregation, and a feature set for credit risk control modeling is finally obtained, on which the modeling operation is performed.
The main disadvantages of the existing characteristic processing method are as follows:
firstly, feature derivation and selection mainly depend on manual experience and require a great deal of manpower and time;
secondly, the information that can be mined manually is limited, and deeper valuable information in the features cannot be mined;
thirdly, the number of features obtained manually is large, correlations exist among the features, mutual interference easily occurs in the modeling process, the calculation amount is large, and the feature screening efficiency is low.
Disclosure of Invention
The embodiment of the disclosure provides a feature optimization method and device, which can efficiently acquire features having a cross relationship in features to be optimized, reduce feature dimensions, and improve calculation efficiency.
In a first aspect of the embodiments of the present disclosure, a method for feature optimization is provided, where the method includes:
inputting the characteristics to be optimized into a pre-constructed characteristic optimization model, wherein the characteristic optimization model comprises an encoder and a decoder;
performing dimensionality reduction mapping on the feature to be optimized through the encoder, and mapping the feature to be optimized into a first intermediate feature;
performing dimension-ascending mapping on the first intermediate feature through the decoder, and mapping the first intermediate feature into a second intermediate feature with the same dimension as the feature to be optimized;
optimizing the feature to be optimized based on the feature difference between the second intermediate feature and the feature to be optimized, and determining an optimized feature.
In an alternative embodiment,
the encoder includes a batch normalization processing layer, a first mapping layer and a second mapping layer,
the method for performing the dimension reduction mapping on the feature to be optimized through the encoder comprises the following steps:
carrying out batch standardization processing on the features to be optimized through the batch standardization layer to obtain standardized features;
mapping, by the first mapping layer, the normalized features to third intermediate features that are the same dimension as the first mapping layer;
mapping, by the second mapping layer, the third intermediate feature to a first intermediate feature having the same dimension as the second mapping layer,
and the dimension of the second mapping layer is smaller than that of the first mapping layer, and the dimension of the first mapping layer is smaller than that of the feature to be optimized.
In an alternative embodiment,
the method of mapping the normalized features to third intermediate features of the same dimension as the first mapping layer by the first mapping layer comprises:
determining the third intermediate feature based on an activation function corresponding to an encoder, a first weight matrix corresponding to the first mapping layer, a first bias parameter corresponding to the first mapping layer, and the normalized feature;
the method of mapping the third intermediate feature to a first intermediate feature having the same dimension as the second mapping layer by the second mapping layer comprises:
determining the first intermediate feature based on the activation function corresponding to the encoder, the second weight matrix corresponding to the second mapping layer, the second bias parameter corresponding to the second mapping layer, and the third intermediate feature.
In an alternative embodiment,
the dimension of the first mapping layer is 512 dimensions, and the dimension of the second mapping layer is 128 dimensions.
In an alternative embodiment,
the method further includes training a feature optimization model,
the method for training the feature optimization model comprises the following steps:
based on the pre-obtained training features, performing dimensionality reduction mapping on the training features through an encoder of a feature optimization model to be trained, and mapping the training features into fourth intermediate features;
performing ascending-dimension mapping on the fourth intermediate feature through a decoder of a feature optimization model to be trained, and mapping the fourth intermediate feature into a fifth intermediate feature with the same dimension as the training feature;
and iteratively optimizing a loss function of the feature optimization model to be trained through a back propagation algorithm based on the feature errors of the fifth intermediate feature and the training feature so as to enable the feature errors to meet a preset convergence condition, and completing the training of the feature optimization model.
In a second aspect of the embodiments of the present disclosure, there is provided a feature optimization apparatus, including:
the device comprises a first unit, a second unit and a third unit, wherein the first unit is used for inputting the characteristics to be optimized into a pre-constructed characteristic optimization model, and the characteristic optimization model comprises an encoder and a decoder;
a second unit, configured to perform dimension reduction mapping on the feature to be optimized through the encoder, and map the feature to be optimized as a first intermediate feature;
a third unit, configured to perform, by the decoder, ascending-dimension mapping on the first intermediate feature, and map the first intermediate feature into a second intermediate feature having a dimension that is the same as that of the feature to be optimized;
and the fourth unit is used for optimizing the feature to be optimized and determining the optimized feature based on the feature difference between the second intermediate feature and the feature to be optimized.
In an alternative embodiment,
the encoder includes a batch normalization processing layer, a first mapping layer and a second mapping layer,
the second unit comprises a normalization unit, a first mapping unit and a second mapping unit,
the standardization unit is used for carrying out batch standardization processing on the features to be optimized through the batch standardization layer to obtain standardized features;
the first mapping unit is configured to map, by the first mapping layer, the normalized feature into a third intermediate feature having the same dimension as the first mapping layer;
the second mapping unit is configured to map, by the second mapping layer, the third intermediate feature into a first intermediate feature having the same dimension as the second mapping layer,
and the dimension of the second mapping layer is smaller than that of the first mapping layer, and the dimension of the first mapping layer is smaller than that of the feature to be optimized.
In an alternative embodiment,
the first mapping unit is further configured to:
determining the third intermediate feature based on an activation function corresponding to an encoder, a first weight matrix corresponding to the first mapping layer, a first bias parameter corresponding to the first mapping layer, and the normalized feature;
the second mapping unit is further configured to:
determining the first intermediate feature based on the activation function corresponding to the encoder, the second weight matrix corresponding to the second mapping layer, the second bias parameter corresponding to the second mapping layer, and the third intermediate feature.
In an alternative embodiment,
the dimension of the first mapping layer is 512 dimensions, and the dimension of the second mapping layer is 128 dimensions.
In an alternative embodiment,
the apparatus further comprises a fifth unit for training the feature optimization model;
the fifth unit is configured to:
based on the pre-obtained training features, performing dimensionality reduction mapping on the training features through an encoder of a feature optimization model to be trained, and mapping the training features into fourth intermediate features;
performing ascending-dimension mapping on the fourth intermediate feature through a decoder of a feature optimization model to be trained, and mapping the fourth intermediate feature into a fifth intermediate feature with the same dimension as the training feature;
and iteratively optimizing a loss function of the feature optimization model to be trained through a back propagation algorithm based on the feature errors of the fifth intermediate feature and the training feature so as to enable the feature errors to meet a preset convergence condition, and completing the training of the feature optimization model.
The present disclosure provides a method of feature optimization, the method comprising:
inputting the characteristics to be optimized into a pre-constructed characteristic optimization model, wherein the characteristic optimization model comprises an encoder and a decoder;
performing dimensionality reduction mapping on the feature to be optimized through the encoder, and mapping the feature to be optimized into a first intermediate feature;
Dimension reduction mapping is carried out on the features to be optimized by the encoder, so that mutual interference of high-dimensional features in the subsequent calculation process is avoided and the calculation amount is reduced. In addition, by carrying out dimension reduction mapping on the features to be optimized, the deep relations among them can be learned automatically, and some hidden nonlinear information features that cannot be mined by the conventional feature derivation method can be mined.
Performing dimension-ascending mapping on the first intermediate feature through the decoder, and mapping the first intermediate feature into a second intermediate feature with the same dimension as the feature to be optimized;
Because the decoder performs dimension-ascending mapping on the feature output by the encoder, the difference between the input and the output of the feature optimization model can be compared, so that the output result of the feature optimization model implies deep valuable information.
Optimizing the feature to be optimized based on the feature difference between the second intermediate feature and the feature to be optimized, and determining an optimized feature.
By iteratively optimizing the features in this way, the interference of the correlation among the features on the model can be effectively avoided.
Drawings
FIG. 1 is a schematic flow chart diagram of a feature optimization method according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a feature optimization device according to an embodiment of the disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present disclosure and in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein.
It should be understood that, in various embodiments of the present disclosure, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.
It should be understood that in the present disclosure, "including" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present disclosure, "plurality" means two or more. "And/or" merely describes an association between associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B and C are included; "comprises A, B or C" means that one of A, B and C is included; "comprises A, B and/or C" means that any one, any two, or all three of A, B and C are included.
It should be understood that in this disclosure, "B corresponding to a", "a corresponds to B", or "B corresponds to a" means that B is associated with a, from which B can be determined. Determining B from a does not mean determining B from a alone, but may be determined from a and/or other information. And the matching of A and B means that the similarity of A and B is greater than or equal to a preset threshold value.
As used herein, "if" may be interpreted as "at … …" or "when … …" or "in response to a determination" or "in response to a detection", depending on the context.
The technical solution of the present disclosure is explained in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 schematically illustrates a flow chart of a feature optimization method according to an embodiment of the present disclosure, and as shown in fig. 1, the method includes:
step S101, inputting the characteristics to be optimized into a pre-constructed characteristic optimization model;
for example, the embodiment of the present disclosure takes the feature to be optimized as the multi-head loan feature as an example, and it should be noted that the embodiment of the present disclosure does not limit the category and the number of the feature to be optimized.
The multi-head loan feature is a statistical index describing a user's loan behavior at different categories of financial institutions; for example, it may indicate that a certain user filed 5 loan applications in the banking industry within the past 30 days. The multi-head loan feature is also commonly used in the field of credit risk control modeling: the existing multi-head loan features are mainly obtained by processing a user's loan application behavior data at multiple financial institutions, are derived through various calculation methods (such as addition, subtraction, multiplication, division, statistics and aggregation) according to the categories of the financial institutions and different time slices, and finally form the feature set required for credit risk control modeling, on which the modeling operation is performed.
Illustratively, the feature optimization model of the embodiments of the present disclosure is constructed based on a neural network for reducing the dimensionality of the input features and obtaining an efficient feature representation of the input features;
it is to be understood that the function of the feature optimization model of the embodiment of the present disclosure may include optimizing the input features, which may be constructed based on a neural network, for example, the feature optimization model of the embodiment of the present disclosure may include a self-encoder, it should be noted that the feature optimization model may include a self-encoder, which is only exemplary, and the embodiment of the present disclosure does not limit the specific type of the feature optimization model.
The self-encoder is a type of neural network and an unsupervised learning algorithm; a back-propagation algorithm may be used to train the network so that the dimension of the output of the self-encoder is the same as the dimension of the input. In practical application, specific settings can be added to the self-encoder so that valuable expressions and information about the input features can be learned.
The feature optimization model of the embodiments of the present disclosure may include a self-encoder, and for convenience of description, the feature optimization model is referred to as a self-encoder in the following.
The self-encoder of the disclosed embodiment may include an input layer, an encoder, a decoder and an output layer.
Illustratively, the input layer may be composed of the features to be optimized, where the features to be optimized may include different user information, such as the user's account age, behavior interval, amount of multi-head loans, etc.; the embodiments of the present disclosure do not limit the type and number of the features to be optimized.
Step S102, performing dimension reduction mapping on the feature to be optimized through the encoder, and mapping the feature to be optimized into a first intermediate feature;
in an alternative embodiment, the encoder includes a batch normalization layer, a first mapping layer and a second mapping layer,
the method for performing the dimension reduction mapping on the feature to be optimized through the encoder comprises the following steps:
carrying out batch standardization processing on the features to be optimized through the batch standardization layer to obtain standardized features;
mapping, by the first mapping layer, the normalized features to third intermediate features that are the same dimension as the first mapping layer;
mapping, by the second mapping layer, the third intermediate feature to a first intermediate feature having the same dimension as the second mapping layer,
and the dimension of the second mapping layer is smaller than that of the first mapping layer, and the dimension of the first mapping layer is smaller than that of the feature to be optimized.
Illustratively, the encoder of the embodiments of the present disclosure may include three layers, respectively, a batch normalization processing layer, a first mapping layer, and a second mapping layer.
The batch standardization processing layer of the encoder can perform dimension standardization processing on the features to be optimized to obtain standardized features.
Specifically, the batch normalization processing layer of the encoder may normalize the feature to be optimized to a distribution with a mean value of 0 and a standard deviation of 1, and for example, the method for performing the dimension normalization processing on the feature to be optimized by the batch normalization processing layer of the encoder may be represented by the following formula:
x̂_i^(k) = (x_i^(k) − μ_k) / (σ_k + ε)
wherein x̂_i^(k) represents the feature to be optimized after the dimension standardization processing, namely the standardized feature, x_i^(k) represents the feature to be optimized before dimension standardization, μ_k represents the mean of the k-th dimension of the feature to be optimized, σ_k represents the standard deviation of the k-th dimension of the feature to be optimized, ε represents a constant different from 0, k ∈ [1, D], i ∈ [1, m], D represents the dimension of the feature to be optimized, and m represents the number of features to be optimized. The feature to be optimized can be represented as X ∈ R^D, wherein X represents the feature to be optimized and D represents its dimension.
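As a minimal illustration of the batch standardization described above (the function and variable names are assumptions for this sketch, not taken from the disclosure), the computation could be written in Python as:

```python
import numpy as np

def batch_standardize(X, eps=1e-5):
    """Standardize each dimension of X (shape: m samples x D dimensions)
    to mean 0 and standard deviation 1, following the formula above.
    eps is the non-zero constant that guards against division by zero."""
    mu = X.mean(axis=0)       # mean of each dimension, mu_k
    sigma = X.std(axis=0)     # standard deviation of each dimension, sigma_k
    return (X - mu) / (sigma + eps)
```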
Dimension standardization of the features to be optimized allows features of different units and scales to be compared within the same reference frame, that is, data of different magnitudes are unified under one reference system so that the comparison is meaningful; it also accelerates the convergence of the corresponding model during operation and improves the running speed of the model.
Specifically, the first mapping layer of the encoder may include a fully-connected layer with a dimension of 512 dimensions, and the second mapping layer of the encoder may include a fully-connected layer with a dimension of 128 dimensions, where the first mapping layer and the second mapping layer have similar structures and are both capable of performing dimension-reduction mapping on the input features.
In an alternative embodiment, the method of mapping the normalized feature to a third intermediate feature having the same dimension as the first mapping layer by the first mapping layer comprises:
determining the third intermediate feature based on an activation function corresponding to an encoder, a first weight matrix corresponding to the first mapping layer, a first bias parameter corresponding to the first mapping layer, and the normalized feature;
the method of mapping the third intermediate feature to a first intermediate feature having the same dimension as the second mapping layer by the second mapping layer comprises:
determining the first intermediate feature based on the activation function corresponding to the encoder, the second weight matrix corresponding to the second mapping layer, the second bias parameter corresponding to the second mapping layer, and the third intermediate feature.
Taking the determination of the third intermediate feature as an example, the method for determining the third intermediate feature based on the activation function corresponding to the encoder, the first weight matrix corresponding to the first mapping layer, the first bias parameter corresponding to the first mapping layer, and the normalized feature may be as follows:
h=α(ωx+b)
wherein h denotes the third intermediate feature, α denotes an activation function corresponding to the encoder, ω denotes a first weight matrix corresponding to the first mapping layer, x denotes the normalized feature, and b denotes a first bias parameter corresponding to the first mapping layer.
The above manner is only an exemplary description of the method for determining the third intermediate feature, and the method for determining the first intermediate feature in the embodiment of the present disclosure may refer to the above contents, except that each parameter in the formula for determining the first intermediate feature corresponds to the second mapping layer, which is not described herein again in the embodiment of the present disclosure.
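For concreteness, a minimal sketch of such an encoder might look as follows in PyTorch (the framework choice, the ReLU activation and all class and variable names are illustrative assumptions; the disclosure does not prescribe a specific activation function or library):

```python
import torch
from torch import nn

class Encoder(nn.Module):
    """Batch standardization layer followed by two fully-connected mapping
    layers (512-dim and 128-dim), as described above."""
    def __init__(self, input_dim: int, hidden_dim: int = 512, code_dim: int = 128):
        super().__init__()
        self.bn = nn.BatchNorm1d(input_dim)          # batch standardization layer
        self.fc1 = nn.Linear(input_dim, hidden_dim)  # first mapping layer (weights, bias)
        self.fc2 = nn.Linear(hidden_dim, code_dim)   # second mapping layer (weights, bias)
        self.act = nn.ReLU()                         # activation function alpha (assumed)

    def forward(self, x):
        x = self.bn(x)               # standardized feature
        h3 = self.act(self.fc1(x))   # third intermediate feature, h = alpha(omega x + b)
        h1 = self.act(self.fc2(h3))  # first intermediate feature (128-dim)
        return h1
```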
It can be understood that the traditional feature development method is time-consuming: feature derivation mainly depends on manual experience and requires a great deal of manpower and time, and the information obtained by mining manually derived features is limited; meanwhile, high-dimensional features easily interfere with each other in the subsequent modeling process, the calculation amount is large, and the feature screening efficiency is low.
In the embodiment of the disclosure, the deep-level relationships among the features to be optimized are learned automatically through the multi-layer network structure of the self-encoder, and hidden nonlinear information features that cannot be mined by the conventional feature derivation method can be mined; for example, two accounts that seem unrelated but actually belong to a certain specific organization can be surfaced and further analyzed, which cannot be done manually.
The features to be optimized may be further processed by a decoder after being encoded by the encoder.
Step S103, performing ascending dimension mapping on the first intermediate feature through the decoder, and mapping the first intermediate feature into a second intermediate feature with the same dimension as the feature to be optimized;
specifically, a first intermediate feature may be subjected to dimension-up mapping by a decoder, and the first intermediate feature may be mapped to a second intermediate feature having the same dimension as the feature to be optimized.
For example, the method for performing the ascending-dimension mapping on the first intermediate feature to obtain the second intermediate feature may be as follows:
x̂ = α′(ω′h + b′)
wherein x̂ represents the second intermediate feature, α′ represents the activation function corresponding to the decoder, ω′ represents a weight matrix with the same dimension as the feature to be optimized, h represents the first intermediate feature, and b′ represents the bias parameter corresponding to the decoder.
The decoder thus performs dimension-ascending mapping on the first intermediate feature to obtain a second intermediate feature with the same dimension as the feature to be optimized. The self-encoder, as a special type of neural network whose output has the same dimension as its input, allows its loss function to be iteratively optimized by comparing the difference between the input and the output, so that the output result obtained from the encoder implies deep-level valuable information.
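Continuing the same assumed PyTorch sketch, the decoder and the overall self-encoder could be expressed as follows (again, the class names and the activation are illustrative assumptions):

```python
class Decoder(nn.Module):
    """Ascending-dimension mapping from the 128-dim first intermediate feature
    back to the dimension of the feature to be optimized."""
    def __init__(self, code_dim: int, output_dim: int):
        super().__init__()
        self.fc = nn.Linear(code_dim, output_dim)  # weight matrix and bias of the decoder
        self.act = nn.ReLU()                       # decoder activation (assumed)

    def forward(self, h):
        return self.act(self.fc(h))  # second intermediate feature

class FeatureOptimizationModel(nn.Module):
    """Self-encoder: the output dimension equals the input dimension."""
    def __init__(self, input_dim: int, hidden_dim: int = 512, code_dim: int = 128):
        super().__init__()
        self.encoder = Encoder(input_dim, hidden_dim, code_dim)
        self.decoder = Decoder(code_dim, input_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))
```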
Step S104, optimizing the feature to be optimized based on the feature difference between the second intermediate feature and the feature to be optimized, and determining an optimized feature;
in an alternative embodiment, the method further comprises training a feature optimization model,
the method for training the feature optimization model comprises the following steps:
based on the pre-obtained training features, performing dimensionality reduction mapping on the training features through an encoder of a feature optimization model to be trained, and mapping the training features into fourth intermediate features;
performing ascending-dimension mapping on the fourth intermediate feature through a decoder of a feature optimization model to be trained, and mapping the fourth intermediate feature into a fifth intermediate feature with the same dimension as the training feature;
and iteratively optimizing a loss function of the feature optimization model to be trained through a back propagation algorithm based on the feature errors of the fifth intermediate feature and the training feature so as to enable the feature errors to meet a preset convergence condition, and completing the training of the feature optimization model. Specifically, the loss function of the feature optimization model to be trained can be iteratively optimized through a back propagation algorithm according to a method shown in the following formula:
L(x, x̂) = ‖x − x̂‖²
wherein x represents the training feature, x̂ represents the fifth intermediate feature, and L(x, x̂) represents the loss function, i.e., the feature error between the training feature and the fifth intermediate feature.
In practical applications, the loss function of the self-encoder can be iteratively optimized through a back propagation algorithm, and the method for iteratively optimizing the loss function of the self-encoder is not limited in the embodiment of the disclosure.
For example, the preset convergence condition of the embodiment of the present disclosure may include that the feature error of the fifth intermediate feature and the training feature converges around a certain preset value. It should be noted that the above convergence condition in the present disclosure is only an exemplary description, and the preset convergence condition is not limited in the embodiment of the present disclosure.
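Under the same assumptions, the training procedure described above (iteratively optimizing a reconstruction loss by back-propagation until the feature error converges) might be sketched as follows; the optimizer, learning rate, epoch count and tolerance are illustrative choices, not values fixed by the disclosure:

```python
def train_feature_optimization_model(model, train_features, epochs=100, lr=1e-3, tol=1e-4):
    """Iteratively optimize the reconstruction loss with back-propagation
    until the change in the feature error falls below a preset tolerance."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                  # squared feature error (assumed form)
    prev_loss = float("inf")
    for _ in range(epochs):
        optimizer.zero_grad()
        reconstructed = model(train_features)          # fifth intermediate feature
        loss = loss_fn(reconstructed, train_features)  # feature error
        loss.backward()                                # back-propagation
        optimizer.step()
        if abs(prev_loss - loss.item()) < tol:         # preset convergence condition
            break
        prev_loss = loss.item()
    return model
```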
Through iterative optimization of the loss function of the feature optimization model to be trained, valuable information implicit in the features to be optimized can be effectively learned, and interference of correlation among the features on modeling is avoided.
For example, after the optimized features are obtained, they may be input into a credit risk control model. The credit risk control model of the embodiment of the disclosure generally takes whether a user defaults within a specified period as the prediction target; it is a binary classification model constructed by a machine learning algorithm and outputs the probability of user default.
In the embodiment of the disclosure, the optimized features can be used as the feature input of the credit risk control model. The optimized features have lower dimensionality and a more accurate feature representation, so using them as the input of the credit risk control model omits the step of screening out noise from the input features and improves both the calculation efficiency and the accuracy of the output of the credit risk control model.
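Purely as a usage illustration (the downstream classifier, the scikit-learn library and the placeholder arrays X_train and y_train are assumptions, not part of the disclosure), the trained encoder could supply the 128-dimensional optimized features to a binary credit risk model:

```python
from sklearn.linear_model import LogisticRegression

# X_train: raw features to be optimized, y_train: default labels (assumed placeholders)
model.eval()
with torch.no_grad():
    optimized = model.encoder(torch.tensor(X_train, dtype=torch.float32)).numpy()

risk_model = LogisticRegression(max_iter=1000)   # binary classification model
risk_model.fit(optimized, y_train)
default_probability = risk_model.predict_proba(optimized)[:, 1]  # probability of default
```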
The present disclosure provides a method of feature optimization, the method comprising:
inputting the characteristics to be optimized into a pre-constructed characteristic optimization model, wherein the characteristic optimization model comprises an encoder and a decoder;
performing dimensionality reduction mapping on the feature to be optimized through the encoder, and mapping the feature to be optimized into a first intermediate feature;
Dimension reduction mapping is carried out on the features to be optimized by the encoder, so that mutual interference of high-dimensional features in the subsequent calculation process is avoided and the calculation amount is reduced. In addition, by carrying out dimension reduction mapping on the features to be optimized, the deep relations among them can be learned automatically, and some hidden nonlinear information features that cannot be mined by the conventional feature derivation method can be mined.
Performing dimension-ascending mapping on the first intermediate feature through the decoder, and mapping the first intermediate feature into a second intermediate feature with the same dimension as the feature to be optimized;
Because the decoder performs dimension-ascending mapping on the feature output by the encoder, the difference between the input and the output of the feature optimization model can be compared, so that the output result of the feature optimization model implies deep valuable information.
Optimizing the feature to be optimized based on the feature difference between the second intermediate feature and the feature to be optimized, and determining an optimized feature.
By iteratively optimizing the features in this way, the interference of the correlation among the features on the model can be effectively avoided.
Fig. 2 schematically illustrates a structural diagram of a feature optimization device according to an embodiment of the present disclosure, and as shown in fig. 2, the device includes:
a first unit 21, configured to input a feature to be optimized into a pre-constructed feature optimization model, where the feature optimization model includes an encoder and a decoder;
a second unit 22, configured to perform dimension reduction mapping on the feature to be optimized through the encoder, and map the feature to be optimized into a first intermediate feature;
a third unit 23, configured to perform, by the decoder, ascending-dimension mapping on the first intermediate feature, and map the first intermediate feature into a second intermediate feature having the same dimension as the feature to be optimized;
a fourth unit 24, configured to optimize the feature to be optimized based on a feature difference between the second intermediate feature and the feature to be optimized, and determine an optimized feature.
In an alternative embodiment,
the encoder includes a batch normalization processing layer, a first mapping layer and a second mapping layer,
said second unit 22 comprises a normalization unit and a first mapping unit and a second mapping unit,
the standardization unit is used for carrying out batch standardization processing on the features to be optimized through the batch standardization layer to obtain standardized features;
the first mapping unit is configured to map, by the first mapping layer, the normalized feature into a third intermediate feature having the same dimension as the first mapping layer;
the second mapping unit is configured to map, by the second mapping layer, the third intermediate feature into a first intermediate feature having the same dimension as the second mapping layer,
and the dimension of the second mapping layer is smaller than that of the first mapping layer, and the dimension of the first mapping layer is smaller than that of the feature to be optimized.
In an alternative embodiment,
the first mapping unit is further configured to:
determining the third intermediate feature based on an activation function corresponding to an encoder, a first weight matrix corresponding to the first mapping layer, a first bias parameter corresponding to the first mapping layer, and the normalized feature;
the second mapping unit is further configured to:
determining the first intermediate feature based on the activation function corresponding to the encoder, the second weight matrix corresponding to the second mapping layer, the second bias parameter corresponding to the second mapping layer, and the third intermediate feature.
In an alternative embodiment,
the dimension of the first mapping layer is 512 dimensions, and the dimension of the second mapping layer is 128 dimensions.
In an alternative embodiment,
the apparatus further comprises a fifth unit for training the feature optimization model;
the fifth unit is configured to:
based on the pre-obtained training features, performing dimensionality reduction mapping on the training features through an encoder of a feature optimization model to be trained, and mapping the training features into fourth intermediate features;
performing ascending-dimension mapping on the fourth intermediate feature through a decoder of a feature optimization model to be trained, and mapping the fourth intermediate feature into a fifth intermediate feature with the same dimension as the training feature;
and iteratively optimizing a loss function of the feature optimization model to be trained through a back propagation algorithm based on the feature errors of the fifth intermediate feature and the training feature so as to enable the feature errors to meet a preset convergence condition, and completing the training of the feature optimization model.
It should be noted that, for the beneficial effects of the feature optimization device in the embodiment of the present disclosure, reference may be made to the beneficial effects of the feature optimization method, and details of the embodiment of the present disclosure are not repeated here.
The present disclosure also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.
The readable storage medium may be a computer storage medium or a communication medium. Communication media include any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general-purpose or special-purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In the above embodiments of the terminal or the server, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present disclosure may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present disclosure, and not for limiting the same; while the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present disclosure.

Claims (10)

1. A method of feature optimization, the method comprising:
inputting the characteristics to be optimized into a pre-constructed characteristic optimization model, wherein the characteristic optimization model comprises an encoder and a decoder;
performing dimensionality reduction mapping on the feature to be optimized through the encoder, and mapping the feature to be optimized into a first intermediate feature;
performing dimension-ascending mapping on the first intermediate feature through the decoder, and mapping the first intermediate feature into a second intermediate feature with the same dimension as the feature to be optimized;
optimizing the feature to be optimized based on the feature difference between the second intermediate feature and the feature to be optimized, and determining an optimized feature.
2. The method of claim 1, wherein the encoder includes a batch normalization layer, a first mapping layer, and a second mapping layer,
the method for performing the dimension reduction mapping on the feature to be optimized through the encoder comprises the following steps:
carrying out batch standardization processing on the features to be optimized through the batch standardization layer to obtain standardized features;
mapping, by the first mapping layer, the normalized features to third intermediate features that are the same dimension as the first mapping layer;
mapping, by the second mapping layer, the third intermediate feature to a first intermediate feature having the same dimension as the second mapping layer,
and the dimension of the second mapping layer is smaller than that of the first mapping layer, and the dimension of the first mapping layer is smaller than that of the feature to be optimized.
3. The method of claim 2,
the method of mapping the normalized features to third intermediate features of the same dimension as the first mapping layer by the first mapping layer comprises:
determining the third intermediate feature based on an activation function corresponding to an encoder, a first weight matrix corresponding to the first mapping layer, a first bias parameter corresponding to the first mapping layer, and the normalized feature;
the method of mapping the third intermediate feature to a first intermediate feature having the same dimension as the second mapping layer by the second mapping layer comprises:
determining the first intermediate feature based on the activation function corresponding to the encoder, the second weight matrix corresponding to the second mapping layer, the second bias parameter corresponding to the second mapping layer, and the third intermediate feature.
4. The method of claim 3, wherein the first mapping layer has a dimension of 512 dimensions and the second mapping layer has a dimension of 128 dimensions.
5. The method of claim 1, further comprising training a feature optimization model,
the method for training the feature optimization model comprises the following steps:
based on the pre-obtained training features, performing dimensionality reduction mapping on the training features through an encoder of a feature optimization model to be trained, and mapping the training features into fourth intermediate features;
performing ascending-dimension mapping on the fourth intermediate feature through a decoder of a feature optimization model to be trained, and mapping the fourth intermediate feature into a fifth intermediate feature with the same dimension as the training feature;
and iteratively optimizing a loss function of the feature optimization model to be trained through a back propagation algorithm based on the feature errors of the fifth intermediate feature and the training feature so as to enable the feature errors to meet a preset convergence condition, and completing the training of the feature optimization model.
6. A feature optimization device, the device comprising:
the device comprises a first unit, a second unit and a third unit, wherein the first unit is used for inputting the characteristics to be optimized into a pre-constructed characteristic optimization model, and the characteristic optimization model comprises an encoder and a decoder;
a second unit, configured to perform dimension reduction mapping on the feature to be optimized through the encoder, and map the feature to be optimized as a first intermediate feature;
a third unit, configured to perform, by the decoder, ascending-dimension mapping on the first intermediate feature, and map the first intermediate feature into a second intermediate feature having a dimension that is the same as that of the feature to be optimized;
and the fourth unit is used for optimizing the feature to be optimized and determining the optimized feature based on the feature difference between the second intermediate feature and the feature to be optimized.
7. The apparatus of claim 6, wherein the encoder comprises a batch normalization layer, a first mapping layer, and a second mapping layer,
the second unit comprises a normalization unit, a first mapping unit and a second mapping unit,
the standardization unit is used for carrying out batch standardization processing on the features to be optimized through the batch standardization layer to obtain standardized features;
the first mapping unit is configured to map, by the first mapping layer, the normalized feature into a third intermediate feature having the same dimension as the first mapping layer;
the second mapping unit is configured to map, by the second mapping layer, the third intermediate feature into a first intermediate feature having the same dimension as the second mapping layer,
and the dimension of the second mapping layer is smaller than that of the first mapping layer, and the dimension of the first mapping layer is smaller than that of the feature to be optimized.
8. The apparatus of claim 7,
the first mapping unit is further configured to:
determining the third intermediate feature based on an activation function corresponding to an encoder, a first weight matrix corresponding to the first mapping layer, a first bias parameter corresponding to the first mapping layer, and the normalized feature;
the second mapping unit is further configured to:
determining the first intermediate feature based on the activation function corresponding to the encoder, the second weight matrix corresponding to the second mapping layer, the second bias parameter corresponding to the second mapping layer, and the third intermediate feature.
9. The apparatus of claim 8, wherein the first mapping layer has a dimension of 512 dimensions and the second mapping layer has a dimension of 128 dimensions.
10. The apparatus of claim 6, further comprising a fifth unit for training the feature optimization model;
the fifth unit is configured to:
based on the pre-obtained training features, performing dimensionality reduction mapping on the training features through an encoder of a feature optimization model to be trained, and mapping the training features into fourth intermediate features;
performing ascending-dimension mapping on the fourth intermediate feature through a decoder of a feature optimization model to be trained, and mapping the fourth intermediate feature into a fifth intermediate feature with the same dimension as the training feature;
and iteratively optimizing a loss function of the feature optimization model to be trained through a back propagation algorithm based on the feature errors of the fifth intermediate feature and the training feature so as to enable the feature errors to meet a preset convergence condition, and completing the training of the feature optimization model.
CN202110751068.3A 2021-07-02 2021-07-02 Feature optimization method and device Pending CN113610107A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110751068.3A CN113610107A (en) 2021-07-02 2021-07-02 Feature optimization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110751068.3A CN113610107A (en) 2021-07-02 2021-07-02 Feature optimization method and device

Publications (1)

Publication Number Publication Date
CN113610107A true CN113610107A (en) 2021-11-05

Family

ID=78303942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110751068.3A Pending CN113610107A (en) 2021-07-02 2021-07-02 Feature optimization method and device

Country Status (1)

Country Link
CN (1) CN113610107A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399384A (en) * 2022-03-25 2022-04-26 鲁担(山东)数据科技有限公司 Risk strategy generation method, system and device based on privacy calculation
CN116204843A (en) * 2023-04-24 2023-06-02 北京芯盾时代科技有限公司 Abnormal account detection method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815223A (en) * 2019-01-21 2019-05-28 北京科技大学 A kind of complementing method and complementing device for industry monitoring shortage of data
WO2019219198A1 (en) * 2018-05-17 2019-11-21 Huawei Technologies Co., Ltd. Device and method for clustering of input-data
CN112529767A (en) * 2020-12-01 2021-03-19 平安科技(深圳)有限公司 Image data processing method, image data processing device, computer equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019219198A1 (en) * 2018-05-17 2019-11-21 Huawei Technologies Co., Ltd. Device and method for clustering of input-data
CN109815223A (en) * 2019-01-21 2019-05-28 北京科技大学 A kind of complementing method and complementing device for industry monitoring shortage of data
CN112529767A (en) * 2020-12-01 2021-03-19 平安科技(深圳)有限公司 Image data processing method, image data processing device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399384A (en) * 2022-03-25 2022-04-26 鲁担(山东)数据科技有限公司 Risk strategy generation method, system and device based on privacy calculation
CN116204843A (en) * 2023-04-24 2023-06-02 北京芯盾时代科技有限公司 Abnormal account detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108763277B (en) Data analysis method, computer readable storage medium and terminal device
WO2021164317A1 (en) Sequence mining model training method, sequence data processing method and device
CN113610107A (en) Feature optimization method and device
CN110751557A (en) Abnormal fund transaction behavior analysis method and system based on sequence model
CN111507470A (en) Abnormal account identification method and device
CN111428557A (en) Method and device for automatically checking handwritten signature based on neural network model
CN111260189B (en) Risk control method, risk control device, computer system and readable storage medium
CN113177700B (en) Risk assessment method, system, electronic equipment and storage medium
WO2019242627A1 (en) Data processing method and apparatus
CN112988840A (en) Time series prediction method, device, equipment and storage medium
CN114078008A (en) Abnormal behavior detection method, device, equipment and computer readable storage medium
CN115392937A (en) User fraud risk identification method and device, electronic equipment and storage medium
CN114169439A (en) Abnormal communication number identification method and device, electronic equipment and readable medium
CN113159213A (en) Service distribution method, device and equipment
CN114418158A (en) Cell network load index prediction method based on attention mechanism learning network
CN115204322B (en) Behavior link abnormity identification method and device
CN115905648A (en) Gaussian mixture model-based user group and financial user group analysis method and device
CN115422000A (en) Abnormal log processing method and device
CN114936701A (en) Real-time monitoring method and device for comprehensive energy consumption and terminal equipment
CN110322150B (en) Information auditing method, device and server
CN113052692A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN111400413A (en) Method and system for determining category of knowledge points in knowledge base
CN110852392A (en) User grouping method, device, equipment and medium
CN113837183B (en) Multi-stage certificate intelligent generation method, system and medium based on real-time mining
CN117009883B (en) Object classification model construction method, object classification method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination