CN112308196A

CN112308196A - Method, apparatus, device and storage medium for training a model

Info

Publication number: CN112308196A
Application number: CN202011192866.9A
Authority: CN
Inventors: 周洋杰; 方军; 陈亮辉; 付琰
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-10-30
Filing date: 2020-10-30
Publication date: 2021-02-02

Abstract

The present application discloses a method, apparatus, device and storage medium for training a model, and relates to the fields of artificial intelligence and big data. The specific implementation scheme is: obtaining initial sample data, basic feature column data for correlation comparison, and a model to be trained; determining, based on the initial sample data and the basic feature column data, an initial correlation between each feature column data in the initial sample data and the basic feature column data; and updating, based on the initial correlation, the monotonicity constraint vector in the model to be trained so as to train the model to be trained. This implementation uses the specified basic feature column for correlation comparison and the sample data for model training to optimize and train the model to be trained, iteratively updating its monotonicity constraint vector, so that a tree model capable of autonomously and reasonably generating monotonicity constraint vectors can be obtained.

Description

Method, apparatus, device and storage medium for training a model

Technical Field

The present application relates to the field of artificial intelligence, and further relates to the field of big data, and in particular, to a method, apparatus, device, and storage medium for training a model.

Background

In the field of financial risk control, deep learning and neural networks are generally distrusted, and model interpretability is a strong requirement. Monotonicity of the model's prediction score is therefore also strongly required: if the score is non-monotonic, samples with a high score may contain a lower proportion of actual bad samples than samples with a low score, which is contradictory.

Most finance companies seek other data sources that provide predictive scores for samples and use those scores as input factors of their own models, usually fusing them by weighted averaging or with a simpler logic model. This kind of fusion places strong requirements on the monotonicity of the feature factors; if a factor is not smoothly monotonic, the fused result tends to deteriorate. At present, there is no effective method for judging the correlation of variable samples.

Disclosure of Invention

The present disclosure provides a method, apparatus, device, and storage medium for training a model.

According to an aspect of the present disclosure, there is provided a method for training a model, comprising: acquiring initial sample data, basic feature column data for correlation comparison, and a model to be trained; determining an initial correlation between each feature column data in the initial sample data and the basic feature column data based on the initial sample data and the basic feature column data; and updating the monotonicity constraint vector in the model to be trained based on the initial correlation so as to train the model to be trained.

According to another aspect of the present disclosure, there is provided an apparatus for training a model, comprising: an acquisition unit configured to acquire initial sample data, basic feature column data for correlation comparison, and a model to be trained; a correlation determination unit configured to determine an initial correlation between each feature column data in the initial sample data and the basic feature column data based on the initial sample data and the basic feature column data; and a training unit configured to update the monotonicity constraint vector in the model to be trained based on the initial correlation so as to train the model to be trained.

According to yet another aspect of the present disclosure, there is provided an electronic device for training a model, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a model as described above.

According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method for training a model as described above.

According to the method and the apparatus of the present disclosure, the problem that no effective method currently exists for judging the correlation of variable samples is solved. The specified basic feature column for correlation comparison and the sample data for model training are used to optimize and train the model to be trained, and the monotonicity constraint vector in the model to be trained is iteratively updated, so that a tree model capable of autonomously and reasonably generating monotonicity constraint vectors can be trained.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for training a model according to the present application;

FIG. 3 is a schematic diagram of an application scenario of a method for training a model according to the present application;

FIG. 4 is a flow diagram of another embodiment of a method for training a model according to the present application;

FIG. 5 is a schematic diagram of an embodiment of an apparatus for training a model according to the present application;

FIG. 6 is a block diagram of an electronic device for implementing a method for training a model according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the embodiments in the present application and the features of the embodiments may be combined with each other provided there is no conflict. The present application will be described in detail below with reference to the attached drawings and in conjunction with the embodiments.

Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the present method for training a model or apparatus for training a model may be applied.

As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications, such as data processing applications and model training applications, may be installed on the terminal devices 101, 102, 103.

The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, in-vehicle computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not particularly limited herein.

The server 105 may be a server providing various services, such as a background server that performs model training using the initial sample data collected by the terminal devices 101, 102, 103 and the basic feature column data for correlation comparison. The background server may acquire initial sample data, basic feature column data for correlation comparison, and a model to be trained; determine an initial correlation between each feature column data in the initial sample data and the basic feature column data based on the initial sample data and the basic feature column data; and update the monotonicity constraint vector in the model to be trained based on the initial correlation so as to train the model to be trained.

The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not particularly limited herein.

It should be noted that the method for training a model provided in the embodiments of the present application is generally performed by the server 105. Accordingly, the apparatus for training a model is generally located in the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for training a model according to the present application is shown. The method for training the model of the embodiment comprises the following steps:

step 201, obtaining initial sample data, basic feature column data for correlation comparison and a model to be trained.

In this embodiment, an execution subject of the method for training the model (for example, the server 105 in fig. 1) may obtain the initial sample data, the basic feature column data for correlation comparison, and the model to be trained from local storage or from an external database server through a wired or wireless connection. The initial sample data may be sample data of user information used for risk assessment in the field of financial risk control, and specifically may be data such as each user's income level, gender, age, height, weight, deposits, education level, marital status, debts, daily routine, consumption purpose, and single-transaction consumption amount. The model to be trained may be an XGB model, and the purpose is to train it into an XGB tree model capable of autonomously and reasonably generating monotonicity constraint vectors.

Step 202, determining an initial correlation between each feature column data and the basic feature column data in the initial sample data based on the initial sample data and the basic feature column data.

After the execution subject obtains the initial sample data, the basic feature column data for correlation comparison, and the model to be trained, it can determine the initial correlation between each feature column data in the initial sample data and the basic feature column data. Specifically, the execution subject may divide the initial sample data into subspaces according to a preset rule; alternatively, it may randomly divide the initial sample data into equal parts according to the order in which the data is arranged. The manner in which the subspaces are divided is not specifically limited in this application. After dividing the initial sample data into subspaces, the execution subject may determine the correlation of each feature column data in the initial sample data relative to the basic feature column data based on each subspace, the initial sample data, the basic feature column data, and a preset threshold. Specifically, the execution subject may draw, for each subspace, a two-dimensional scatter plot of each feature column data of the initial sample data against the preset basic feature column data. When the points of the scatter plot are evenly distributed in a band running from the lower left corner to the upper right corner, or in a band running from the upper left corner to the lower right corner, it can be determined that the points are correlated, that is, the feature column data corresponding to the points is correlated with the preset basic feature column data.

When the deviation distance of a point on the scatter plot from the points evenly distributed in the band from the lower left corner to the upper right corner, or in the band from the upper left corner to the lower right corner, is larger than a preset threshold, the point is deleted; that is, the feature column data corresponding to that point is regarded as not correlated with the preset basic feature column data. The initial correlation between each feature column data in the initial sample data and the basic feature column data is determined from the results obtained. The initial correlation is the correlation between the initial sample data, without any monotonicity constraint, and the preset basic feature column data. The basic feature column data may be, for example, preset gender feature column data. The specific content of the basic feature column data is not limited in the present application.
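The scatter-plot judgment described above is visual. By way of illustration only, the direction of a correlation can also be summarized numerically. The sketch below substitutes the Pearson correlation coefficient for the scatter-plot inspection; this substitution, the function names, and the cutoff `min_abs_r` are assumptions of the sketch and do not appear in the application.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no trend
    return sxy / math.sqrt(sxx * syy)

def correlation_sign(feature_column, base_column, min_abs_r=0.3):
    """Return +1 (lower-left to upper-right trend), -1 (upper-left to
    lower-right trend), or 0 (no clear trend).

    min_abs_r is a hypothetical cutoff standing in for the preset threshold.
    """
    r = pearson(feature_column, base_column)
    if r >= min_abs_r:
        return 1
    if r <= -min_abs_r:
        return -1
    return 0
```

A result of +1 or -1 corresponds to the two diagonal bands described above, and 0 to feature column data judged uncorrelated with the basic feature column data.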

Step 203, updating the monotonicity constraint vector in the model to be trained based on the initial correlation so as to train the model to be trained.

After determining the initial correlation, the execution subject may update the monotonicity constraint vector in the model to be trained based on the initial correlation, so as to train the model to be trained. Specifically, the execution subject may fill the obtained initial correlation into the monotonicity constraint vector of the model to be trained to obtain an updated monotonicity constraint vector. The execution subject may then repeat the following: re-determine, according to the updated monotonicity constraint vector, the feature column data in the initial sample data for which correlation needs to be calculated; recalculate the correlation between the re-determined feature column data and the basic feature column data and update the correlation accordingly; and update the monotonicity constraint vector in the model to be trained according to the updated correlation. The iteration ends, and training of the model to be trained is complete, when the feature column data determined in the current round differs from that determined in the previous round by less than a preset threshold, or no longer changes.

With continued reference to FIG. 3, a schematic diagram of one application scenario of a method for training a model according to the present application is shown. In the credit application scenario of fig. 3, the server 304 acquires initial sample data 301, basic feature column data 302 for correlation comparison (whether the user has defaulted), and the model to be trained 303. The server 304 determines an initial correlation 305 between each feature column data in the initial sample data 301 (name, age, household registration, education level, height, address, salary, gender) and the basic feature column data 302 (whether the user has defaulted) based on the initial sample data 301 and the basic feature column data 302. The server 304 updates the monotonicity constraint vector 306 in the model to be trained based on the initial correlation 305 so as to train the model to be trained 303.

In the embodiment, the specified basic feature column for correlation comparison and the sample data for model training are used for carrying out optimization training on the model to be trained, and the monotonicity constraint vector in the model to be trained is updated in an iterative manner, so that the tree model capable of generating the monotonicity constraint vector independently and reasonably can be obtained through training.

With continued reference to FIG. 4, a flow 400 of another embodiment of a method for training a model according to the present application is shown. As shown in fig. 4, the method for training a model of the present embodiment may include the following steps:

step 401, obtaining initial sample data, basic feature column data for correlation comparison, and a model to be trained.

Step 402, determining an initial correlation between each feature column data and the basic feature column data in the initial sample data based on the initial sample data and the basic feature column data.

The principle of step 401 to step 402 is similar to that of step 201 to step 202, and is not described herein again.

Specifically, step 402 may be implemented by the following step 4021:

step 4021, determining initial correlation between each feature column data and the basic feature column data in the initial sample data according to the initial sample data, the basic feature column data and the logistic regression model.

In this embodiment, the logistic regression model is used to characterize the correspondence between each feature column data, the basic feature column data, and the correlation between the two. After acquiring the initial sample data, the execution subject may divide it into subspaces. Specifically, the execution subject may first input the initial sample data into the XGB model and train it without monotonicity constraints to obtain an initialized tree model. The initialized tree model is the tree structure corresponding to the initial sample data. From this tree structure, the way the initial sample data is partitioned is known, which yields the subspaces into which the initial sample data is divided.

Specifically, in this embodiment, after determining the tree structure corresponding to the initial sample data, the execution subject may divide the initial sample data into subspaces according to that tree structure. The subspaces are divided automatically by the tree structure: each leaf node in the tree structure is a subspace, and each subspace contains the part of the sample data that satisfies the constraints of that subspace. For example, the tree structure output for the initial sample data may be as follows:

0:[f103393<33.5] yes=1,no=2,missing=1

1:[f122749<34.5] yes=3,no=4,missing=3

3:leaf=0.0215067528

4:leaf=-0.0601626039

2:[f103366<-0.388199508] yes=5,no=6,missing=6

5:leaf=-0.071560092

6:leaf=-0.0131947799

In the tree structure, a leaf is a leaf node; it is both an output space of the model and a subspace. Every sample falls on exactly one leaf node, and the value of that node is the model's output for the sample.

In the tree structure, the leftmost number on each line is the node ID (for example, 0). The record line of a non-leaf node contains, in square brackets, the feature ID (f103393) and the split threshold (33.5), and additionally indicates the direction of missing values (missing), that is, whether a sample whose value for this feature is missing goes to the left node or the right node. The left node (yes) is the subspace in which the inequality constraint in the square brackets is satisfied. For example, in the line for node 1, missing=3 refers to the left node because yes=3.
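To make the reading of the tree structure concrete, the following is a minimal sketch that parses a text dump of the form shown above (written with plain ASCII `=` and `,`) and walks a sample down to its leaf, which is its subspace. The helper names `parse_tree` and `find_leaf` and the dictionary layout are hypothetical; they are not part of the application or of any library.

```python
import re

DUMP = """\
0:[f103393<33.5] yes=1,no=2,missing=1
1:[f122749<34.5] yes=3,no=4,missing=3
3:leaf=0.0215067528
4:leaf=-0.0601626039
2:[f103366<-0.388199508] yes=5,no=6,missing=6
5:leaf=-0.071560092
6:leaf=-0.0131947799
"""

SPLIT = re.compile(
    r"(\d+):\[(\w+)<([-+.\deE]+)\]\s*yes=(\d+),no=(\d+),missing=(\d+)")
LEAF = re.compile(r"(\d+):leaf=([-+.\deE]+)")

def parse_tree(dump):
    """Map node ID -> split record or leaf record."""
    nodes = {}
    for line in dump.strip().splitlines():
        line = line.strip()
        split = SPLIT.match(line)
        if split:
            node_id, feature, threshold, yes, no, missing = split.groups()
            nodes[int(node_id)] = {
                "feature": feature, "threshold": float(threshold),
                "yes": int(yes), "no": int(no), "missing": int(missing)}
            continue
        leaf = LEAF.match(line)
        if leaf:
            nodes[int(leaf.group(1))] = {"leaf": float(leaf.group(2))}
    return nodes

def find_leaf(nodes, sample):
    """Return (leaf node ID, leaf value) for a sample given as a dict.

    A feature absent from the dict is treated as a missing value and
    follows the 'missing' branch of the split.
    """
    node_id = 0
    while "leaf" not in nodes[node_id]:
        node = nodes[node_id]
        value = sample.get(node["feature"])
        if value is None:
            node_id = node["missing"]
        elif value < node["threshold"]:
            node_id = node["yes"]
        else:
            node_id = node["no"]
    return node_id, nodes[node_id]["leaf"]
```

For instance, `find_leaf(parse_tree(DUMP), {"f103393": 10, "f122749": 40})` ends at leaf 4: the first inequality holds, so the sample goes to node 1, and the second does not, so it goes to the no branch.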

After dividing the initial sample data into subspaces, the execution subject may determine the data type of the initial sample data in each subspace. Specifically, the execution subject may obtain the initial sample data of each subspace according to the division, and judge the data type of the initial sample data in each subspace according to that data, its numerical values, and its coverage rate within the subspace. The principle of the XGB algorithm is as follows: trees are added one after another, each grown by repeated feature splitting, and each added tree in effect learns a new function that fits the residual of the previous prediction. Once training has produced k trees, the score of a sample is predicted as follows: according to its features, the sample falls on one leaf node in each tree, each leaf node corresponds to a score, and the predicted value of the sample is simply the sum of the scores from all the trees.
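The additive principle described above can be sketched with a toy example. The code below is a deliberately simplified illustration, assuming a single numeric feature, one-split trees (stumps), and a squared-error loss; it is not the XGB algorithm itself, which additionally uses second-order gradients and regularization. All function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    # Each leaf holds the mean residual of the samples that fall in it.
    return lambda x: left_value if x < threshold else right_value

def boost(xs, ys, n_trees):
    """Add trees one at a time; each new tree fits the current residuals."""
    trees = []
    predictions = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + tree(x) for x, p in zip(xs, predictions)]
    return trees

def predict(trees, x):
    """The score of a sample is the sum of its leaf scores over all trees."""
    return sum(tree(x) for tree in trees)
```

With `xs = [1, 2, 3, 4]` and `ys = [1, 1, 3, 5]`, the second tree fits what the first tree left unexplained, so the squared error on the training data decreases as trees are added.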

For example, when the execution subject determines that the data values of the initial sample data in a subspace take only the values 0 and 1, it may determine that the initial sample data in the subspace is a one-hot discrete value. When the execution subject determines that the data values in the subspace lie in the range 0-1 or 0-99 and that the coverage rate of the values in the corresponding subspace is smaller than a threshold set by the user, it may judge that the initial sample data in the subspace is discrete data. The execution subject may one-hot encode the discrete data according to the split thresholds output by the XGB model, and may convert missing values in the discrete data into the corresponding codes according to the division made by the XGB model. Of course, the execution subject may also determine the data type of the initial sample data in each subspace according to a data type specified by the user for each column. For example, suppose there is a column of percentage features whose actual meaning is the confidence that a movie is liked. If the user considers that this percentage can in fact be treated as discrete variable data (for instance because, through visualization or other means, the user subjectively judges that it behaves like a discrete feature), the user can define the type of this feature column accordingly (as continuous data or discrete data), and the execution subject then performs further data processing according to the user-defined data type. After the discrete data in each subspace is determined, the remaining data is continuous data; missing values in continuous data are ignored and not processed. Discrete data refers to variables whose values can be listed in order, usually integers, such as the number of workers, the number of factories, the number of machines, or age. The values of a discrete variable are obtained by counting.

Continuous data refers to variables that can take any value within an interval: the values are continuous, and the interval between two adjacent values can be divided infinitely, so infinitely many values are possible. For example, the dimensions of manufactured parts and a person's measured height, weight, and chest circumference are continuous variables, whose values can only be obtained by measurement. If a feature of the initial sample data is used by multiple trees (i.e., multiple subspaces exist for it), the subspace corresponding to the tree that uses it first is taken as the standard.
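The type-judgment rules above might be sketched as follows. The application does not define "coverage rate"; this sketch assumes it to be the number of distinct values divided by the number of non-missing values, and the function name, the default threshold, and the type labels are likewise hypothetical.

```python
def infer_type(values, coverage_threshold=0.5, user_type=None):
    """Classify one feature column within one subspace.

    values: the column's values, with None standing for a missing value.
    user_type: an optional user-defined type, which takes precedence.
    """
    if user_type is not None:
        return user_type
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-missing values")
    distinct = set(present)
    if distinct <= {0, 1}:
        return "one-hot"
    # The range 0-1 lies inside 0-99, so one range check covers both cases.
    in_range = min(present) >= 0 and max(present) <= 99
    coverage = len(distinct) / len(present)  # assumed definition
    if in_range and coverage < coverage_threshold:
        return "discrete"
    return "continuous"
```

For example, an age column such as `[18, 18, 30, 30, 18, 30, 18, 30, 18, 18]` has two distinct values among ten, so under these assumptions it is judged discrete, while `[1.2, 3.4, 5.6, 7.8]` is judged continuous.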

The execution subject may use the initial sample data and the basic feature column data in each subspace as the input of the logistic regression model, and take the correlation between the initial sample data and the basic feature column data in each subspace as the output, thereby determining the initial correlation between each feature column data and the basic feature column data in each subspace into which the initial sample data is divided.

Specifically, after determining the data type of the initial sample data in each subspace, the execution subject may determine the correlation between each feature column data in the initial sample data in each subspace and the basic feature column data based on the data type, the initial sample data in each subspace, and a preset threshold. After filling the missing values in the discrete feature column data of each subspace, the execution subject inputs the filled discrete feature column data and the continuous feature column data of each subspace into a pre-trained logistic regression (LR) model, and the model outputs the P-value corresponding to the T statistic of each discrete and each continuous feature column data in each subspace. The T statistic is the statistic of a hypothesis test that checks whether a correlation exists between variable data. The P-value represents the degree of support for the null hypothesis: it is the probability of obtaining a sample result at least as extreme as the one observed, given that the null hypothesis is true. After obtaining each P-value, the execution subject may determine, according to a P-value threshold specified by the user (i.e., the preset threshold), whether the feature column data in the subspace corresponding to each P-value is correlated with the basic feature column data. Specifically, the execution subject may compare each P-value with the P-value threshold, and when the P-value is greater than the P-value threshold, the execution subject may determine that the feature column data in the subspace corresponding to the P-value is correlated with the basic feature column data.

In this way, by comparing the P-value of each feature column data in each subspace with the P-value threshold, the correlation between each feature column data in each subspace and the basic feature column data can be determined from the comparison results. In addition, the execution subject may determine the weight value, in the logistic regression model, of the correlated sample data in each subspace, and determine the strongly correlated sample data among the correlated sample data in each subspace according to each weight value and a preset weight threshold. The user may set separate P-value and weight thresholds for discrete data (discrete variables) and continuous data (continuous variables). Specifically, the execution subject may calculate the ratio of each weight value to the preset weight threshold, compare the ratio with 1, and determine the sample data whose ratio is greater than 1 to be the strongly correlated sample data in each subspace. The correlated sample data may include the strongly correlated sample data.
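As an illustration of where such a P-value can come from, the sketch below fits a logistic regression with a single feature by Newton's method and derives a two-sided P-value for the feature's weight. It uses the Wald z statistic, which is the usual test statistic for logistic regression coefficients, in place of the T statistic named above; that choice, the restriction to one feature, and the function names are assumptions of this sketch. How the resulting P-value is compared with the threshold is left to the rule stated in the text.

```python
import math

def _grad_hess(b0, b1, xs, ys):
    """Gradient of the log-likelihood and entries of X^T W X."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11

def wald_p_value(xs, ys, iterations=25):
    """Fit y ~ sigmoid(b0 + b1 * x); return (b1, z, two-sided P-value).

    Assumes the classes are not perfectly separable by x; otherwise
    the maximum-likelihood estimate does not exist.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = _grad_hess(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X^T W X)^-1 * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    _, _, h00, h01, h11 = _grad_hess(b0, b1, xs, ys)
    # Variance of b1 is the matching diagonal entry of (X^T W X)^-1.
    std_err = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return b1, z, p_value
```

The returned weight `b1` is also the quantity to which the weight-ratio rule described above would be applied.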

In this embodiment, by determining the tree structure corresponding to the initial sample data, the subspaces of the initial sample data can be obtained automatically, and the correlation between the feature column data in each subspace and the basic feature column data is determined based on the logistic regression model, so that the correlation between each feature column data in the initial sample data and the basic feature column data is calculated more accurately and the accuracy of data correlation processing is improved. In addition, in this embodiment, the data type of the initial sample data in each subspace is determined, and the correlated sample data in each subspace can be accurately determined from the data type, the initial sample data in each subspace, and the preset threshold. In this embodiment, the missing values in the discrete data are assigned according to the division made by the tree model, which improves the accuracy of correlation calculation for the variable features specific to the tree model. This embodiment can also measure correlation strength separately for continuous and discrete features, which improves generalization.

Step 403, updating the monotonicity constraint vector in the model to be trained based on the initial correlation so as to train the model to be trained.

The principle of step 403 is similar to that of step 203, and is not described in detail here.

Specifically, step 403 may be implemented by iteratively performing steps 4031 to 4033 as follows:

step 4031, according to the initial correlation and the initial sample data, target feature column data in the initial sample data is determined and updated.

After the execution subject obtains the initial correlation, it may update the model parameters of the model to be trained according to the initial correlation. For example, the execution subject may determine and update the Monotonous vector in the model to be trained according to the correspondence between the initial correlation and the Monotonous vector. The Monotonous vector is the vector of monotonicity constraint values: if there are 100 features, the Monotonous vector has 100 dimensions, and if the correlation learned for a feature, say the one at the 30th dimension, is positive, the 30th component of the vector is set to +1 (for the initialized model, all components of the vector may be taken to be 0). At this point the model to be trained has changed, and the feature column data selected for correlation comparison also changes according to the updated Monotonous vector. Through continued iterative training, the Monotonous vector of the model to be trained and the corresponding target feature column data for correlation calculation are finally determined and updated. For example, suppose the initial sample data contains 100 feature column data, of which 20 are initially selected for correlation calculation. The initial correlation between these 20 feature column data and the basic feature column data is calculated, the Monotonous vector is determined and updated according to the initial correlation, and the feature column data for correlation calculation is then reselected according to the updated Monotonous vector. If, for instance, the number of selected feature column data grows to 50 and the Monotonous vector then no longer changes, these 50 selected feature column data are the target feature column data. The target feature column data is thus obtained by repeatedly updating the Monotonous vector and the selected feature column data.
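Filling correlations into the Monotonous vector can be sketched as below. The helper names are hypothetical. The XGBoost library exposes a training parameter named `monotone_constraints` that takes one entry per feature (1, 0, or -1); treating the Monotonous vector as the value of that parameter is an assumption of this sketch, not something the application states.

```python
def build_monotone_vector(feature_names, correlations):
    """Return one constraint per feature, in feature order.

    correlations maps a feature name to +1 (positive correlation),
    -1 (negative correlation), or 0; features not listed stay at 0,
    matching the all-zero vector of the initialized model.
    """
    vector = []
    for name in feature_names:
        sign = correlations.get(name, 0)
        if sign not in (-1, 0, 1):
            raise ValueError(f"invalid constraint for {name}: {sign}")
        vector.append(sign)
    return vector

def to_constraint_string(vector):
    """Format the vector in the '(1,0,-1)' style."""
    return "(" + ",".join(str(v) for v in vector) + ")"
```

With 100 features named `f1` to `f100` and a learned positive correlation for `f30`, `build_monotone_vector` returns a 100-dimensional vector whose 30th component is 1 and whose other components are 0.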

Of course, it is understood that the target feature column data may also be the feature column data newly selected from the initial sample data for correlation calculation each time the Monotonous vector is updated. As the Monotonous vector is repeatedly updated, the target feature column data selected from the initial sample data for correlation calculation is iteratively updated as well (both the number of target feature columns in the initial sample data and the corresponding data are updated).

Step 4032, determine the correlation between the target feature column data and the basic feature column data, and update the initial correlation to the obtained correlation.

In this embodiment, when the target feature column data is the feature column data newly selected from the initial sample data for correlation calculation after each update of the Monotonous vector, the execution subject may determine the correlation between the target feature column data and the basic feature column data, and update the initial correlation to the obtained correlation. Specifically, as long as the Monotonous vector keeps changing, the target feature column data is re-determined and updated, the correlation between the updated target feature column data and the basic feature column data is determined, and the initial correlation is updated to the latest correlation.

Step 4033, according to the obtained correlation, the monotonicity constraint vector in the model to be trained is updated so as to train the model to be trained.

The execution subject may iteratively determine and update the monotonicity constraint vector in the model to be trained according to each obtained correlation, so as to train the model to be trained.

In this embodiment, the monotonicity constraint vector in the model to be trained is determined and updated multiple times, starting from the initial correlation, so that a tree model capable of generating the monotonicity constraint vector autonomously and reasonably is obtained, and the accuracy of model training is improved.
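The iteration over steps 4031 to 4033 can be sketched as follows. This is an illustration under stated assumptions, not the method of the application: the Pearson coefficient stands in for the correlation measure, the reselection rule (widen the selection by a fixed `batch` of columns each round) is hypothetical because the text does not specify how columns are reselected, and the names `pearson`, `iterate_constraints`, `batch`, `eps` and `max_rounds` are invented for the example.

```python
from statistics import mean
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def iterate_constraints(columns: List[List[float]], base: List[float],
                        batch: int = 2, eps: float = 0.1,
                        max_rounds: int = 50) -> Tuple[List[int], List[int]]:
    """Repeat: correlate selected columns, update the vector, reselect."""
    n = len(columns)
    vector = [0] * n                         # initialized to all zeros
    selected = list(range(min(batch, n)))    # initially selected columns
    for _ in range(max_rounds):
        updated = list(vector)
        for j in selected:
            corr = pearson(columns[j], base)
            updated[j] = 1 if corr > eps else (-1 if corr < -eps else 0)
        if updated == vector:                # vector no longer changes
            break
        vector = updated
        # Assumed reselection rule: widen the selection by one batch.
        selected = list(range(min(len(selected) + batch, n)))
    return vector, selected


base = [1, 2, 3, 4, 5]
columns = [
    [2, 4, 6, 8, 10],   # rises with the base column
    [5, 4, 3, 2, 1],    # falls with the base column
    [1, 1, 1, 1, 1],    # constant
    [3, 1, 2, 1, 3],    # uncorrelated with the base column
]
vector, target_columns = iterate_constraints(columns, base)
```

In this toy run the loop stops once a round leaves the vector unchanged, and the columns selected at that point play the role of the target feature column data.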

Specifically, step 4033 may be implemented by steps 40331 to 40332 as follows:

Step 40331, in response to determining that the variation value of the monotonicity constraint vector is smaller than the preset threshold, ending training of the model to be trained.

In this embodiment, during training of the model to be trained (an XGB model), the execution subject may end the training in response to determining that the variation of the monotonicity constraint vector (the Monotonous vector) between the previous iteration and the current iteration is smaller than the preset threshold, or that the Monotonous vector no longer changes between the two. When the Monotonous vector no longer changes, the model has stabilized and the accuracy of its output no longer changes appreciably, so training of the model to be trained (the XGB model) can be terminated. Stopping the training in time prevents endless model training and wasted storage resources.
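A possible form of this stopping test is sketched below. The text does not define how the variation value of the vector is measured, so the L1 distance between consecutive vectors is an assumption, as are the names `vector_change` and `should_stop_on_vector` and the default threshold.

```python
from typing import Sequence


def vector_change(prev: Sequence[int], curr: Sequence[int]) -> int:
    """Assumed variation measure: L1 distance between two constraint vectors."""
    if len(prev) != len(curr):
        raise ValueError("constraint vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(prev, curr))


def should_stop_on_vector(prev: Sequence[int], curr: Sequence[int],
                          threshold: int = 1) -> bool:
    """End training when the variation is smaller than the preset threshold."""
    return vector_change(prev, curr) < threshold
```

With a threshold of 1, training ends only when the vector is identical in two consecutive iterations; a larger threshold tolerates a few components still flipping.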

Step 40332, in response to determining that the variation value of the output of the model to be trained is smaller than the preset threshold, ending training of the model to be trained.

In this embodiment, during training of the model to be trained, the execution subject may end the training in response to determining that the variation value of the output of the model to be trained is smaller than the preset threshold, that is, the output value of the model to be trained changes by less than the preset threshold between the previous iteration and the current iteration, or does not change at all. Stopping the training in time prevents endless model training and wasted storage resources.
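The output-based criterion can be sketched in the same spirit. Here the variation value of the output is assumed to be the mean absolute difference between the scores the model produces for the same samples in two consecutive iterations; this measure, the names `output_change` and `should_stop_on_output`, and the default threshold are all hypothetical.

```python
from typing import Sequence


def output_change(prev_scores: Sequence[float],
                  curr_scores: Sequence[float]) -> float:
    """Assumed variation measure: mean absolute difference of the scores."""
    if not curr_scores or len(prev_scores) != len(curr_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    total = sum(abs(a - b) for a, b in zip(prev_scores, curr_scores))
    return total / len(curr_scores)


def should_stop_on_output(prev_scores: Sequence[float],
                          curr_scores: Sequence[float],
                          threshold: float = 1e-3) -> bool:
    """End training when the output changes by less than the threshold."""
    return output_change(prev_scores, curr_scores) < threshold
```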

With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for training a model, which corresponds to the embodiment of the method shown in fig. 2, and which can be applied in various electronic devices.

As shown in fig. 5, the apparatus 500 for training a model of the present embodiment includes: an acquisition unit 501, a correlation determination unit 502 and a training unit 503.

An obtaining unit 501, configured to obtain initial sample data, basic feature column data for performing correlation comparison, and a model to be trained.

A correlation determination unit 502, configured to determine an initial correlation between each feature column data in the initial sample data and the basic feature column data based on the initial sample data and the basic feature column data.

A training unit 503 configured to update monotonicity constraint vectors in the model to be trained based on the initial correlation to train the model to be trained.

In some optional implementations of this embodiment, the correlation determination unit 502 is further configured to: determine the initial correlation between each feature column data in the initial sample data and the basic feature column data according to the initial sample data, the basic feature column data and a logistic regression model, wherein the logistic regression model is used to represent the correspondence among each feature column data, the basic feature column data, and the correlation between the two.
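One way a logistic regression model could yield such a correlation is sketched below, assuming the basic feature column is a binary 0/1 column and taking the fitted coefficient of a single standardized feature as the direction and strength of the relationship. Both assumptions, the plain gradient-descent fit, and the name `fit_logistic_coefficient` with its `lr` and `steps` parameters are illustrative and are not stated in the application.

```python
import math
from typing import Sequence


def fit_logistic_coefficient(xs: Sequence[float], ys: Sequence[int],
                             lr: float = 0.1, steps: int = 500) -> float:
    """Fit y ~ sigmoid(w * z + b) on one standardized feature; return w.

    A positive w suggests a positive correlation with the 0/1 base column,
    a negative w a negative one.
    """
    n = len(xs)
    mx = sum(xs) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5 or 1.0  # guard constants
    zs = [(x - mx) / sx for x in xs]
    w, b = 0.0, 0.0
    for _ in range(steps):
        gw = gb = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * z + b)))
            gw += (p - y) * z   # gradient of the log loss w.r.t. w
            gb += (p - y)       # gradient of the log loss w.r.t. b
        w -= lr * gw / n
        b -= lr * gb / n
    return w


w_up = fit_logistic_coefficient([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
w_down = fit_logistic_coefficient([1, 2, 3, 4, 5, 6], [1, 1, 1, 0, 0, 0])
```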

In some optional implementations of this embodiment, the training unit 503 is further configured to perform the following iterative steps a plurality of times: determining and updating target feature column data in the initial sample data according to the initial correlation and the initial sample data; determining the correlation between the target feature column data and the basic feature column data, and updating the initial correlation to the obtained correlation; and updating the monotonicity constraint vector in the model to be trained according to the obtained correlation, so as to train the model to be trained.

In some optional implementations of this embodiment, the training unit 503 is further configured to: end training of the model to be trained in response to determining that the variation value of the monotonicity constraint vector is smaller than a preset threshold.

In some optional implementations of this embodiment, the training unit 503 is further configured to: end training of the model to be trained in response to determining that the variation value of the output of the model to be trained is smaller than a preset threshold.

It should be understood that units 501 to 503, which are recited in the apparatus 500 for training a model, respectively, correspond to the respective steps in the method described with reference to fig. 2. Thus, the operations and features described above for the method for training a model are equally applicable to the apparatus 500 and the units included therein and will not be described in detail here.

An electronic device and a readable storage medium for training a model are also provided according to embodiments of the present application.

Fig. 6 is a block diagram of an electronic device for the method for training a model according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses 605 and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses 605 may be used, along with multiple memories and multiple types of memory, if desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.

The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for training a model provided herein.

The memory 602, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and units, such as program instructions/units corresponding to the method for training a model in the embodiments of the present application (e.g., the obtaining unit 501, the correlation determination unit 502, and the training unit 503 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 602, namely, implements the method for training the model in the above method embodiment.

The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of the electronic device for training the model, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 602 optionally includes memory located remotely from processor 601, and these remote memories may be connected over a network to an electronic device for training the model. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the method of training a model may further comprise: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus 605 or other means, and are exemplified by the bus 605 in fig. 6.

The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic apparatus used to train the model, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 604 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical solution of the embodiments of the present application, the initial sample data is reasonably divided into subspaces, and the correlation is determined according to the sample data in each subspace, so that sample data with determined correlation is obtained, and an accurate neural network model is trained on the basis of that sample data.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for training a model, comprising:

obtaining initial sample data, basic feature column data for correlation comparison, and a to-be-trained model;

determining, based on the initial sample data and the basic feature column data, the initial correlation between each feature column data in the initial sample data and the basic feature column data; and

updating, based on the initial correlation, the monotonicity constraint vector in the to-be-trained model to train the to-be-trained model.

2. The method according to claim 1, wherein determining, based on the initial sample data and the basic feature column data used for correlation comparison, the initial correlation between each feature column data in the initial sample data and the basic feature column data comprises:

determining the initial correlation between each feature column data in the initial sample data and the basic feature column data according to the initial sample data, the basic feature column data, and a logistic regression model, wherein the logistic regression model is used to characterize the correspondence among each feature column data, the basic feature column data, and the correlation between the two.

3. The method according to claim 1, wherein, based on the initial correlation, updating the monotonicity constraint vector in the to-be-trained model to train the to-be-trained model, comprising:

performing the following iterative steps multiple times:

determining and updating target feature column data in the initial sample data according to the initial correlation and the initial sample data;

determining the correlation between the target feature column data and the basic feature column data, and updating the initial correlation to the obtained correlation; and

updating, according to the obtained correlation, the monotonicity constraint vector in the to-be-trained model to train the to-be-trained model.

4. The method according to claim 3, wherein, according to the obtained correlation, updating the monotonicity constraint vector in the to-be-trained model to train the to-be-trained model, comprising:

ending the training of the to-be-trained model in response to determining that the change value of the monotonicity constraint vector is less than a preset threshold.

5. The method of claim 3, wherein the method further comprises:

ending the training of the to-be-trained model in response to determining that the change value of the output of the to-be-trained model is less than a preset threshold.

6. An apparatus for training a model, comprising:

an acquisition unit, configured to acquire initial sample data, basic feature column data for correlation comparison, and a to-be-trained model;

a correlation determination unit, configured to determine the initial correlation between each feature column data in the initial sample data and the basic feature column data based on the initial sample data and the basic feature column data; and

a training unit, configured to update the monotonicity constraint vector in the to-be-trained model based on the initial correlation, so as to train the to-be-trained model.

7. The apparatus of claim 6, wherein the correlation determination unit is further configured to:

determine the initial correlation between each feature column data in the initial sample data and the basic feature column data according to the initial sample data, the basic feature column data, and a logistic regression model, wherein the logistic regression model is used to characterize the correspondence among each feature column data, the basic feature column data, and the correlation between the two.

8. The apparatus of claim 6, wherein the training unit is further configured to:

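perform the following iterative steps multiple times:

determining and updating target feature column data in the initial sample data according to the initial correlation and the initial sample data;

determining the correlation between the target feature column data and the basic feature column data, and updating the initial correlation to the obtained correlation; and

updating, according to the obtained correlation, the monotonicity constraint vector in the to-be-trained model to train the to-be-trained model.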

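9. The apparatus of claim 8, wherein the training unit is further configured to:

end the training of the to-be-trained model in response to determining that the change value of the monotonicity constraint vector is less than a preset threshold.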

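10. The apparatus of claim 8, wherein the training unit is further configured to:

end the training of the to-be-trained model in response to determining that the change value of the output of the to-be-trained model is less than a preset threshold.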

11. An electronic device for training a model, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.

12. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to perform the method of any one of claims 1-5.