CN111950706A - Data processing method and device based on artificial intelligence, computer equipment and medium - Google Patents

Data processing method and device based on artificial intelligence, computer equipment and medium

Info

Publication number
CN111950706A
CN111950706A (application CN202010798027.5A)
Authority
CN
China
Prior art keywords
fitting
data
sample set
training sample
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010798027.5A
Other languages
Chinese (zh)
Inventor
张巧丽
林荣吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202010798027.5A priority Critical patent/CN111950706A/en
Publication of CN111950706A publication Critical patent/CN111950706A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to the technical field of artificial intelligence, and provides a data processing method, apparatus, computer device and medium based on artificial intelligence, comprising the following steps: binning raw data using a first equal-frequency binning model to obtain a plurality of bins of data, and constructing a training sample set from the plurality of bins; performing fitting calculations on the training sample set with a plurality of preset fitting functions to obtain a fitting error for each fitting function, and selecting the fitting function with the minimum fitting error as the target fitting function; segmenting the training sample set to obtain a plurality of sub-training sample sets, and fitting each sub-training sample set with the target fitting function to obtain a plurality of first fitting parameters; calculating a plurality of second fitting parameters for the test sample set from the plurality of first fitting parameters; and obtaining the target variable of the test sample set from the plurality of second fitting parameters and the target fitting function. The invention can process raw data into stable data while preserving the diversity of the data.

Description

Data processing method and device based on artificial intelligence, computer equipment and medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data processing method and device based on artificial intelligence, computer equipment and a medium.
Background
In model prediction scenarios with a long time interval between the training set and the prediction set (for example, insurance-agent retention prediction, where the interval between the two may be as long as one year), the distribution of the data and the predictive power of the model can fluctuate over time, and the presence of such unstable data increases the risk of model prediction.
To reduce the risk of model prediction, the prior art removes unstable data through feature selection or applies information smoothing. The inventors found, however, that the removed unstable data still contains information useful for model prediction, so discarding it loses information diversity and reduces the prediction accuracy of the model.
Disclosure of Invention
In view of the above, there is a need for an artificial-intelligence-based data processing method, apparatus, computer device and medium that can process raw data into stable data while retaining the diversity of the data.
The invention provides a data processing method based on artificial intelligence in a first aspect, which comprises the following steps:
binning raw data using a first equal-frequency binning model to obtain a plurality of bins of data, and constructing a training sample set from the plurality of bins;
performing fitting calculations on the training sample set with a plurality of preset fitting functions to obtain a fitting error for each fitting function, and selecting the fitting function with the minimum fitting error as the target fitting function;
segmenting the training sample set to obtain a plurality of sub-training sample sets, and fitting each sub-training sample set with the target fitting function to obtain a plurality of first fitting parameters;
calculating a plurality of second fitting parameters for the test sample set from the plurality of first fitting parameters;
and calculating the target variable of the test sample set from the plurality of second fitting parameters and the target fitting function.
According to an optional embodiment of the present invention, the binning of the raw data using the first equal-frequency binning model to obtain a plurality of bins of data comprises:
creating a first sliding window;
and sliding the first sliding window over the raw data without overlap, placing the raw data covered by the first sliding window into one bin on each slide, to obtain a plurality of bins of data.
According to an alternative embodiment of the present invention, constructing the training sample set from the plurality of bins of data comprises:
calculating a first average of the index variables and a second average of the target variables in each bin of data;
taking the first average and second average corresponding to each bin as one training sample;
and constructing the training sample set from the training samples corresponding to the plurality of bins.
According to an alternative embodiment of the present invention, the calculating the plurality of second fitting parameters of the test sample set according to the plurality of first fitting parameters comprises:
sorting the first fitting parameters in chronological order;
calculating a parameter fitting function from the sorted first fitting parameters;
and calculating the second fitting parameters from the parameter fitting function and the acquisition time of the test sample set.
According to an alternative embodiment of the invention, the method further comprises:
calculating a first proportion of null-value samples among the positive samples in each sub-training sample set;
sorting the plurality of first proportions in chronological order, and calculating a proportion fitting function from the sorted first proportions;
calculating a second proportion from the proportion fitting function and the acquisition time of the test sample set;
and determining the number of null-value samples among the positive samples in the test sample set from the second proportion.
According to an alternative embodiment of the invention, after said calculating target variables of said set of test samples according to said plurality of second fitting parameters, said method further comprises:
training a machine learning model based on the training sample set;
calculating a risk loss value for the machine learning model based on target variables of the test sample set;
judging whether the risk loss value is larger than a preset loss threshold value or not;
and when the risk loss value is determined to be greater than the preset loss threshold, binning the raw data using a second equal-frequency binning model to obtain a plurality of bins of data, and repeating the above process until the risk loss value is less than the preset loss threshold, to obtain the machine learning model.
According to an alternative embodiment of the present invention, before binning the raw data using the equal-frequency binning model, the method further comprises:
sorting the raw data, and obtaining first data ranked at a preset first quantile and second data ranked at a preset second quantile;
replacing the raw data ranked before the preset first quantile with the first data, and replacing the raw data ranked after the preset second quantile with the second data, to obtain updated data;
and obtaining the minimum value in the updated data, and translating each value in the updated data by the minimum value to obtain target data.
A second aspect of the invention provides an artificial intelligence based data processing apparatus, the apparatus comprising:
a binning module for binning raw data using a first equal-frequency binning model to obtain a plurality of bins of data, and constructing a training sample set from the plurality of bins;
the first fitting module is used for performing fitting calculation on the training sample set by adopting a plurality of preset fitting functions to obtain a fitting error corresponding to each fitting function, and selecting the fitting function corresponding to the minimum fitting error as a target fitting function;
the second fitting module is used for segmenting the training sample set to obtain a plurality of sub-training sample sets, and fitting each sub-training sample set with the target fitting function to obtain a plurality of first fitting parameters;
a parameter calculation module for calculating a plurality of second fitting parameters of the test sample set according to the plurality of first fitting parameters;
and the third fitting module is used for calculating a target variable of the test sample set according to the plurality of second fitting parameters and the target fitting function.
A third aspect of the present invention provides a computer apparatus comprising:
a memory for storing a computer program;
and the processor is used for realizing the artificial intelligence based data processing method when executing the computer program.
A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the artificial intelligence based data processing method.
In summary, the artificial-intelligence-based data processing method, apparatus, computer device and medium of the present invention bin the raw data so that each bin carries the same weight, giving the training sample set constructed from the bins a uniform distribution. By fitting the training sample set as a whole, the fitting function with the best fitting effect is selected from the plurality of candidates; that function then also fits each sub-training sample set well, so the fitting parameters for the target variable of the test sample set can themselves be fitted, and the trend of the index variables makes the relationship between the transformed index variables and target variables of the test sample set more stable across time. The instability of the indexes is addressed by index transformation rather than index deletion, which preserves the richness of the index set, reduces the prediction risk of a machine learning model trained on the transformed index variables, and yields stable performance.
Drawings
Fig. 1 is a flowchart of an artificial intelligence based data processing method according to an embodiment of the present invention.
Fig. 2 is a block diagram of a data processing apparatus based on artificial intelligence according to a second embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
The data processing method based on artificial intelligence is executed by computer equipment, and correspondingly, the data processing device based on artificial intelligence runs in the computer equipment.
Fig. 1 is a flowchart of an artificial intelligence based data processing method according to an embodiment of the present invention. The artificial intelligence based data processing method specifically comprises the following steps, and the sequence of the steps in the flow chart can be changed and some steps can be omitted according to different requirements.
S11, binning raw data using a first equal-frequency binning model to obtain a plurality of bins of data, and constructing a training sample set from the plurality of bins.
The equal-frequency binning model ensures that each bin contains the same amount of data, so that each bin carries the same weight. A training sample set is constructed from the plurality of bins, the target variables of the test sample set are predicted from the training sample set, and a machine learning model is trained on those target variables. This preserves the diversity of the information in the test sample set while ensuring stability between the test and training sample sets, improving the prediction accuracy of the model while reducing its prediction risk.
In an optional embodiment, binning the raw data using the first equal-frequency binning model to obtain a plurality of bins of data comprises:
creating a first sliding window;
and sliding the first sliding window over the raw data without overlap, placing the raw data covered by the first sliding window into one bin on each slide, to obtain a plurality of bins of data.
In this alternative embodiment, the first sliding window covers N pieces of data, where N may be, for example, 1000. The first sliding window ensures that the bins are of equal size, so that each bin carries the same weight in the subsequent function fitting.
It should be noted that if the raw data has already been sorted, the computer device need not sort it again before binning; if not, the computer device must first sort the raw data in ascending or descending order. Binning sorted raw data effectively ensures that the trends of the binned data are consistent.
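The sort-then-slide binning described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function name and window size are assumptions:

```python
import numpy as np

def equal_frequency_binning(raw, window_size):
    """Equal-frequency binning: sort the raw data, then slide a
    non-overlapping window over it, putting each window's contents
    into one bin so every bin holds the same number of records."""
    data = np.sort(np.asarray(raw))           # sorting keeps bin trends consistent
    n_bins = len(data) // window_size         # any incomplete tail bin is dropped
    return [data[i * window_size:(i + 1) * window_size] for i in range(n_bins)]

bins = equal_frequency_binning(range(600), window_size=60)  # 10 bins of 60 values
```

Because the window never overlaps itself, each raw record lands in exactly one bin.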
In an optional embodiment, constructing the training sample set from the plurality of bins of data comprises:
calculating a first average of the index variables and a second average of the target variables in each bin of data;
taking the first average and second average corresponding to each bin as one training sample;
and constructing the training sample set from the training samples corresponding to the plurality of bins.
The index variables are the raw data themselves, for example click-through rate and browsing rate. The target variables are assessment data derived from the raw data, such as retention rate and performance level. The index variables and target variables are linearly correlated.
For example, suppose there are 600,000 pieces of raw data. The computer device bins them into 10,000 bins of 60 pieces each. For each bin, the first average of the index variables and the second average of the target variables of its 60 pieces of raw data are calculated and taken together as one training sample, yielding a training sample set of 10,000 training samples.
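Building one training sample per bin from the bin averages might look like this (a sketch under the same assumptions; variable names are illustrative):

```python
import numpy as np

def build_training_samples(bins_x, bins_y):
    """One training sample per bin: (first average of the index
    variable, second average of the target variable)."""
    return [(float(np.mean(bx)), float(np.mean(by)))
            for bx, by in zip(bins_x, bins_y)]

# Two toy bins of index values and their matching target values.
samples = build_training_samples([[1, 2, 3], [4, 5, 6]],
                                 [[10, 20, 30], [40, 50, 60]])
# samples == [(2.0, 20.0), (5.0, 50.0)]
```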
And S12, performing fitting calculation on the training sample set by adopting a plurality of preset fitting functions to obtain a fitting error corresponding to each fitting function, and selecting the fitting function corresponding to the minimum fitting error as a target fitting function.
A plurality of fitting functions of different types are preset in the computer device, for example a first type y = a·x + b, a second type y = a·x² + b, and a third type y = a·log(x) + b.
For each preset fitting function, the training sample set is fitted by least squares and the fitting error is calculated. The fitting error may include, but is not limited to, mean absolute error, root mean square error, and the like.
The larger the fitting error, the worse the corresponding fitting function fits the training sample set, and the larger the error will be when that function is used to fit the test sample set; the smaller the fitting error, the better the fit to the training sample set and the smaller the error on the test sample set.
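A minimal way to realize this selection step is to fit each candidate by ordinary least squares (all three example forms above are linear in a and b) and keep the one with the smallest error. The candidate set and the RMSE criterion below are assumptions for illustration:

```python
import numpy as np

# Candidate basis transforms for y = a*f(x) + b; each is linear in (a, b).
CANDIDATES = {
    "linear":    lambda x: x,
    "quadratic": lambda x: x ** 2,
    "log":       lambda x: np.log(x),
}

def select_target_function(x, y):
    """Fit every candidate by least squares and return the name,
    parameters and RMSE of the minimum-error (target) function."""
    best = None
    for name, f in CANDIDATES.items():
        design = np.column_stack([f(x), np.ones_like(x)])
        (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
        rmse = float(np.sqrt(np.mean((design @ np.array([a, b]) - y) ** 2)))
        if best is None or rmse < best[3]:
            best = (name, float(a), float(b), rmse)
    return best
```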
S13, segmenting the training sample set to obtain a plurality of sub-training sample sets, and performing fitting calculation on the plurality of sub-training sample sets by adopting the target fitting function to obtain a plurality of first fitting parameters.
In this embodiment, the training sample set may be segmented by acquisition time, for example by acquisition month, to obtain a plurality of sub-training sample sets. Suppose training samples collected from January to June 2020: the January samples form the first sub-training sample set, the February samples the second, and so on, with the June samples forming the sixth sub-training sample set.
The training sample set is first fitted as a whole and the fitting function with the best fitting effect is selected from the plurality of candidates; each sub-training sample set is then fitted with that function. This guarantees the fitting quality of each sub-training sample set and ensures that the sub-training sample sets share the same trend as the overall training sample set.
For example, if the selected fitting function is y = a·log(x) + b, each sub-training sample set is fitted by least squares, yielding first fitting parameters a and b for that sub-set. With 6 sub-training sample sets, 6 first fitting parameters a and 6 first fitting parameters b are obtained.
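A least-squares fit of the selected form y = a·log(x) + b to one sub-training sample set can be sketched as follows (illustrative only; the data are synthetic):

```python
import numpy as np

def fit_log(x, y):
    """Least-squares fit of y = a*log(x) + b; returns the first
    fitting parameters (a, b) for one sub-training sample set."""
    design = np.column_stack([np.log(x), np.ones(len(x))])
    (a, b), *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(a), float(b)

x = np.linspace(1, 10, 60)
a, b = fit_log(x, 2.5 * np.log(x) + 0.5)   # recovers a ≈ 2.5, b ≈ 0.5
```

Running this once per monthly sub-set yields the 6 pairs (a, b) described above.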
And S14, calculating a plurality of second fitting parameters of the test sample set according to the plurality of first fitting parameters.
From the first fitting parameters of the sub-training sample sets, the trend of the first fitting parameters can be determined; the trend may be monotonically increasing, monotonically decreasing, periodic, or exhibit jumps, and the second fitting parameters of the test sample set can be predicted from it.
In an optional embodiment, said calculating a plurality of second fitting parameters of the test sample set from said plurality of first fitting parameters comprises:
sorting the first fitting parameters in chronological order;
calculating a parameter fitting function from the sorted first fitting parameters;
and calculating the second fitting parameters from the parameter fitting function and the acquisition time of the test sample set.
For example, assuming the training sample set is divided into 6 sub-training sample sets by month, the first proportion and first fitting parameters of each sub-training sample set are as follows:
the sub-training sample set for month 1 corresponds to first proportion C1 and first fitting parameters A1 and B1;
the sub-training sample set for month 2 corresponds to first proportion C2 and first fitting parameters A2 and B2;
the sub-training sample set for month 3 corresponds to first proportion C3 and first fitting parameters A3 and B3;
the sub-training sample set for month 4 corresponds to first proportion C4 and first fitting parameters A4 and B4;
the sub-training sample set for month 5 corresponds to first proportion C5 and first fitting parameters A5 and B5;
the sub-training sample set for month 6 corresponds to first proportion C6 and first fitting parameters A6 and B6.
Taking the month as X and that month's first proportion as Y gives 6 points (1, C1), (2, C2), (3, C3), (4, C4), (5, C5), (6, C6). A proportion fitting function is fitted to these 6 points by least squares, and month 7 is input into it to obtain proportion C7 as the second proportion of the test sample set. Similarly, a parameter fitting function is fitted to the first fitting parameters A1 to A6 of months 1 to 6, and the parameter A7 for month 7 is computed from it as a second fitting parameter of the test sample set; another parameter fitting function is fitted to B1 to B6, and B7 for month 7 is computed from it as the other second fitting parameter of the test sample set.
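The extrapolation of C7, A7 and B7 can be sketched with a least-squares polynomial trend. The linear (degree-1) default below is an assumption; the text only requires some fitted parameter function:

```python
import numpy as np

def extrapolate(months, values, target_month, degree=1):
    """Fit a least-squares polynomial trend through (month, value)
    points and evaluate it at the target month."""
    coeffs = np.polyfit(months, values, degree)
    return float(np.polyval(coeffs, target_month))

months = [1, 2, 3, 4, 5, 6]
A = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0]   # illustrative A1..A6
A7 = extrapolate(months, A, 7)       # the linear trend gives A7 = 2.2
```

The same call is applied to B1..B6 for B7 and to the first proportions C1..C6 for C7.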
And S15, calculating a target variable of the test sample set according to the plurality of second fitting parameters and the target fitting function.
After the plurality of second fitting parameters of the test sample set have been calculated, the target fitting function can compute the target variables of the test sample set from those second fitting parameters and the index variables of the test sample set. In this way, future samples to be predicted in the test sample set are spatially mapped, yielding a new, highly stable variable (the target variable).
In an optional embodiment, the method further comprises:
calculating a first proportion of null-value samples among the positive samples in each sub-training sample set;
sorting the plurality of first proportions in chronological order, and calculating a proportion fitting function from the sorted first proportions;
calculating a second proportion from the proportion fitting function and the acquisition time of the test sample set;
and determining the number of null-value samples among the positive samples in the test sample set from the second proportion.
In each sub-training sample set, training samples whose target variable takes the first value are positive samples, and those whose target variable takes the second value are negative samples. For example, if the target variable is high performance versus low performance, the training samples corresponding to high performance are positive samples and those corresponding to low performance are negative samples. Some index variables in the positive samples take default (null) values, which cannot be spatially mapped. The proportion of null-value samples among the positive samples is therefore calculated so that its trend across the sub-training sample sets can be determined, allowing the proportion of null-value samples in the test sample set, and hence the number of null-value samples among its positive samples, to be predicted.
In an optional embodiment, after the calculating the target variable of the test sample set according to the plurality of second fitting parameters, the method further comprises:
training a machine learning model based on the training sample set;
calculating a risk loss value for the machine learning model based on target variables of the test sample set;
judging whether the risk loss value is larger than a preset loss threshold value or not;
and when the risk loss value is determined to be greater than the preset loss threshold, binning the raw data using a second equal-frequency binning model to obtain a plurality of bins of data, and repeating the above process until the risk loss value is less than the preset loss threshold, to obtain the machine learning model.
In this alternative embodiment, a deep neural network framework (e.g., a CNN) is initialized, and the training sample set is input into the initialized deep neural network as training input, producing predicted values. The deep neural network comprises a plurality of network layers, including a risk loss layer, which may use a softmax function as its loss function. The difference between the predicted value output by the deep neural network and the target variable is calculated through the softmax function as the risk loss value.
Binning the raw data with the second equal-frequency binning model to obtain a plurality of bins of data comprises creating a second sliding window, sliding it over the raw data without overlap, and placing the raw data covered by the second sliding window into one bin on each slide, to obtain a plurality of bins of data.
The width of the first sliding window is larger than that of the second sliding window. By repeatedly re-binning the raw data and fitting toward the optimal target variables of the test sample set, the trained machine learning model performs well on both the training sample set and the test sample set, reducing its prediction risk and improving its performance. The machine learning model may be a low-performance prediction model or an insurance-agent retention prediction model.
In an optional embodiment, before binning the raw data using the equal-frequency binning model, the method further includes:
sorting the raw data, and obtaining first data ranked at a preset first quantile and second data ranked at a preset second quantile;
replacing the raw data ranked before the preset first quantile with the first data, and replacing the raw data ranked after the preset second quantile with the second data, to obtain updated data;
and obtaining the minimum value in the updated data, and translating each value in the updated data by the minimum value to obtain target data.
The raw data may be sorted in ascending or descending order. The preset first quantile and preset second quantile sum to 1; for example, the preset first quantile may be 0.1% and the preset second quantile 99.9%.
Replacing the raw data ranked before the preset first quantile with the first data, and the raw data ranked after the preset second quantile with the second data, cleans the raw data as a whole: the extreme maximum and minimum values are removed, reducing their impact on the stability of the overall distribution of the raw data.
Translating each value in the updated data by the minimum value means subtracting the minimum value from every value, so that all values in the target data are non-negative, which ensures monotonicity during fitting.
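The quantile cleaning and translation steps together might be sketched as follows (the quantile values 0.1% and 99.9% follow the example above; the function name is an assumption):

```python
import numpy as np

def clean_and_translate(raw, lower_q=0.001, upper_q=0.999):
    """Replace values below/above the preset quantiles with the
    quantile values (cleaning), then subtract the minimum so all
    target data are non-negative (translation)."""
    data = np.asarray(raw, dtype=float)
    lo, hi = np.quantile(data, [lower_q, upper_q])
    clipped = np.clip(data, lo, hi)    # extremes replaced by quantile values
    return clipped - clipped.min()     # minimum becomes 0 after translation

# One extreme outlier (10**6) gets clipped before the shift.
target = clean_and_translate(list(range(1000)) + [10 ** 6])
```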
It should be understood that after the raw data are cleaned and translated to obtain the target data, the target data are binned using the equal-frequency binning model to obtain a plurality of binned data.
In summary, binning the raw data gives every bin the same proportion of the data, so the training sample set constructed from the binned data is uniformly distributed. Fitting the training sample set as a whole selects the best-fitting function among the candidate fitting functions, and that function also fits each sub-training sample set well, so the fitting parameters for the test sample set's target variable can be extrapolated; by exploiting the trend of the index variables, the relationship between the transformed index variables and the target variable of the test sample set is more stable across time. The instability of an index is resolved by transforming it rather than deleting it, which preserves the richness of the index set. The prediction risk of a machine learning model trained on the transformed index variables is reduced and its performance is stable, so the model achieves higher accuracy when applied to real business.
It is emphasized that the machine learning model may be stored in a node of the blockchain in order to further ensure privacy and security of the machine learning model.
Fig. 2 is a block diagram of a data processing apparatus based on artificial intelligence according to a second embodiment of the present invention.
In some embodiments, the artificial intelligence based data processing apparatus 20 may include a plurality of functional modules comprising computer program segments. The computer program of each program segment in the artificial intelligence based data processing apparatus 20 may be stored in a memory of a computer device and executed by at least one processor to perform the functions of artificial intelligence based data processing (described in detail with reference to fig. 1).
In this embodiment, the artificial intelligence based data processing apparatus 20 may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a binning processing module 201, a first fitting module 202, a second fitting module 203, a parameter calculation module 204, a third fitting module 205, a null value calculation module 206, a model training module 207, and a data cleaning module 208. A module, as referred to herein, is a series of computer program segments stored in the memory, executable by at least one processor, and capable of performing a fixed function. The functions of the modules are described in detail in the following embodiments.
The binning processing module 201 is configured to perform binning processing on raw data by using a first equal-frequency binning model to obtain a plurality of binned data, and construct a training sample set according to the plurality of binned data.
The equal-frequency binning model ensures that each bin contains the same number of records, so each bin carries the same proportion of the data. Constructing a training sample set from the binned data, predicting the test sample set's target variable from it, and training the machine learning model on that target variable preserves the diversity of information in the test sample set while keeping the test and training sample sets stable relative to each other, which reduces the model's prediction risk while improving its prediction accuracy.
In an optional embodiment, the binning processing module 201 performing binning processing on the raw data by using a first equal-frequency binning model to obtain a plurality of binned data includes:
creating a first sliding window;
and sliding the first sliding window over the raw data without overlap, putting the raw data covered by the first sliding window into one bin at each slide, to obtain a plurality of binned data.
In this alternative embodiment, the width of the first sliding window is N, and N may be 1000 (i.e., 1000 records per bin). Using a fixed sliding window ensures that each bin holds the same amount of data, so each bin carries the same weight when the function fitting is subsequently performed.
It should be noted that if the raw data have already been sorted, the computer device does not need to sort them again before binning; if not, the computer device must first sort the raw data in ascending or descending order. Binning the sorted raw data effectively ensures that the binned data have a consistent trend.
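The non-overlapping sliding-window binning described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; `equal_frequency_bins` is a hypothetical helper name, and the fixed window width stands in for the first sliding window.

```python
def equal_frequency_bins(data, window):
    """Sort the data, then slide a non-overlapping window of fixed width
    over it; each window's contents form one bin, so every bin holds
    exactly `window` records (equal-frequency binning)."""
    data = sorted(data)
    return [data[i:i + window] for i in range(0, len(data) - window + 1, window)]
```

Because the window never overlaps and has fixed width, every bin ends up with the same record count, which is what gives each bin equal weight in the later fitting steps.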
In an optional embodiment, the binning processing module 201 constructing the training sample set according to the plurality of binned data includes:
calculating a first average value of a plurality of index variables and a second average value of a plurality of target variables in each piece of binned data;
taking the first average value and the second average value corresponding to each piece of binned data as a training sample;
and constructing the training sample set based on the training samples corresponding to the plurality of binned data.
The index variable refers to the raw data themselves, for example a click-through rate or a browsing rate. The target variable refers to assessment data derived from the raw data, such as a retention rate or a performance level. There is a linear correlation between the index variable and the target variable.
For example, assuming there are 600,000 pieces of raw data, the computer device bins them into 10,000 bins, each containing 60 records. For any one bin, a first average of the index variables and a second average of the target variables of its 60 records are calculated; the first and second averages then form one training sample, so a training sample set of 10,000 training samples can be constructed.
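The construction of training samples from bin averages can be sketched as below. The sketch assumes each bin is a list of (index_variable, target_variable) pairs; `build_training_set` is an illustrative name, not from the patent.

```python
def build_training_set(bins):
    """For each bin of (index_variable, target_variable) pairs, take the
    mean of each column to form one training sample:
    (first average value, second average value)."""
    samples = []
    for b in bins:
        xs = [x for x, _ in b]   # index variables in this bin
        ys = [y for _, y in b]   # target variables in this bin
        samples.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return samples
```

Averaging within equal-sized bins smooths record-level noise while keeping one sample per bin, which is why the resulting training sample set is uniformly distributed.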
The first fitting module 202 is configured to perform fitting calculation on the training sample set by using a plurality of preset fitting functions to obtain a fitting error corresponding to each fitting function, and select the fitting function corresponding to the smallest fitting error as a target fitting function.
A plurality of fitting functions of different types are preset in the computer device, for example a first type of fitting function y = a*x + b, a second type y = a*x^2 + b, and a third type y = a*log(x) + b.
For each preset fitting function, the training sample set is fitted by the least squares method and the fitting error is calculated. The fitting error may include, but is not limited to, the mean absolute error, the root-mean-square error, and the like.
The larger the fitting error, the worse the corresponding fitting function fits the training sample set, and the larger the error when that function is used to fit the test sample set; the smaller the fitting error, the better the fit to the training sample set, and the smaller the error on the test sample set.
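A minimal sketch of selecting the target fitting function: each candidate family (the three examples given earlier) is linearized, fitted by closed-form least squares, and the family with the smallest mean absolute fitting error is chosen. All names here are illustrative assumptions, not the patent's code.

```python
import math

def fit_linear(u, y):
    """Closed-form ordinary least squares for y = a*u + b."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    a = sum((ui - mu) * (yi - my) for ui, yi in zip(u, y)) / \
        sum((ui - mu) ** 2 for ui in u)
    return a, my - a * mu

CANDIDATES = {                      # the three function families from the text
    "y=a*x+b":      lambda x: x,
    "y=a*x^2+b":    lambda x: x * x,
    "y=a*log(x)+b": lambda x: math.log(x),
}

def pick_target_function(samples):
    """Fit every candidate by least squares on (first avg, second avg)
    samples; return the name with the smallest mean absolute error."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    best = None
    for name, f in CANDIDATES.items():
        u = [f(x) for x in xs]
        a, b = fit_linear(u, ys)
        err = sum(abs(a * ui + b - yi) for ui, yi in zip(u, ys)) / len(ys)
        if best is None or err < best[1]:
            best = (name, err)
    return best[0]
```

Each family is linear in its parameters once x is transformed, so one closed-form regression per candidate suffices; no iterative optimizer is needed.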
The second fitting module 203 is configured to segment the training sample set to obtain a plurality of sub-training sample sets, and perform fitting calculation on the plurality of sub-training sample sets by using the target fitting function to obtain a plurality of first fitting parameters.
In this embodiment, the training sample set may be segmented according to the acquisition time of the training samples, for example by acquisition month, to obtain a plurality of sub-training sample sets. For example, assume training samples collected from January to June 2020: the January samples form the first sub-training sample set, the February samples form the second sub-training sample set, and so on, with the June samples forming the sixth sub-training sample set.
The training sample set is first fitted as a whole to select the best-fitting function among the candidates; that function is then used to fit each sub-training sample set, which ensures a good fit on each sub-set and that the sub-training sample sets share the same trend as the overall training sample set.
For example, assuming the selected fitting function is y = a*log(x) + b, each sub-training sample set is fitted by the least squares method, yielding first fitting parameters a and b. With 6 sub-training sample sets, 6 first fitting parameters a and 6 first fitting parameters b are obtained.
The parameter calculating module 204 is configured to calculate a plurality of second fitting parameters of the test sample set according to the plurality of first fitting parameters.
From the first fitting parameters corresponding to the sub-training sample sets, the trend of the first fitting parameters can be determined (monotonically increasing, monotonically decreasing, periodic, abrupt, and so on), and the second fitting parameters of the test sample set can be predicted from that trend.
In an alternative embodiment, the parameter calculation module 204 calculating a plurality of second fitting parameters of the test sample set according to the plurality of first fitting parameters includes:
sorting each first fitting parameter in chronological order;
calculating a parameter fitting function according to the sorted first fitting parameters;
and calculating a second fitting parameter according to the parameter fitting function and the acquisition time of the test sample set.
For example, assuming that the training sample set is divided into 6 sub-training sample sets by month, the first ratio and the first fitting parameters of each sub-training sample set are as follows:
the sub-training sample set for month 1 corresponds to the first ratio C1 and first fitting parameters A1 and B1;
the sub-training sample set for month 2 corresponds to the first ratio C2 and first fitting parameters A2 and B2;
the sub-training sample set for month 3 corresponds to the first ratio C3 and first fitting parameters A3 and B3;
the sub-training sample set for month 4 corresponds to the first ratio C4 and first fitting parameters A4 and B4;
the sub-training sample set for month 5 corresponds to the first ratio C5 and first fitting parameters A5 and B5;
the sub-training sample set for month 6 corresponds to the first ratio C6 and first fitting parameters A6 and B6.
Taking the month as X and the first ratio for that month as Y gives 6 points (1, C1), (2, C2), (3, C3), (4, C4), (5, C5), (6, C6). A ratio fitting function is fitted through these 6 points by the least squares algorithm, and month 7 is input into it to obtain the ratio C7 as the second ratio of the test sample set. Similarly, a parameter fitting function is fitted through the first fitting parameters A1 through A6 of months 1 through 6, and the parameter A7 for month 7 is computed from it as one second fitting parameter of the test sample set; another parameter fitting function is fitted through B1 through B6 to obtain the parameter B7 for month 7 as the other second fitting parameter of the test sample set.
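The extrapolation of A7 (or B7, or C7) from the monthly history can be sketched as below, assuming a simple straight-line parameter fitting function; the patent does not fix the form of that function, so the linear choice and the name `extrapolate_parameter` are assumptions.

```python
def extrapolate_parameter(history, next_t):
    """Fit a straight line through (month, value) pairs by closed-form
    least squares and evaluate it at next_t, a minimal stand-in for the
    'parameter fitting function' used to predict the test set's second
    fitting parameter."""
    ts = [t for t, _ in history]
    vs = [v for _, v in history]
    n = len(ts)
    mt, mv = sum(ts) / n, sum(vs) / n
    a = sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / \
        sum((t - mt) ** 2 for t in ts)
    b = mv - a * mt
    return a * next_t + b   # predicted value for the next month
```

For example, `extrapolate_parameter([(1, A1), ..., (6, A6)], 7)` plays the role of computing A7; the same helper would serve for the B parameters and the ratio C.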
The third fitting module 205 is configured to calculate a target variable of the test sample set according to the plurality of second fitting parameters and the target fitting function.
After the second fitting parameters of the test sample set and its index variables are obtained, the target variable of the test sample set can be computed from them via the target fitting function. This performs a spatial mapping for the future samples to be predicted in the test sample set and yields a new, highly stable variable (the target variable).
The null value calculating module 206 is configured to calculate a first ratio of null value samples to positive samples in each of the sub-training sample sets; sort the plurality of first ratios in chronological order and calculate a ratio fitting function according to the sorted first ratios; calculate a second ratio according to the ratio fitting function and the acquisition time of the test sample set; and determine the number of null value samples among the positive samples in the test sample set according to the second ratio.
A training sample in a sub-training sample set whose target variable takes the first value is treated as a positive sample, and one whose target variable takes the second value as a negative sample. For example, if the target variable is high performance or low performance, training samples corresponding to high performance are positive samples and those corresponding to low performance are negative samples. Because some index variables in the positive samples take default (null) values, which cannot be spatially mapped, the ratio of null samples among the positive samples must be calculated to determine its trend across the sub-training sample sets. This allows the null-sample ratio of the test sample set, and hence the number of null samples among its positive samples, to be predicted.
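Computing the first ratio for one sub-training sample set can be sketched as below. The dict-based sample layout and the name `null_ratio` are illustrative assumptions; the patent only specifies the ratio of null-valued positive samples to all positive samples.

```python
def null_ratio(samples, null_value=None):
    """First ratio: null-valued positive samples / all positive samples.
    A sample is positive when its label (target variable) is 1, and
    null when its index variable equals null_value."""
    positives = [s for s in samples if s["label"] == 1]
    nulls = [s for s in positives if s["x"] is null_value]
    return len(nulls) / len(positives)
```

Computed per month, these ratios form the (month, C) points fed to the ratio fitting function described above.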
The model training module 207 is configured to train a machine learning model after the target variables of the test sample set are calculated according to the second fitting parameters.
In an alternative embodiment, the model training module 207 training the machine learning model includes:
training a machine learning model based on the training sample set;
calculating a risk loss value for the machine learning model based on target variables of the test sample set;
judging whether the risk loss value is larger than a preset loss threshold value or not;
and when the risk loss value is determined to be larger than the preset loss threshold, performing binning processing on the raw data by using a second equal-frequency binning model to obtain a plurality of binned data, and repeating the above process until the risk loss value is smaller than the preset loss threshold, to obtain the machine learning model.
In this alternative embodiment, a deep neural network (e.g., a CNN) is initialized, and the training sample set is fed into the initialized network as input for training, producing a predicted value. The deep neural network comprises a plurality of network layers, including a risk loss layer, which may use the softmax function as its loss function. The difference between the predicted value output by the deep neural network and the target variable is computed through the softmax function as the risk loss value.
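A minimal sketch of a softmax-based risk loss for a single sample, assuming the risk loss layer computes the cross-entropy between the network's output scores and the target class; this is an illustrative stand-in, not the patent's network or loss layer.

```python
import math

def softmax_loss(logits, target_index):
    """Cross-entropy over a softmax layer: the risk loss between the
    network's predicted scores and the target variable's class."""
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    probs = [e / sum(exps) for e in exps]     # softmax probabilities
    return -math.log(probs[target_index])     # negative log-likelihood of target
```

The training loop described above would compare this loss against the preset loss threshold and trigger re-binning with the second equal-frequency binning model when it is exceeded.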
Binning the raw data with the second equal-frequency binning model proceeds as follows: a second sliding window is created and slid over the raw data without overlap, and at each slide the raw data covered by the second sliding window are placed into one bin, yielding a plurality of binned data.
The width of the first sliding window is larger than that of the second sliding window. By repeatedly re-binning the raw data to fit an optimal target variable for the test sample set, the machine learning model obtained through training performs well on both the training sample set and the test sample set, which reduces the model's prediction risk and improves its performance. The machine learning model may be, for example, a low-performance prediction model or an agent retention prediction model.
The data cleaning module 208 is configured to clean and translate the raw data before performing binning processing on the raw data by using the equal-frequency binning model.
In an alternative embodiment, the data cleansing module 208 cleansing and translating the raw data includes:
sorting the raw data, and obtaining first data ranked at a preset first quantile position and second data ranked at a preset second quantile position;
replacing the raw data before the preset first quantile position with the first data, and replacing the raw data after the preset second quantile position with the second data, to obtain updated data;
and obtaining the minimum value in the updated data, and translating each value in the updated data by the minimum value to obtain target data.
The raw data may be sorted in descending or ascending order. The preset first quantile and the preset second quantile sum to 1; for example, the preset first quantile may be 0.1% and the preset second quantile 99.9%.
Replacing or overwriting the raw data before the preset first quantile position with the first data, and the raw data after the preset second quantile position with the second data, cleans the raw data as a whole: the extreme maximum and minimum values are removed, which reduces their impact on the overall stability of the data distribution.
Translating each value in the updated data by the minimum value means subtracting the minimum value from each value, so that all values in the target data are non-negative, which preserves monotonicity during fitting.
It should be understood that after the raw data are cleaned and translated to obtain the target data, the target data are binned using the equal-frequency binning model to obtain a plurality of binned data.
In summary, binning the raw data gives every bin the same proportion of the data, so the training sample set constructed from the binned data is uniformly distributed. Fitting the training sample set as a whole selects the best-fitting function among the candidate fitting functions, and that function also fits each sub-training sample set well, so the fitting parameters for the test sample set's target variable can be extrapolated; by exploiting the trend of the index variables, the relationship between the transformed index variables and the target variable of the test sample set is more stable across time. The instability of an index is resolved by transforming it rather than deleting it, which preserves the richness of the index set. The prediction risk of a machine learning model trained on the transformed index variables is reduced and its performance is stable, so the model achieves higher accuracy when applied to real business.
Fig. 3 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. In the preferred embodiment of the present invention, the computer device 3 includes a memory 31, at least one processor 32, at least one communication bus 33, and a transceiver 34.
It will be appreciated by those skilled in the art that the configuration of the computer device shown in fig. 3 does not constitute a limitation of the embodiments of the present invention, and may be a bus-type configuration or a star-type configuration, and that the computer device 3 may include more or less hardware or software than those shown, or a different arrangement of components.
In some embodiments, the computer device 3 is a computer device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance, and the hardware thereof includes but is not limited to a microprocessor, an application specific integrated circuit, a programmable gate array, a digital processor, an embedded device, and the like. The computer device 3 may also include a client device, which includes, but is not limited to, any electronic product capable of interacting with a client through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smart phone, a digital camera, etc.
It should be noted that the computer device 3 is only an example, and other electronic products that are currently available or may come into existence in the future, such as electronic products that can be adapted to the present invention, should also be included in the scope of the present invention, and are included herein by reference.
In some embodiments, the memory 31 stores a computer program which, when executed by the at least one processor 32, performs all or part of the steps of the artificial intelligence based data processing method described above. The memory 31 includes a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), a one-time programmable read-only memory (OTPROM), an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium capable of carrying or storing data.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (tamper resistance) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
In some embodiments, the at least one processor 32 is a Control Unit (Control Unit) of the computer device 3, connects various components of the entire computer device 3 by using various interfaces and lines, and executes various functions and processes data of the computer device 3 by running or executing programs or modules stored in the memory 31 and calling data stored in the memory 31. For example, the at least one processor 32, when executing the computer program stored in the memory, implements all or part of the steps of the artificial intelligence based data processing method described in the embodiments of the present invention; or implement all or part of the functionality of an artificial intelligence based data processing apparatus. The at least one processor 32 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital Processing chips, graphics processors, and combinations of various control chips.
In some embodiments, the at least one communication bus 33 is arranged to enable connection communication between the memory 31 and the at least one processor 32 or the like.
Although not shown, the computer device 3 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 32 through a power management device, so as to implement functions of managing charging, discharging, and power consumption through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The computer device 3 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
The integrated unit implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a computer device, or a network device) or a processor (processor) to execute parts of the methods according to the embodiments of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or that the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of artificial intelligence based data processing, the method comprising:
performing binning processing on raw data by using a first equal-frequency binning model to obtain a plurality of binned data, and constructing a training sample set according to the plurality of binned data;
fitting calculation is carried out on the training sample set by adopting a plurality of preset fitting functions to obtain a fitting error corresponding to each fitting function, and the fitting function corresponding to the minimum fitting error is selected as a target fitting function;
segmenting the training sample set to obtain a plurality of sub-training sample sets, and performing fitting calculation on the plurality of sub-training sample sets by adopting the target fitting function to obtain a plurality of first fitting parameters;
calculating a plurality of second fitting parameters of the test sample set according to the plurality of first fitting parameters;
and calculating a target variable of the test sample set according to the plurality of second fitting parameters and the target fitting function.
2. The artificial intelligence based data processing method of claim 1, wherein the performing binning processing on the raw data by using the first equal-frequency binning model to obtain a plurality of binned data comprises:
creating a first sliding window;
and sliding the first sliding window over the raw data without overlap, and putting the raw data corresponding to the first sliding window into one bin at each slide, to obtain a plurality of binned data.
3. The artificial intelligence based data processing method of claim 1, wherein said constructing a training sample set from the plurality of binned data comprises:
calculating a first average value of a plurality of index variables and a second average value of a plurality of target variables in each piece of binned data;
taking the first average value and the second average value corresponding to each piece of binned data as a training sample;
and constructing the training sample set based on the training samples corresponding to the plurality of binned data.
4. The artificial intelligence based data processing method of claim 1, wherein said calculating a plurality of second fitting parameters of the test sample set from the plurality of first fitting parameters comprises:
sorting the first fitting parameters in time order;
calculating a parameter fitting function according to the sorted first fitting parameters;
and calculating a second fitting parameter according to the parameter fitting function and the acquisition time of the test sample set.
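Claim 4 treats the first fitting parameters as a time series, fits a trend to them, and extrapolates that trend to the test set's acquisition time. A sketch assuming a polynomial trend (the "parameter fitting function" is not specified in the claim):

```python
import numpy as np

def extrapolate_parameter(times, first_params, test_time, degree=1):
    """Sort the first fitting parameters by acquisition time, fit a
    polynomial trend to them, and evaluate that trend at the test
    sample set's acquisition time to get the second fitting parameter."""
    order = np.argsort(times)
    trend = np.polyfit(np.asarray(times, float)[order],
                       np.asarray(first_params, float)[order], degree)
    return float(np.polyval(trend, test_time))

# Toy example: the parameter drifts linearly (slope 0.5 per period).
times = [1, 2, 3, 4]
params = [1.5, 2.0, 2.5, 3.0]
second = extrapolate_parameter(times, params, test_time=5)
print(second)  # ≈ 3.5
```

Each first fitting parameter would be extrapolated this way independently to obtain the plurality of second fitting parameters.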
5. The artificial intelligence based data processing method according to any one of claims 1 to 4, wherein the method further comprises:
calculating a first proportion of null-value samples among the positive samples in each sub-training sample set;
sorting the plurality of first proportions in time order, and calculating a proportion fitting function according to the sorted plurality of first proportions;
calculating a second proportion according to the proportion fitting function and the acquisition time of the test sample set;
and determining the number of null-value samples among the positive samples in the test sample set according to the second proportion.
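Claim 5 applies the same trend-extrapolation idea to the null-sample proportion. A sketch assuming a linear proportion fitting function and a hypothetical count of 100 positive samples in the test set:

```python
import numpy as np

def predict_null_ratio(times, ratios, test_time):
    """Fit a proportion fitting function (a linear trend, as one simple
    choice) to the per-subset null/positive proportions and evaluate it
    at the test set's acquisition time; clip to [0, 1] since it is a
    proportion."""
    coeffs = np.polyfit(times, ratios, 1)
    return float(np.clip(np.polyval(coeffs, test_time), 0.0, 1.0))

ratios = [0.10, 0.12, 0.14]            # first proportions per subset
second = predict_null_ratio([1, 2, 3], ratios, test_time=4)
n_null = round(second * 100)           # expected nulls among 100 positives
print(second, n_null)
```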
6. The artificial intelligence based data processing method of claim 5, wherein after the calculating target variables for the set of test samples from the plurality of second fitting parameters, the method further comprises:
training a machine learning model based on the training sample set;
calculating a risk loss value for the machine learning model based on target variables of the test sample set;
determining whether the risk loss value is greater than a preset loss threshold;
and when the risk loss value is determined to be greater than the preset loss threshold, binning the original data by using a second binning model to obtain a plurality of binned data, and repeating the above process until the risk loss value is smaller than the preset loss threshold, to obtain a trained machine learning model.
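The retraining loop of claim 6 is a threshold-driven search over binning models. A sketch in which each binning model is reduced to a window size and training/evaluation is an injected callback; both stand-ins are hypothetical, not part of the claim:

```python
def train_until_loss_ok(binning_models, train_and_eval, loss_threshold):
    """Try binning models in order; keep the first whose trained model's
    risk loss value falls below the preset loss threshold."""
    for model in binning_models:
        trained, loss = train_and_eval(model)
        if loss <= loss_threshold:
            return trained, model, loss
    raise RuntimeError("no binning model met the loss threshold")

# Toy evaluation: pretend finer bins (smaller windows) give lower loss.
losses = {10: 0.9, 5: 0.4, 2: 0.1}
trained, chosen, loss = train_until_loss_ok(
    [10, 5, 2], lambda m: (f"model@{m}", losses[m]), loss_threshold=0.3)
print(chosen, loss)  # 2 0.1
```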
7. The artificial intelligence based data processing method according to any one of claims 1 to 4, wherein before the binning of the original data using the equal-frequency binning model, the method further comprises:
sorting the original data, and acquiring first data ranked at a preset first quantile and second data ranked at a preset second quantile;
replacing the original data ranked before the preset first quantile with the first data, and replacing the original data ranked after the preset second quantile with the second data, to obtain updated data;
and acquiring the minimum value in the updated data, and translating each value in the updated data by the minimum value to obtain target data.
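The preprocessing of claim 7 is quantile clipping (winsorization) followed by a shift so the data starts at zero. A sketch with assumed 5th/95th quantiles; the claim leaves the quantile positions as preset values:

```python
import numpy as np

def preprocess(raw, low_q=0.05, high_q=0.95):
    """Clip values below the preset first quantile up to it and values
    above the preset second quantile down to it, then translate the
    updated data by its minimum so the target data starts at zero."""
    lo, hi = np.quantile(raw, [low_q, high_q])
    updated = np.clip(raw, lo, hi)      # updated data
    return updated - updated.min()      # target data after translation

raw = np.array([-100.0, 1, 2, 3, 4, 5, 6, 7, 8, 1000.0])
target = preprocess(raw)
print(target.min())  # 0.0 — outliers clipped, data shifted to start at zero
```

Clipping tames the extreme values (-100 and 1000 here) that would otherwise dominate equal-frequency bin boundaries and any subsequent fit.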
8. An artificial intelligence based data processing apparatus, the apparatus comprising:
a binning processing module, configured to perform binning processing on original data by using a first equal-frequency binning model to obtain a plurality of binned data, and to construct a training sample set from the plurality of binned data;
a first fitting module, configured to perform fitting calculation on the training sample set by using a plurality of preset fitting functions to obtain a fitting error corresponding to each fitting function, and to select the fitting function corresponding to the minimum fitting error as a target fitting function;
a second fitting module, configured to segment the training sample set to obtain a plurality of sub-training sample sets, and to perform fitting calculation on the plurality of sub-training sample sets by using the target fitting function to obtain a plurality of first fitting parameters;
a parameter calculation module, configured to calculate a plurality of second fitting parameters of a test sample set according to the plurality of first fitting parameters;
and a third fitting module, configured to calculate a target variable of the test sample set according to the plurality of second fitting parameters and the target fitting function.
9. A computer device, characterized in that the computer device comprises:
a memory for storing a computer program;
a processor for implementing the artificial intelligence based data processing method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the artificial intelligence based data processing method according to any one of claims 1 to 7.
CN202010798027.5A 2020-08-10 2020-08-10 Data processing method and device based on artificial intelligence, computer equipment and medium Pending CN111950706A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010798027.5A CN111950706A (en) 2020-08-10 2020-08-10 Data processing method and device based on artificial intelligence, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN111950706A (en)

Family

ID=73332750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010798027.5A Pending CN111950706A (en) 2020-08-10 2020-08-10 Data processing method and device based on artificial intelligence, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN111950706A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112836765A (en) * 2021-03-01 2021-05-25 深圳前海微众银行股份有限公司 Data processing method and device for distributed learning and electronic equipment
CN112836765B (en) * 2021-03-01 2023-12-22 深圳前海微众银行股份有限公司 Data processing method and device for distributed learning and electronic equipment
CN113742193A (en) * 2021-09-13 2021-12-03 上海晓途网络科技有限公司 Data analysis method and device, electronic equipment and storage medium
CN113780583A (en) * 2021-09-18 2021-12-10 中国平安人寿保险股份有限公司 Model training monitoring method, device, equipment and storage medium
CN116610897A (en) * 2023-07-14 2023-08-18 矿冶科技集团有限公司 Tailing pond drainage data fitting method, system, equipment and storage medium
CN116610897B (en) * 2023-07-14 2023-10-17 矿冶科技集团有限公司 Tailing pond drainage data fitting method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
US11392843B2 (en) Utilizing a machine learning model to predict a quantity of cloud resources to allocate to a customer
CN111950706A (en) Data processing method and device based on artificial intelligence, computer equipment and medium
CN111950738A (en) Machine learning model optimization effect evaluation method and device, terminal and storage medium
US10635986B2 (en) Information processing system and information processing method
US10719639B2 (en) Massively accelerated Bayesian machine
CN112700131B (en) AB test method and device based on artificial intelligence, computer equipment and medium
CN113157379A (en) Cluster node resource scheduling method and device
CN111694844A (en) Enterprise operation data analysis method and device based on configuration algorithm and electronic equipment
CN114612194A (en) Product recommendation method and device, electronic equipment and storage medium
CN113435998A (en) Loan overdue prediction method and device, electronic equipment and storage medium
CN111768096A (en) Rating method and device based on algorithm model, electronic equipment and storage medium
CN111951047A (en) Advertisement effect evaluation method based on artificial intelligence, terminal and storage medium
CN114201212A (en) Configuration file processing method and device, computer equipment and storage medium
CN112818028B (en) Data index screening method and device, computer equipment and storage medium
CN111652282B (en) Big data-based user preference analysis method and device and electronic equipment
CN112598135A (en) Model training processing method and device, computer equipment and medium
CN112102011A (en) User grade prediction method, device, terminal and medium based on artificial intelligence
CN117193975A (en) Task scheduling method, device, equipment and storage medium
US11782923B2 (en) Optimizing breakeven points for enhancing system performance
CN115271821A (en) Dot distribution processing method, dot distribution processing device, computer equipment and storage medium
CN113435746B (en) User workload scoring method and device, electronic equipment and storage medium
CN113918296A (en) Model training task scheduling execution method and device, electronic equipment and storage medium
CN114968336A (en) Application gray level publishing method and device, computer equipment and storage medium
CN115187134A (en) Grid-based power distribution network planning method and device and terminal equipment
CN114490590A (en) Data warehouse quality evaluation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination