CN115660101A - Data service providing method and device based on service node information - Google Patents

Info

Publication number
CN115660101A
Authority
CN
China
Prior art keywords
real
machine learning
learning model
data
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211186015.2A
Other languages
Chinese (zh)
Inventor
胡成意
沈赟
朱维娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qiyue Information Technology Co Ltd
Original Assignee
Shanghai Qiyue Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qiyue Information Technology Co Ltd
Priority to CN202211186015.2A
Publication of CN115660101A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a data service providing method and device based on service node information. The method comprises: training a machine learning model with real-time sample data; during the training iterations of the machine learning model, obtaining an updated machine learning model when the one-dimensional weighted cross entropy loss index obtained by weighted adjustment meets a preset strategy; processing real-time service data based on the updated machine learning model to generate service node information; and forwarding the service request of a real-time user to a corresponding background server according to the service node information to provide a data service. The data service providing method, apparatus, electronic device and computer-readable medium for generating service node information based on a weighted cross entropy loss index can meet the need to update an online machine learning model in real time, ensure the security and accuracy of model calculation while maintaining calculation efficiency, improve the data security of the whole system and reduce the data redundancy cost.

Description

Data service providing method and device based on service node information
Technical Field
The present application relates to the field of computer information processing, and in particular, to a data service providing method and apparatus, an electronic device, and a computer-readable medium for generating service node information based on a weighted cross entropy loss index.
Background
After a machine learning model is trained, it needs to be smoothly deployed in an actual production environment for actual operation and verification. Deploying a model online requires support from many kinds of developers. Model developers build a risk control model from offline data according to business requirements and are responsible for model deployment, monitoring, maintenance and the like. Strategy developers formulate the corresponding risk control strategy scheme according to the offline predicted model scores, configure strategy packages and the like. Environment developers support work such as accessing underlying data sources, troubleshooting, online deployment and platform building.
Moreover, real-time data is generated as the machine learning model is used in an actual environment. To keep the model accurate, it needs to be checked against this real-time data, and when a deviation is found in its calculation results, its parameters need to be updated in time. Although more and more tools help deploy a trained machine learning model to a production environment, bringing a model online involves numerous links, such as data acquisition, data analysis, data transformation, data validation, data splitting, training, model creation, model validation, large-scale training, model publishing, and so forth. Training the machine learning model is a particularly time-consuming step. To make the model as accurate as possible, it should be trained with as many samples as possible, but meeting the accuracy requirement in this way inevitably costs time. Moreover, training a machine learning model on big data also requires more data redundancy space, and it is almost impossible to provide a data service based on a model updated in real time during periods of busy traffic.
Therefore, a new data service providing method, apparatus, electronic device and computer readable medium for generating service node information based on weighted cross entropy loss index is needed.
The above information disclosed in this background section is only for enhancement of understanding of the background of the application and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present application provides a data service providing method, an apparatus, an electronic device, and a computer-readable medium for generating service node information based on a weighted cross entropy loss index, which can meet the need to update an online machine learning model in real time, ensure the security and accuracy of model calculation while maintaining calculation efficiency, improve the data security of the whole system, and reduce the data redundancy cost.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of the present application, a data service providing method for generating service node information based on a weighted cross entropy loss index is provided, the method including: acquiring a machine learning model to be updated corresponding to a service port, and acquiring real-time sample data corresponding to the machine learning model; determining a sample label for the real-time sample data; training the machine learning model through real-time sample data with sample labels; in the training iteration process of the machine learning model, performing weighted adjustment on the cross entropy loss index according to the sample distribution and the identification difficulty in real-time sample data to generate a one-dimensional weighted cross entropy loss index, and obtaining an updated machine learning model when the one-dimensional weighted cross entropy loss index meets a preset strategy; acquiring real-time service data from the service port, and processing the real-time service data based on the updated machine learning model to generate service node information; and forwarding the service request of the real-time user to a corresponding background server according to the service node information so as to provide data service.
Optionally, performing weighted adjustment on the cross entropy loss index according to the sample distribution and the identification difficulty in the real-time sample data to generate a one-dimensional weighted cross entropy loss index, including: in the iterative process of each round of machine learning model training, obtaining a cross entropy loss index based on the prediction probability of the iterative machine learning model to the real-time sample data; generating a weighting factor according to the number of positive samples and the number of negative samples in the sample distribution of the real-time sample data; determining an adjusting factor according to the prediction probability and the identification difficulty of the real-time sample data; adjusting and balancing the cross entropy loss index through the adjusting factor; and weighting the cross entropy loss index after the balance adjustment through the weighting factor to generate the one-dimensional weighted cross entropy loss index.
Optionally, generating a weighting factor according to the number of positive samples and the number of negative samples in the sample distribution of the real-time sample data includes: obtaining a weight factor for each real-time sample data according to the correspondence between the number of positive samples, the number of negative samples and the label of that real-time sample data; and calculating the mean of the weight factors of all the real-time sample data to generate the weighting factor.
Optionally, the obtaining a machine learning model to be updated corresponding to a service port, and obtaining real-time sample data corresponding to the machine learning model, includes: determining a model updating strategy according to the business strategy; when a model updating strategy is reached, a machine learning model to be updated corresponding to a service port is obtained, and real-time sample data corresponding to the machine learning model is obtained.
Optionally, the obtaining a machine learning model to be updated corresponding to a service port, and obtaining real-time sample data corresponding to the machine learning model includes: obtaining incremental business data generated in the model updating period; and performing data cleaning and feature screening on the incremental business data to generate the real-time sample data.
Optionally, determining a sample tag for the real-time sample data includes: obtaining a label judgment criterion according to a service strategy; extracting a plurality of sample data in the real-time sample data one by one; automatically assigning labels to the plurality of samples according to the label decision criteria.
According to an aspect of the present application, a data service providing apparatus for generating service node information based on a weighted cross entropy loss index is provided, the apparatus including: the model module is used for acquiring a machine learning model to be updated corresponding to a service port and acquiring real-time sample data corresponding to the machine learning model; the label module is used for determining a sample label for the real-time sample data; the training module is used for training the machine learning model through real-time sample data with sample labels; the updating module is used for performing weighted adjustment on the cross entropy loss index according to the sample distribution and the identification difficulty in the real-time sample data in the training iteration process of the machine learning model to generate a one-dimensional weighted cross entropy loss index, and when the one-dimensional weighted cross entropy loss index meets a preset strategy, the updated machine learning model is obtained; the deployment module is used for acquiring real-time service data from the service port, processing the real-time service data based on the updated machine learning model and generating service node information; and forwarding the service request of the real-time user to a corresponding background server according to the service node information to provide data service.
Optionally, the updating module is specifically configured to, in an iterative process of each round of the machine learning model training, obtain a cross entropy loss index based on a prediction probability of the iterative machine learning model on the real-time sample data; generating a weighting factor according to the number of positive samples and the number of negative samples in the sample distribution of the real-time sample data; determining an adjusting factor according to the prediction probability and the identification difficulty of the real-time sample data; adjusting and balancing the cross entropy loss index through the adjusting factor; and weighting the cross entropy loss index after the balance is adjusted according to the weighting factor to generate the one-dimensional weighted cross entropy loss index.
Optionally, the updating module is specifically configured to obtain a weight factor for each real-time sample data according to the correspondence between the number of positive samples, the number of negative samples and the label of that real-time sample data; and to calculate the mean of the weight factors of all the real-time sample data to generate the weighting factor.
According to an aspect of the present application, an electronic device is provided, the electronic device including: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to an aspect of the application, a computer-readable medium is proposed, on which a computer program is stored, which program, when being executed by a processor, carries out the method as above.
According to the data service providing method, apparatus, electronic device and computer-readable medium for generating service node information based on the weighted cross entropy loss index, a machine learning model to be updated and the corresponding real-time sample data are obtained; a sample label is determined for the real-time sample data; the machine learning model is trained with the labeled real-time sample data; during training, an updated machine learning model is generated when the one-dimensional weighted cross entropy loss index meets a preset strategy; and the updated machine learning model is deployed online to provide a data service for real-time users. In this way, the need to update an online machine learning model in real time can be met, the security and accuracy of model calculation are ensured while calculation efficiency is maintained, the data security of the whole system is improved, and the data redundancy cost is reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are only some embodiments of the present application and other drawings may be derived by those skilled in the art without inventive effort.
Fig. 1 is a flowchart illustrating a data service providing method for generating service node information based on weighted cross entropy loss index according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a data service providing method for generating service node information based on a weighted cross entropy loss index according to another exemplary embodiment.
Fig. 3 is a flowchart illustrating a data service providing method of generating service node information based on weighted cross entropy loss index according to another exemplary embodiment.
Fig. 4 is a schematic diagram illustrating a data service providing method for generating service node information based on a weighted cross entropy loss index according to another exemplary embodiment.
Fig. 5 is a block diagram illustrating a data service providing apparatus for generating service node information based on a weighted cross entropy loss index according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Thus, a first component discussed below could be termed a second component without departing from the teachings of the present concepts. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It should be understood by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or flowchart illustrations in the drawings are not necessarily required to practice the present application and, therefore, should not be considered to limit the scope of the present application.
The technical terms and abbreviations referred to in this application are explained as follows:
Hyperparameters: framework-level parameters of a machine learning model that are set before training rather than learned from the data, such as the number of clusters in a clustering method or the number of topics in a topic model.
A loss function (loss function) or cost function (cost function) is a function that maps a random event or its associated random variable values to non-negative real numbers to represent the "risk" or "loss" of the random event. In application, the loss function is usually associated with the optimization problem as a learning criterion, i.e. the model is solved and evaluated by minimizing the loss function.
XGBoost: an algorithm developed on the basis of the Gradient Boosting Decision Tree (GBDT). Its full name is eXtreme Gradient Boosting, and it was proposed by Tianqi Chen in 2015.
AUC: the area under the ROC (receiver operating characteristic) curve. The ROC curve is plotted with the true positive rate (sensitivity) on the ordinate and the false positive rate (1 - specificity) on the abscissa, over a series of different binary classification cut-off values (decision thresholds). AUC is an evaluation index for the quality of a binary classification model and represents the probability that a randomly chosen positive example is ranked ahead of a randomly chosen negative example.
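The ranking interpretation of AUC can be illustrated with a minimal Python sketch. The function name `auc_by_ranking` and the toy labels and scores are assumptions made for illustration only; they do not come from the application.

```python
def auc_by_ranking(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two positives, three negatives.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
print(auc_by_ranking(labels, scores))  # 5 of the 6 pairs are ordered correctly
```

The pairwise loop is quadratic in the number of samples, so it is suitable only for explaining the definition, not for large datasets.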
Information Value: the information value is one of the ways to select the important variables in the predictive model, and it can rank the predictive variables according to their importance.
Fig. 1 is a flowchart illustrating a data service providing method for generating service node information based on weighted cross entropy loss index according to an exemplary embodiment. The data service providing method 10 of generating the service node information based on the weighted cross entropy loss index includes at least steps S102 to S108.
As shown in fig. 1, in S102, a machine learning model to be updated corresponding to a service port is obtained, and real-time sample data corresponding to the machine learning model is obtained. A model update policy can be determined according to a business policy; and when the model updating strategy is reached, acquiring the machine learning model to be updated and the corresponding real-time sample data.
In S104, a sample label is determined for the real-time sample data. The label judgment criterion can be obtained according to the service strategy; extracting a plurality of sample data in the real-time sample data one by one; automatically assigning labels to the plurality of samples according to the label decision criteria.
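The automatic labeling in S104 can be sketched as follows. This is only an illustration: the criterion `exceeds_download_limit`, the field name `downloads` and the limit of 100 are hypothetical stand-ins for a label decision criterion derived from a business strategy.

```python
def exceeds_download_limit(sample, limit=100):
    """Hypothetical label decision criterion: too many downloads in the window."""
    return sample["downloads"] > limit


def assign_labels(samples, criterion):
    """Extract the samples one by one and attach label 1 (positive) or 0 (negative)."""
    labeled = []
    for sample in samples:
        labeled.append(dict(sample, label=1 if criterion(sample) else 0))
    return labeled


samples = [{"user": "u1", "downloads": 12}, {"user": "u2", "downloads": 480}]
print(assign_labels(samples, exceeds_download_limit))
```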
In S106, the machine learning model is trained with real-time sample data with sample labels.
In S108, in the training iterative process of the machine learning model, the cross entropy loss index is weighted and adjusted according to the sample distribution and the recognition difficulty in the real-time sample data to generate a one-dimensional weighted cross entropy loss index, and when the one-dimensional weighted cross entropy loss index meets a preset strategy, an updated machine learning model is obtained.
In one embodiment, a cross entropy loss index may be obtained based on a prediction probability of the machine learning model for the real-time sample data in each round of the iterative process of the machine learning model training; generating a weighting factor according to the number of positive samples and the number of negative samples in the sample distribution of the real-time sample data; determining an adjusting factor according to the prediction probability and the identification difficulty of the real-time sample data; adjusting and balancing the cross entropy loss index through the adjusting factor; and weighting the cross entropy loss index after the balance adjustment through the weighting factor to generate the one-dimensional weighted cross entropy loss index.
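The sequence of steps above can be sketched in Python. The sketch assumes a focal-loss-style adjusting factor `(1 - p_t) ** gamma` with a fixed `gamma`, and takes the weighting factor as a precomputed scalar; the exact formulas used by the application are not given in this passage, so these choices are assumptions.

```python
import math


def weighted_cross_entropy(labels, probs, weight_factor, gamma=2.0, eps=1e-12):
    """Return a single scalar loss.

    labels        -- 0/1 sample labels
    probs         -- predicted probability of the positive class
    weight_factor -- scalar derived from the positive/negative sample counts
    gamma         -- assumed exponent of the adjusting factor
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)      # avoid log(0)
        p_t = p if y == 1 else 1.0 - p       # probability given to the true class
        ce = -math.log(p_t)                  # per-sample cross entropy
        adjust = (1.0 - p_t) ** gamma        # small for easy samples, large for hard ones
        total += adjust * ce                 # balance the cross entropy
    balanced = total / len(labels)
    return weight_factor * balanced          # weight the balanced loss


easy = weighted_cross_entropy([1, 0], [0.9, 0.1], weight_factor=0.5)
hard = weighted_cross_entropy([1, 0], [0.4, 0.6], weight_factor=0.5)
print(easy < hard)
```

Because the weighting factor is a scalar and the per-sample terms are averaged, the result is a single number that can be compared against a preset threshold in each training round.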
In S110, real-time service data is obtained from the service port, the real-time service data is processed based on the updated machine learning model, service node information is generated, and a service request of a real-time user is forwarded to a corresponding background server according to the service node information to provide a data service.
In this embodiment, the real-time service data may be daily operation log information of the real-time user, or data access information, identification information and the like of the real-time user. The real-time service data is processed by the machine learning model to determine an identification tag of the real-time user, and service node information is generated from the identification tag. For example, if the identification tag indicates a normal user, the generated service node information includes pass information for the user to access the data service; if the identification tag indicates an illegal user, the generated service node information includes information for refusing the user access to the data service.
In this embodiment, the real-time service data may also be processed based on the updated machine learning model to obtain an identification tag for the real-time user. The service node information is generated when the identification tag indicates that the real-time user is a normal user and is not generated when the identification tag indicates an illegal user, and the service request is forwarded to the corresponding server only when the real-time user has service node information.
In this embodiment, the real-time service data may be processed based on the updated machine learning model to obtain an identification tag for the real-time user, determine a service right of the real-time user according to a score of the identification tag for the real-time user, generate service node information according to the service right, determine a data service accessible by the real-time user according to the service node information, and forward a service request of the data service accessible by the real-time user to a corresponding server.
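The relationship between the identification tag, the service node information and request forwarding can be sketched as follows. The tag values `"normal"` and `"illegal"`, the dictionary layout and the backend addresses are hypothetical; the sketch follows the variant in which deny information is generated for an illegal user.

```python
def build_service_node_info(user_id, tag):
    """Generate service node information from the model's identification tag."""
    if tag == "normal":
        return {"user_id": user_id, "access": "pass"}
    return {"user_id": user_id, "access": "deny"}


def forward_request(request, node_info, backends):
    """Return the background server that should handle the request, or None if refused."""
    if node_info["access"] != "pass":
        return None
    return backends[request["service"]]


# Hypothetical mapping from data service name to background server address.
backends = {"download": "backend-a:8080", "browse": "backend-b:8080"}
request = {"user_id": "u1", "service": "download"}

print(forward_request(request, build_service_node_info("u1", "normal"), backends))
print(forward_request(request, build_service_node_info("u1", "illegal"), backends))
```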
In one embodiment, the updated machine learning model may, for example, be parameter-matched with the corresponding service port; after parameter matching is completed, real-time service data is acquired based on the updated machine learning model; the updated machine learning model generates service node information by performing calculation on the real-time service data; and the service request of the real-time user is forwarded to the corresponding background server according to the service node information for processing.
According to the data service providing method for generating service node information based on the weighted cross entropy loss index, a machine learning model to be updated and the corresponding real-time sample data are obtained; a sample label is determined for the real-time sample data; the machine learning model is trained with the labeled real-time sample data; during training, an updated machine learning model is generated when the one-dimensional weighted cross entropy loss index meets a preset strategy; and the updated machine learning model is deployed online to provide a data service for real-time users. In this way, the need to update an online machine learning model in real time can be met, the security and accuracy of model calculation are ensured while calculation efficiency is maintained, the data security of the whole system is improved, and the data redundancy cost is reduced.
Example: the scheme can be applied to determining user permissions when providing a data service to users. For example, the service port may be a port through which users download data, and the machine learning model deployed at the service port may be trained on sample data consisting of users' historical operation data and users identified as having illegally acquired data. In this scenario, the real-time sample data is the newly added sample data appearing at the corresponding service port. With the scheme disclosed in this application, the corresponding machine learning model is updated in real time while users download data through the service port; the model determines whether a user accessing the service port is illegally acquiring data, and thereby whether to provide the data service to that user.
Of course, the scheme may also be applied to other application scenarios that require analysis of the security level or credit score of the user, for example, scenarios such as internet finance, page login access, page data browsing, data transmission between terminals, and the like.
It should be clearly understood that this application describes how to make and use particular examples, but the principles of this application are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
Fig. 2 is a flowchart illustrating a data service providing method of generating service node information based on weighted cross entropy loss index according to another exemplary embodiment. The process 20 shown in fig. 2 is a detailed description of S102 "obtaining the machine learning model to be updated corresponding to the service port and obtaining real-time sample data corresponding to the machine learning model" in the process shown in fig. 1.
As shown in fig. 2, in S202, a model update policy is determined according to a business policy. Different model update periods or update policies may be determined for different business policies. The model may be updated, for example, once a week, and may also be updated, for example, once a day.
In an embodiment, for example, when the model calculation accuracy index is smaller than the threshold, it may be determined to enter model updating, and when other business policies are adjusted, it may also be determined to perform model updating, which is not limited herein.
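The two triggers described above, a fixed update period and an accuracy threshold, can be combined in a small Python sketch. The function name, the parameter names and the numeric values are assumptions for illustration.

```python
from datetime import datetime, timedelta


def should_update_model(last_update, now, period, accuracy=None, min_accuracy=None):
    """Return True when the update period has elapsed or accuracy fell below the threshold."""
    if now - last_update >= period:
        return True
    if accuracy is not None and min_accuracy is not None:
        return accuracy < min_accuracy
    return False


last = datetime(2022, 9, 1)
# Weekly policy: eight days have passed, so an update is due.
print(should_update_model(last, datetime(2022, 9, 9), timedelta(weeks=1)))
# Period not reached, but the accuracy index dropped below the threshold.
print(should_update_model(last, datetime(2022, 9, 2), timedelta(weeks=1),
                          accuracy=0.70, min_accuracy=0.80))
```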
In S204, when the model update policy is reached, the machine learning model to be updated corresponding to the service port is obtained, and the incremental service data generated in the current model update cycle is obtained. And when the model updating strategy is reached, acquiring incremental business data generated in the current time period after the last model updating.
In S206, data cleaning and feature screening are performed on the incremental service data to generate the real-time sample data. The incremental business data is subjected to data processing, and the incremental business data can be subjected to processing such as feature extraction to generate a real-time sample.
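A minimal sketch of S206 follows. The application does not specify the cleaning or screening rules, so two simple rules are assumed here: records with a missing value are dropped, and features that take only one distinct value are screened out. The record layout is hypothetical.

```python
def clean_and_screen(records, features):
    """Clean incremental records, then keep only informative features."""
    # Cleaning (assumed rule): drop records with a missing value in any candidate feature.
    cleaned = [r for r in records if all(r.get(f) is not None for f in features)]
    # Screening (assumed rule): drop features that are constant across the cleaned records.
    kept = [f for f in features if len({r[f] for r in cleaned}) > 1]
    samples = [{f: r[f] for f in kept} for r in cleaned]
    return samples, kept


records = [
    {"a": 1, "b": 5, "c": None},
    {"a": 2, "b": 5, "c": 3},
    {"a": 3, "b": 5, "c": 4},
]
print(clean_and_screen(records, ["a", "b", "c"]))
```

In practice the screening step could instead rank features by a measure such as Information Value, as defined earlier in the description.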
According to the data service providing method for generating service node information based on the weighted cross entropy loss index, an optimized Focal Loss function is provided. The method can be applied to a machine learning model in actual application scenarios, saves the time and space cost of repeatedly tuning parameters during model construction, and applies the optimized loss function in the iterative process of the model. The parameter tuning process can thereby be accelerated and the model effect improved.
Fig. 3 is a flowchart illustrating a data service providing method of generating service node information based on weighted cross entropy loss index according to another exemplary embodiment. The process 30 shown in fig. 3 is a supplementary description of the process shown in fig. 1.
As shown in fig. 3, in S302, in each iteration of the machine learning model training, a cross entropy loss index is obtained based on a prediction probability of the machine learning model for the real-time sample data.
In S304, a weighting factor is generated according to the number of positive samples and the number of negative samples in the sample distribution of the real-time sample data.
In this embodiment, the sample distribution in the real-time sample data includes: the number of positive samples and the number of negative samples; the sample distribution may also be that the samples are divided into a plurality of sample sets by category, the number of samples in each sample set, or the number of positive samples and the number of negative samples in each sample set.
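The per-sample weight factors and their mean can be sketched as follows. The sketch assumes an inverse-frequency rule: a positive sample is weighted by the share of negative samples and a negative sample by the share of positive samples. The application states only that the weight depends on the two counts and the sample's label, so this rule is an assumption.

```python
def weighting_factor(labels):
    """Return (mean weight, per-sample weights) for 0/1 labels."""
    total = len(labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = total - n_pos
    # Assumed rule: the rarer class receives the larger per-sample weight.
    per_sample = [(n_neg / total) if y == 1 else (n_pos / total) for y in labels]
    return sum(per_sample) / total, per_sample


factor, per_sample = weighting_factor([1, 0, 0, 0])
print(per_sample)  # the single positive sample gets the largest weight
print(factor)      # mean of the per-sample weights, a single scalar
```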
The XGBoost algorithm is taken as an example to describe the technical content of the present application in detail. XGBoost performs very well in both classification and regression. However, when the positive and negative samples in a binary classification task are extremely unbalanced, the model usually struggles to perform well. Many researchers have studied this problem, mainly through optimization of the loss function, including OHEM and the focal loss function.
OHEM (online hard example mining) screens hard examples, that is, examples with a large impact on classification and detection, based on the loss of the input samples, and then uses the screened examples in stochastic gradient descent training. However, this method requires two stages to train the model, which consumes a lot of time. To address the class imbalance problem, Ross Girshick (RBG), Kaiming He, and co-authors (2017) proposed a new loss function, focal loss, which is a modification of the standard cross entropy loss. By reducing the weights of easy samples, it makes the model focus more on difficult samples during training, which can improve the efficiency and accuracy of model construction. However, the focal loss function has a two-dimensional hyperparameter space (r and α), so hyperparameter tuning during model construction also consumes a lot of time.
Focal Loss was introduced mainly to solve the imbalance between difficult and easy samples, and it is widely applicable in practice. In typical applications there are very many negative samples and only a few positive samples, so the numbers of positive and negative samples are highly unbalanced. The cross entropy loss commonly used in classification is computed as follows:
$$L = -\,y \log(p) - (1 - y)\log(1 - p) \quad (1)$$
where p is the predicted probability and y is the label value, which is 1 or 0 in the binary classification model.
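As a concrete reference, the standard binary cross entropy for a single sample can be written as below. The clipping constant `eps` is an implementation detail added here to avoid `log(0)`; it is not part of the application.

```python
import math

def cross_entropy(p, y, eps=1e-12):
    """Binary cross entropy for one sample with predicted probability p and label y."""
    p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

confident = cross_entropy(0.9, 1)   # well-predicted positive sample, small loss
uncertain = cross_entropy(0.6, 1)   # poorly predicted positive sample, larger loss
```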
To solve the problem of positive and negative sample imbalance, a parameter α is usually added before the cross-entropy loss, i.e.:
$$L = -\,\alpha\, y \log(p) - (1 - \alpha)(1 - y)\log(1 - p) \quad (2)$$
However, this does not solve the whole problem. As shown in Fig. 4, the samples can be divided into four categories according to whether they are positive or negative and whether they are difficult or easy.
Although α balances the positive and negative samples, it does not help with the imbalance between difficult and easy samples. In fact, most of the candidate targets in risk assessment are easily separable samples. The loss of each such sample is low, but because of the extreme imbalance in numbers, the easily separable samples are so numerous that they end up dominating the overall loss. The authors of Focal Loss hold that easily separable samples (that is, samples with high confidence) contribute very little to improving the model, and that the model should focus on the difficult samples. The main idea of the Focal Loss function is therefore to lower the weight of the high-confidence samples.
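The dominance of easy samples can be illustrated numerically. The sketch below assumes the usual α-weighted cross entropy (α on positive samples, 1 - α on negative samples) and uses made-up probabilities and counts: each easy negative contributes a tiny loss, yet their sum exceeds that of the few hard positives.

```python
import math

def alpha_ce(p, y, alpha):
    """Alpha-weighted cross entropy for one sample (assumed standard form)."""
    if y == 1:
        return -alpha * math.log(p)
    return -(1.0 - alpha) * math.log(1.0 - p)

# Hypothetical batch: many confidently classified negatives, few hard positives.
easy_negatives = [(0.05, 0)] * 1000
hard_positives = [(0.30, 1)] * 10

alpha = 0.75
easy_total = sum(alpha_ce(p, y, alpha) for p, y in easy_negatives)
hard_total = sum(alpha_ce(p, y, alpha) for p, y in hard_positives)
per_easy = alpha_ce(0.05, 0, alpha)   # loss of a single easy negative
per_hard = alpha_ce(0.30, 1, alpha)   # loss of a single hard positive
```

Here a single hard positive costs far more than a single easy negative, but the easy negatives still account for the larger share of the total.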
$$L = -(1 - p)^{r}\, y \log(p) - p^{r}(1 - y)\log(1 - p) \quad (3)$$
where r is a weight parameter. In formula (3), $-(1-p)^{r}$ and $-p^{r}$ are constructed from r as adjusting factors that adjust the weights of the easily separable and the difficult samples. In this scheme, the output prediction probability is a value between 0 and 1 and the labels of the positive and negative samples are fixed, so a negative sample is easier to distinguish the closer its prediction probability is to 0, and a positive sample is easier to distinguish the closer its prediction probability is to 1.
The final focal loss function is then constructed in combination with formula (2). Formula (3) addresses the imbalance between difficult and easy samples, and formula (2) addresses the imbalance between positive and negative samples; using the two together handles both problems at once. The final form of the Focal Loss function is as follows:
$$L = -\,\alpha (1 - p)^{r}\, y \log(p) - (1 - \alpha)\, p^{r}(1 - y)\log(1 - p) \quad (4)$$
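A per-sample sketch of the focal loss follows, assuming the standard α-balanced form with the adjusting factors (1 - p)^r for positive samples and p^r for negative samples described above. Setting r = 0 recovers the α-weighted cross entropy.

```python
import math

def focal_loss(p, y, alpha, r, eps=1e-12):
    """Focal loss for one sample; alpha balances classes, r down-weights easy samples."""
    p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
    if y == 1:
        return -alpha * (1.0 - p) ** r * math.log(p)
    return -(1.0 - alpha) * p ** r * math.log(1.0 - p)

# An easy negative (p close to 0) is scaled by p**r = 0.05**2 relative to r = 0.
easy_ratio = focal_loss(0.05, 0, 0.75, 2) / focal_loss(0.05, 0, 0.75, 0)
# A hard negative (p = 0.7) keeps a much larger share of its loss: 0.7**2.
hard_ratio = focal_loss(0.70, 0, 0.75, 2) / focal_loss(0.70, 0, 0.75, 0)
```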
In S304, the weighting factor is generated from the number of positive samples, the number of negative samples, and the per-sample weight factors.
The focal loss function requires a search for the optimal hyperparameters in a two-dimensional hyperparameter space, which consumes a large amount of time, so the focal loss function is optimized by combining it with the weighted loss function. The hyperparameter α, whose role is to balance the sample imbalance, is estimated as follows:
Figure BDA0003867721840000123
Figure BDA0003867721840000124
where $\beta_i$ is the weight factor of the i-th real-time sample data,
Figure BDA0003867721840000125
is the weighting factor, P is the number of positive samples, N is the number of negative samples, and m is the number of real-time sample data.
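The estimation can be sketched as below. The exact per-sample formula is given in the figures and is not reproduced here, so this sketch makes a loudly labeled assumption: the weight factor β_i of a sample is the share of the opposite class (N / (P + N) for a positive sample, P / (P + N) for a negative one). Only the averaging step follows directly from the text.

```python
def weight_factor(label, n_pos, n_neg):
    """ASSUMED per-sample weight: the share of the opposite class."""
    total = n_pos + n_neg
    return n_neg / total if label == 1 else n_pos / total

def weighting_factor(labels):
    """Mean of the per-sample weight factors over all m samples."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    betas = [weight_factor(y, n_pos, n_neg) for y in labels]
    return sum(betas) / len(betas)

# Hypothetical batch with P = 2 positive and N = 8 negative samples.
labels = [1] * 2 + [0] * 8
estimate = weighting_factor(labels)   # (2 * 0.8 + 8 * 0.2) / 10
```

Because the factor is computed from the sample counts, it no longer needs to be searched as a hyperparameter.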
In S306, the one-dimensional weighted cross entropy loss index of the current training round is generated from the number of training iterations, the weighting factor value, the adjusting factor, and the prediction probability of that round.
Then, in combination with the above equation, the final optimized Focal loss function is as follows:
Figure BDA0003867721840000131
Figure BDA0003867721840000132
represents the weighting factor corresponding to the positive samples in the real-time sample data. When calculating
Figure BDA0003867721840000133
m in the above formula is taken as the number of positive samples in the real-time sample data, and $\beta_i$ is the weight factor of the i-th positive sample in the real-time sample data;
Figure BDA0003867721840000134
represents the weighting factor corresponding to the negative samples in the real-time sample data. When calculating
Figure BDA0003867721840000135
m in the above formula is taken as the number of negative samples in the real-time sample data, and $\beta_i$ is the weight factor of the i-th negative sample in the real-time sample data.
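A sketch of a loss with class-specific estimated weighting factors is shown below. It is not the formula from the figures: it assumes the focal loss form with adjusting factors (1 - p)^r and p^r, and it assumes that each sample's weight factor is the share of the opposite class, so that averaging over the positive samples gives N / (P + N) and averaging over the negative samples gives P / (P + N). Under these assumptions r is the only remaining hyperparameter.

```python
import math

def class_weights(labels):
    """ASSUMED estimates: mean weight factor over positives and over negatives."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    total = n_pos + n_neg
    return n_neg / total, n_pos / total

def optimized_focal_loss(probs, labels, r, eps=1e-12):
    """Mean loss over the batch; only r is left to tune."""
    a_pos, a_neg = class_weights(labels)
    loss = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
        if y == 1:
            loss += -a_pos * (1.0 - p) ** r * math.log(p)
        else:
            loss += -a_neg * p ** r * math.log(1.0 - p)
    return loss / len(labels)

# Hypothetical batch with one positive and three negative samples.
labels = [1, 0, 0, 0]
probs = [0.30, 0.10, 0.20, 0.05]
loss_r0 = optimized_focal_loss(probs, labels, r=0)
loss_r2 = optimized_focal_loss(probs, labels, r=2)
```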
To verify the effect of the scheme, the application tests it on a set of sample data; the results are shown in the following table. The AUC and KS obtained with the one-dimensional weighted cross entropy loss function proposed by the application (Optimized Focal Loss) are roughly level with those of the original Focal Loss function, while the search time in the parameter space is greatly reduced.
                        test_auc    test_ks
Focal Loss              0.7643      0.3629
Optimized Focal Loss    0.7583      0.3531
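For readers unfamiliar with the two metrics in the table, the following sketch shows one common way to compute them. AUC is taken as the probability that a random positive sample scores above a random negative one, and KS as the maximum gap between the cumulative positive and negative rates. The scores and labels are made up and unrelated to the test data of the application.

```python
def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks(scores, labels):
    """Maximum absolute difference between true and false positive rates."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        tpr = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t) / n_neg
        best = max(best, abs(tpr - fpr))
    return best

# Hypothetical predictions.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
test_auc = auc(scores, labels)
test_ks = ks(scores, labels)
```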
In the data service providing method for generating service node information based on the weighted cross entropy loss index, the hyperparameter space of the model is optimized using the weighted cross entropy loss function: the two-dimensional hyperparameter space of the model weights is reduced to one dimension, which greatly improves the efficiency of model construction.
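The saving from reducing the search space can be made concrete with a small grid search sketch. The scoring function is a toy stand-in for training and validating a model, and the grid values are arbitrary; only the count of evaluations is the point.

```python
from itertools import product

def grid_search(evaluate, grid):
    """Evaluate every combination in `grid`; return (best_params, n_evaluations)."""
    names = list(grid)
    best, best_score, n_evals = None, float("-inf"), 0
    for values in product(*(grid[k] for k in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        n_evals += 1
        if score > best_score:
            best, best_score = params, score
    return best, n_evals

def toy_score(r, alpha=0.75):
    """Hypothetical validation score; a real run would train a model here."""
    return -((r - 2.0) ** 2) - (alpha - 0.75) ** 2

r_values = [0, 1, 2, 3, 4]
alpha_values = [0.1, 0.25, 0.5, 0.75, 0.9]

# Two-dimensional search over (r, alpha) versus one-dimensional search over r.
best_2d, evals_2d = grid_search(toy_score, {"r": r_values, "alpha": alpha_values})
best_1d, evals_1d = grid_search(toy_score, {"r": r_values})
```

With five candidate values per hyperparameter, the two-dimensional search needs 25 evaluations and the one-dimensional search needs 5.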
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, the program performs the functions defined by the methods provided herein. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to exemplary embodiments of the present application and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 5 is a block diagram illustrating a data service providing apparatus that generates service node information based on a weighted cross entropy loss index according to an exemplary embodiment. As shown in fig. 5, the data service providing apparatus 50 that generates the service node information based on the weighted cross entropy loss index includes: model module 502, label module 504, training module 506, update module 508, and deployment module 510.
The model module 502 is configured to obtain a machine learning model to be updated corresponding to a service port, and obtain real-time sample data corresponding to the machine learning model;
the label module 504 is configured to determine a sample label for the real-time sample data;
the training module 506 is configured to train the machine learning model through real-time sample data with sample labels;
the updating module 508 is configured to perform weighted adjustment on the cross entropy loss index according to sample distribution and recognition difficulty in real-time sample data in a training iteration process of the machine learning model to generate a one-dimensional weighted cross entropy loss index, and obtain an updated machine learning model when the one-dimensional weighted cross entropy loss index meets a preset policy;
the deployment module 510 is configured to obtain real-time service data from the service port, process the real-time service data based on the updated machine learning model, and generate service node information; and forwarding the service request of the real-time user to a corresponding background server according to the service node information so as to provide data service.
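The forwarding step performed by the deployment module can be sketched as follows. The node names, server addresses, `risk_score` feature, and the rule inside `predict_node` are all hypothetical; in the described system the node would come from the updated machine learning model.

```python
def route_request(request, predict_node, backends):
    """Forward a request to the backend server named by the model's node output."""
    node = predict_node(request["features"])
    server = backends.get(node, backends["default"])
    return {"request_id": request["id"], "node": node, "server": server}

def predict_node(features):
    """Hypothetical stand-in for the updated model producing service node information."""
    return "manual_review" if features["risk_score"] > 0.5 else "auto_approve"

# Hypothetical mapping from service node to background server.
backends = {
    "manual_review": "10.0.0.11:8080",
    "auto_approve": "10.0.0.12:8080",
    "default": "10.0.0.10:8080",
}

request = {"id": "req-1", "features": {"risk_score": 0.8}}
result = route_request(request, predict_node, backends)
```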
In a specific embodiment, the updating module 508 is specifically configured to, in an iterative process of each round of the machine learning model training, obtain a cross entropy loss index based on a prediction probability of the iterative machine learning model on the real-time sample data; generating a weighting factor according to the number of positive samples and the number of negative samples in the sample distribution of the real-time sample data; determining an adjusting factor according to the prediction probability and the identification difficulty of the real-time sample data; adjusting and balancing the cross entropy loss index through the adjusting factor; and weighting the cross entropy loss index after the balance is adjusted according to the weighting factor to generate the one-dimensional weighted cross entropy loss index.
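The stopping condition ("when the one-dimensional weighted cross entropy loss index meets a preset policy") can be sketched as a training loop with a pluggable policy. The application does not define the policy, so a simple loss threshold is assumed here, and `step` is a hypothetical stand-in that returns the loss of each round.

```python
def train_until(step, policy_met, max_rounds=100):
    """Run training rounds until the policy is met or max_rounds is reached."""
    history = []
    for round_idx in range(1, max_rounds + 1):
        loss = step(round_idx)      # one training round returning its loss index
        history.append(loss)
        if policy_met(history):
            break
    return history

def step(round_idx):
    """Hypothetical loss curve that decreases as 1 / round."""
    return 1.0 / round_idx

def below_threshold(history, threshold=0.25):
    """ASSUMED preset policy: stop once the latest loss is at or below a threshold."""
    return history[-1] <= threshold

history = train_until(step, below_threshold)
```

Other policies, such as stopping when the loss has not improved for several rounds, fit the same interface.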
In a specific embodiment, the updating module 508 is specifically configured to obtain the weight factor of each real-time sample data according to the corresponding relationship between the number of positive samples, the number of negative samples, and the label of each real-time sample data; and to average the weight factors of all the real-time sample data to generate the weighting factor.
In a specific embodiment, the model module 502 is specifically configured to determine a model update policy according to a business policy; when a model updating strategy is reached, a machine learning model to be updated corresponding to a service port is obtained, and real-time sample data corresponding to the machine learning model is obtained.
In a specific embodiment, the model module 502 is specifically configured to obtain incremental service data generated in the current model updating period; and performing data cleaning and feature screening on the incremental business data to generate the real-time sample data.
In a specific embodiment, the tag module 504 is specifically configured to obtain a tag decision criterion according to a service policy; extracting a plurality of sample data in the real-time sample data one by one; automatically assigning labels to the plurality of samples according to the label decision criteria.
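Automatic label assignment from a decision criterion can be sketched as below. The criterion shown (more than 30 days overdue counts as a positive sample) and the field names are hypothetical examples of a business policy, not rules stated in the application.

```python
def assign_labels(samples, criterion):
    """Return copies of the samples with a 0/1 label set by the criterion."""
    return [{**s, "label": 1 if criterion(s) else 0} for s in samples]

def overdue_over_30_days(sample):
    """Hypothetical label decision criterion derived from a business policy."""
    return sample["days_overdue"] > 30

# Hypothetical real-time sample data.
samples = [
    {"user": "u1", "days_overdue": 0},
    {"user": "u2", "days_overdue": 45},
    {"user": "u3", "days_overdue": 30},
]
labeled = assign_labels(samples, overdue_over_30_days)
```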
The data service providing device for generating service node information based on the weighted cross entropy loss index obtains a machine learning model to be updated and real-time sample data corresponding to the machine learning model; determines a sample label for the real-time sample data; trains the machine learning model through the real-time sample data with sample labels; generates an updated machine learning model when, during training, the one-dimensional weighted cross entropy loss index meets a preset strategy; and deploys the updated machine learning model online to provide data service for real-time users. In this way, the device can meet the updating requirement of the online machine learning model in real time, ensure the safety and accuracy of model calculation while meeting the calculation efficiency, improve the data safety of the whole system, and reduce the data redundancy cost.
As shown in fig. 6, an embodiment of the present application provides an electronic device, which includes a processor 610, a communication interface 620, a memory 630, and a communication bus 640, where the processor 610, the communication interface 620, and the memory 630 complete communication with each other through the communication bus 640;
a memory 630 for storing computer programs;
the processor 610 is configured to implement the data service providing method for generating service node information based on the weighted cross entropy loss index according to any of the embodiments described above when executing the program stored in the memory 630.
In the electronic device provided by the embodiment of the application, the processor 610, by executing the program stored in the memory 630, acquires a machine learning model to be updated and real-time sample data corresponding to the machine learning model; determines a sample label for the real-time sample data; trains the machine learning model through the real-time sample data with sample labels; generates an updated machine learning model when, during training, the one-dimensional weighted cross entropy loss index meets a preset strategy; and deploys the updated machine learning model online to provide data service for real-time users.
The communication bus 640 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 640 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 620 is used for communication between the above-described electronic device and other devices.
The memory 630 may include a Random Access Memory (RAM), and may also include a non-volatile memory (e.g., at least one disk memory). Optionally, the memory 630 may also be at least one storage device located remotely from the processor 610.
The processor 610 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component.
The present application provides a computer-readable storage medium, which stores one or more programs that are executable by one or more processors to implement the data service providing method for generating service node information based on weighted cross entropy loss index according to any of the foregoing embodiments. For example, a machine learning model to be updated and real-time sample data corresponding to the machine learning model are acquired; determining a sample label for the real-time sample data; training the machine learning model through real-time sample data with a sample label; in the training process of the machine learning model, when the one-dimensional weighted cross entropy loss index meets a preset strategy, generating an updated machine learning model; and performing online deployment on the updated machine learning model to provide data service for the real-time user.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are generated in whole or in part when the computer program instructions are loaded and executed on a computer. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. A computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tapes), optical media (e.g., DVDs), or semiconductor media (e.g., Solid State Disks (SSDs)), among others.
Exemplary embodiments of the present application are specifically illustrated and described above. It is to be understood that the application is not limited to the details of construction, arrangement, or method of implementation described herein; on the contrary, the application is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (11)

1. A data service providing method for generating service node information based on a weighted cross entropy loss index is characterized by comprising the following steps:
acquiring a machine learning model to be updated corresponding to a service port, and acquiring real-time sample data corresponding to the machine learning model;
determining a sample label for the real-time sample data;
training the machine learning model through real-time sample data with a sample label;
in the training iteration process of the machine learning model, performing weighted adjustment on the cross entropy loss index according to the sample distribution and the identification difficulty in real-time sample data to generate a one-dimensional weighted cross entropy loss index, and obtaining an updated machine learning model when the one-dimensional weighted cross entropy loss index meets a preset strategy;
acquiring real-time service data from the service port, and processing the real-time service data based on the updated machine learning model to generate service node information;
and forwarding the service request of the real-time user to a corresponding background server according to the service node information so as to provide data service.
2. The method of claim 1, wherein the weighting adjustment of the cross-entropy loss indicator according to the sample distribution and the recognition difficulty in the real-time sample data to generate the one-dimensional weighted cross-entropy loss indicator comprises:
in the iterative process of each round of machine learning model training, obtaining a cross entropy loss index based on the prediction probability of the iterative machine learning model to the real-time sample data;
generating a weighting factor according to the number of positive samples and the number of negative samples in the sample distribution of the real-time sample data;
determining an adjusting factor according to the prediction probability and the identification difficulty of the real-time sample data;
adjusting and balancing the cross entropy loss index through the adjusting factor;
and weighting the cross entropy loss index after the balance adjustment through the weighting factor to generate the one-dimensional weighted cross entropy loss index.
3. The method of claim 2, wherein generating weighting factors from the number of positive samples, the number of negative samples in the sample distribution of the real-time sample data comprises:
respectively obtaining the weight factor of each real-time sample data according to the corresponding relation between the number of the positive samples, the number of the negative samples and the label of each real-time sample data;
and carrying out mean value calculation on the weight factors of all the real-time sample data to generate the weighting factor.
4. The method of claim 1, wherein obtaining a machine learning model to be updated corresponding to a service port and obtaining real-time sample data corresponding to the machine learning model comprises:
determining a model updating strategy according to the business strategy;
when a model updating strategy is reached, a machine learning model to be updated corresponding to a service port is obtained, and real-time sample data corresponding to the machine learning model is obtained.
5. The method of claim 4, wherein obtaining real-time sample data corresponding to the machine learning model comprises:
obtaining incremental business data generated in the model updating period;
and performing data cleaning and feature screening on the incremental business data to generate the real-time sample data.
6. The method of claim 1, wherein determining a sample label for the real-time sample data comprises:
acquiring a label judgment criterion according to a service strategy;
extracting a plurality of sample data in the real-time sample data one by one;
automatically assigning labels to the plurality of samples according to the label decision criteria.
7. A data service providing apparatus for generating service node information based on a weighted cross entropy loss index, comprising:
the model module is used for acquiring a machine learning model to be updated corresponding to a service port and acquiring real-time sample data corresponding to the machine learning model;
the label module is used for determining a sample label for the real-time sample data;
the training module is used for training the machine learning model through real-time sample data with a sample label;
the updating module is used for performing weighted adjustment on the cross entropy loss index according to the sample distribution and the identification difficulty in the real-time sample data in the training iteration process of the machine learning model to generate a one-dimensional weighted cross entropy loss index, and when the one-dimensional weighted cross entropy loss index meets a preset strategy, the updated machine learning model is obtained;
the deployment module is used for acquiring real-time service data from the service port, processing the real-time service data based on the updated machine learning model and generating service node information; and forwarding the service request of the real-time user to a corresponding background server according to the service node information to provide data service.
8. The apparatus according to claim 7, wherein the updating module is specifically configured to, in an iterative process of each round of the machine learning model training, obtain a cross entropy loss indicator based on a prediction probability of the iterative machine learning model on the real-time sample data; generating a weighting factor according to the number of positive samples and the number of negative samples in the sample distribution of the real-time sample data; determining an adjusting factor according to the prediction probability and the identification difficulty of the real-time sample data; adjusting and balancing the cross entropy loss index through the adjusting factor; and weighting the cross entropy loss index after the balance is adjusted according to the weighting factor to generate the one-dimensional weighted cross entropy loss index.
9. The apparatus according to claim 8, wherein the updating module is specifically configured to obtain the weight factor of each real-time sample data respectively according to the corresponding relationship between the number of positive samples, the number of negative samples, and the label of each real-time sample data; and carrying out mean value calculation on the weight factors of all the real-time sample data to generate the weighting factor.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
11. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method of any one of claims 1 to 6.
CN202211186015.2A 2022-09-27 2022-09-27 Data service providing method and device based on service node information Pending CN115660101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211186015.2A CN115660101A (en) 2022-09-27 2022-09-27 Data service providing method and device based on service node information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211186015.2A CN115660101A (en) 2022-09-27 2022-09-27 Data service providing method and device based on service node information

Publications (1)

Publication Number Publication Date
CN115660101A true CN115660101A (en) 2023-01-31

Family

ID=84985084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211186015.2A Pending CN115660101A (en) 2022-09-27 2022-09-27 Data service providing method and device based on service node information

Country Status (1)

Country Link
CN (1) CN115660101A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117611932A (en) * 2024-01-24 2024-02-27 山东建筑大学 Image classification method and system based on double pseudo tag refinement and sample re-weighting
CN117611932B (en) * 2024-01-24 2024-04-26 山东建筑大学 Image classification method and system based on double pseudo tag refinement and sample re-weighting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 1109, No. 4, Lane 800, Tongpu Road, Putuo District, Shanghai, 200062

Applicant after: Shanghai Qiyue Information Technology Co.,Ltd.

Address before: Room a2-8914, 58 Fumin Branch Road, Hengsha Township, Chongming District, Shanghai, 201500

Applicant before: Shanghai Qiyue Information Technology Co.,Ltd.

Country or region before: China