CN115376691A - Risk level evaluation method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115376691A
Authority
CN
China
Prior art keywords
behavior data
data
target
determining
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211062537.1A
Other languages
Chinese (zh)
Inventor
李方芸
曲以元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lianren Healthcare Big Data Technology Co Ltd
Original Assignee
Lianren Healthcare Big Data Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lianren Healthcare Big Data Technology Co Ltd
Priority to CN202211062537.1A
Publication of CN115376691A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Theoretical Computer Science (AREA)
  • Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a risk level assessment method, a risk level assessment device, electronic equipment and a storage medium, wherein the method comprises the following steps: behavior data corresponding to each user and stored in a target data source are obtained; for each type of behavior data, determining behavior data to be used from the current type of behavior data, and processing the behavior data to be used based on a prediction model corresponding to the current type to obtain a prediction result; for each behavior data to be used, screening out the behavior data to be applied from the behavior data to be used based on the prediction result of the current type and the corresponding actual result; determining target behavior data based on the behavior data to be applied corresponding to each type of behavior data; and training the risk assessment model to be trained based on the target behavior data to obtain a target risk assessment model, and determining the risk level of the corresponding user based on the target risk assessment model. Based on the technical scheme, the accuracy of risk grade evaluation is improved.

Description

Risk level evaluation method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of risk assessment technologies, and in particular, to a method and an apparatus for risk level assessment, an electronic device, and a storage medium.
Background
With the rapid development of the medical industry, the health information of the user is often evaluated in order to ensure the physical health of the user, and then the health risk of the user can be timely prompted when the health information of the user is abnormal.
However, existing methods evaluate a user's health information manually and on site. On-site evaluation is easily affected by external factors, so its accuracy and precision cannot be guaranteed, and it requires the user to be present in person, which degrades the user experience.
Disclosure of Invention
The invention provides a risk grade evaluation method and device, electronic equipment and a storage medium, wherein a risk evaluation model is obtained by training a risk evaluation model to be trained through target behavior data, so that a risk grade is obtained based on the risk evaluation model, and the accuracy of risk grade evaluation is improved.
In a first aspect, an embodiment of the present invention provides a risk level assessment method, where the method includes:
behavior data corresponding to each user and stored in a target data source are obtained; wherein the behavior data comprises basic behavior data and operational behavior data;
for each type of behavior data, determining behavior data to be used from the current type of behavior data, and processing the behavior data to be used based on a prediction model corresponding to the current type to obtain a prediction result;
for each type of behavior data, screening out behavior data to be applied from the behavior data to be used based on the prediction result of the current type and the corresponding actual result;
determining target behavior data based on the behavior data to be applied corresponding to each type of behavior data;
and training the risk assessment model to be trained based on the target behavior data to obtain a target risk assessment model, and determining the risk level of the corresponding user based on the target risk assessment model.
In a second aspect, an embodiment of the present invention further provides a risk level assessment apparatus, where the apparatus includes:
the data acquisition module is used for acquiring behavior data which are stored in a target data source and correspond to each user; wherein the behavior data comprises basic behavior data and operational behavior data;
the prediction result acquisition module is used for determining behavior data to be used from the current type of behavior data for each type of behavior data and processing the behavior data to be used based on a prediction model corresponding to the current type to obtain a prediction result;
the screening module is used for screening the behavior data to be applied from the behavior data to be used based on the prediction result of the current type and the corresponding actual result for each type of behavior data;
the target behavior data determining module is used for determining target behavior data based on the behavior data to be applied corresponding to each type of behavior data;
and the evaluation module is used for training the risk evaluation model to be trained based on the target behavior data to obtain a target risk evaluation model, and determining the risk level of the corresponding user based on the target risk evaluation model.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the risk level assessment method according to any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are configured to, when executed by a processor, implement the risk level assessment method according to any embodiment of the present invention.
According to the technical scheme, the behavior data corresponding to each user and stored in the target data source are obtained. For each type of behavior data, the behavior data to be used are determined from the current type of behavior data and processed based on the prediction model corresponding to the current type to obtain a prediction result. Then, for each type of behavior data, the behavior data to be applied are screened from the behavior data to be used based on the prediction result of the current type and the corresponding actual result, and the target behavior data are determined based on the behavior data to be applied corresponding to each type of behavior data. Finally, the risk assessment model to be trained is trained based on the target behavior data to obtain the target risk assessment model, and the risk level of the corresponding user is determined based on the target risk assessment model. This technical scheme improves the accuracy of risk level assessment and thereby improves the user experience.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow chart of a risk level assessment method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a risk level assessment method according to an embodiment of the present invention;
Fig. 3 is a block diagram of a risk level assessment apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It is understood that before the technical solutions disclosed in the embodiments of the present invention are used, the type, the use range, the use scene, etc. of the personal information related to the present invention should be informed to the user and authorized by the user in a proper manner according to the relevant laws and regulations.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly indicate that the requested operation will require acquiring and using the user's personal information. The user can then decide, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application program, server, or storage medium, that executes the operations of the technical scheme of the invention.
As an alternative but non-limiting implementation manner, in response to receiving an active request from the user, the prompt information may be sent to the user in, for example, a pop-up window, and may be presented as text in the pop-up window. In addition, the pop-up window may carry a selection control with which the user chooses "agree" or "disagree" to providing personal information to the electronic device.
It is understood that the above notification and user authorization process is only illustrative and not limiting, and other ways of satisfying relevant laws and regulations may be applied to the implementation of the present disclosure.
It will be appreciated that the data involved in the subject technology, including but not limited to the data itself, the acquisition or use of the data, should comply with the requirements of the corresponding laws and regulations and related regulations.
Example one
Fig. 1 is a schematic flow diagram of a risk level assessment method according to an embodiment of the present invention. The present embodiment is applicable to the case where behavior data is screened to obtain target behavior data, a risk assessment model to be trained is trained based on the target behavior data, and risk assessment is then performed based on the trained risk assessment model. The method may be executed by a risk level assessment device, which may be implemented in hardware and/or software and may be configured in an electronic device such as a PC terminal or a server.
As shown in fig. 1, the method includes:
S110, acquiring behavior data corresponding to each user and stored in the target data source.
The behavior data comprises basic behavior data and operational behavior data. The target data source may be a database for storing behavior data, such as a MySQL database or an Oracle database. Behavior data may be understood as data generated based on a user's behavior. For example, operational behavior data may be data generated by behaviors such as browsing web pages, using apps, making calls, and SIM card travel, and basic behavior data may be data composed of the user's basic attributes, such as package data and gender data.
Specifically, a target data source is selected and the behavior data corresponding to each user stored in it is read. For example, the user may designate data stored by an operator as the target data source and then read the behavior data corresponding to each user from it. It should be noted that the behavior data may be divided into basic behavior data and operational behavior data according to its attributes: basic behavior data is generated based on the basic information of the user, and operational behavior data is generated according to the operation behavior of the user. For example, if the gender of user A is female, that data, generated from a basic attribute, is basic behavior data; if user A's daily network access time is 6 hours, that data, generated from an operation behavior, is operational behavior data.
On the basis of the technical scheme, before acquiring the behavior data corresponding to each user and stored in the target data source, the method further comprises: reading pre-stored visit data and diagnosis and treatment data, and analyzing the visit data and the diagnosis and treatment data to obtain at least one quantitative result; and determining corresponding risk level information based on the at least one quantitative result and the visit data and diagnosis and treatment data corresponding to each user, and storing the risk level information.
The visit data may be data on the number of visits of the user, for example, the number of visits within a certain period; correspondingly, the diagnosis and treatment data may be understood as the user's accumulated diagnosis and treatment expenditure. The quantitative result may be understood as a result used to determine the risk level of the user; for example, the visit data and the diagnosis and treatment data may be comprehensively analyzed to obtain a corresponding quantitative result, and the corresponding risk level is determined based on that quantitative result. The risk level information may be used to characterize the user's current risk level.
Specifically, the pre-stored visit data and diagnosis and treatment data are read and analyzed to obtain at least one quantitative result, the risk level information of each user is determined according to the at least one quantitative result and the visit data and diagnosis and treatment data corresponding to each user, and the risk level information is stored. For example, the pre-stored visit data and diagnosis and treatment data may be read and analyzed manually to obtain a quantitative result that represents the correspondence between the visit data, the diagnosis and treatment data, and the risk levels: if the current user has neither visit data nor diagnosis and treatment data, the risk level is 0; if the number of visits is less than 5, the risk level is 1; if the number of visits is greater than or equal to 5, the risk level is 2; if the diagnosis and treatment expenditure is less than thirty thousand, the risk level is 3; and if the diagnosis and treatment expenditure is greater than or equal to thirty thousand, the risk level is 4. Here the visit data is the number of visits of the user within a certain period, and the diagnosis and treatment data is the user's diagnosis and treatment cost within a certain period. After the quantitative result is obtained, the risk level information of each user can be determined based on the quantitative result and the user's visit data and diagnosis and treatment data.
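The example thresholds above can be sketched as a small labelling function. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, the number 5 is read as a visit count and thirty thousand as a treatment cost, and the cost-based rules are assumed to take precedence over the visit-based rules when both kinds of data exist (the description does not say how overlapping rules are resolved).

```python
from typing import Optional


def label_risk_level(visit_count: Optional[int],
                     treatment_cost: Optional[float]) -> int:
    """Map one user's visit count and treatment cost to a risk level 0-4.

    ASSUMPTION: cost-based rules (levels 3 and 4) are checked before
    visit-based rules (levels 1 and 2); the source gives no precedence.
    """
    if treatment_cost is not None and treatment_cost > 0:
        # Thirty thousand is the example cost threshold from the text.
        return 4 if treatment_cost >= 30_000 else 3
    if visit_count is not None and visit_count > 0:
        # Five visits is the example visit threshold from the text.
        return 2 if visit_count >= 5 else 1
    # Neither visit data nor diagnosis and treatment data is available.
    return 0
```

The labels produced this way play the role of the stored risk level information that later steps compare against model predictions.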
S120, for each type of behavior data, determining behavior data to be used from the current type of behavior data, and processing the behavior data to be used based on a prediction model corresponding to the current type to obtain a prediction result.
The behavior data to be used may be behavior data that needs to be used, and it should be noted that, because the types of the behavior data are different, corresponding behavior data to be used needs to be determined for different types of behavior data. The prediction model may be a model that is preset to perform the preliminary prediction. The prediction result may be a result corresponding to the behavior data to be used, which is output by the prediction model after the behavior data to be used is input into the prediction model.
Specifically, for each different type of behavior data, the behavior data that needs to be used may be determined from the current type of behavior data, and is used as the behavior data to be used, for example, the behavior data that needs to be used may be selected by the user, and after the data to be used is obtained, the data to be used may be input into the prediction model corresponding to the current type, so as to obtain the prediction result corresponding to the data to be used. It should be noted that, because the technical solution of the embodiment of the present invention is to evaluate the risk level, all the prediction models output the corresponding risk level prediction result based on the input.
On the basis of the above technical solution, for each type of behavior data, determining the behavior data to be used from the current type of behavior data includes: determining median information corresponding to the various types of behavior data, and determining a median absolute difference corresponding to the various types of behavior data based on the median information; and filtering the behavior data of each type according to the median absolute difference and preset difference information to obtain the behavior data to be used.
The median information may be a median corresponding to the behavior data, for example, the current type of behavior data may be sorted, and data in a middle position is selected according to a sorting result and is used as the median. Accordingly, the median absolute difference can be understood as difference information obtained based on the median and each behavior data. The preset difference information may be a preset difference threshold.
Specifically, each type of behavior data is sorted by value to obtain a sorting result, the corresponding median information is determined from the sorting result, the corresponding median absolute difference is determined based on the median information and the behavior data of that type, and the behavior data of each type is filtered based on the preset difference information and the median absolute difference to obtain the behavior data to be used. It should be noted that the median absolute difference may be obtained by computing the absolute difference between each data value and the median, summing all of these differences, and dividing the sum by the total number of data values. For example, if the preset difference information is 0, the types of behavior data whose median absolute difference is not 0 may be retained and used as the behavior data to be used.
According to the technical scheme of the embodiment of the disclosure, the median absolute difference and the preset difference information are adopted to filter each type of behavior data to obtain the behavior data to be used, so that each type of behavior data can be subjected to elastic processing, and the influence of abnormal values on a data set is greatly reduced.
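A minimal sketch of this filtering step follows, assuming each type of behavior data is a list of numeric values keyed by a type name. The function names and the dictionary layout are hypothetical. The difference is computed as described in the text (mean of the absolute differences from the median), and "greater than the preset difference" is assumed as the retention rule, which matches the "not 0" example when the preset value is 0.

```python
from statistics import median
from typing import Dict, List


def median_absolute_difference(values: List[float]) -> float:
    """Average absolute difference between each value and the median."""
    med = median(values)
    return sum(abs(v - med) for v in values) / len(values)


def select_types_to_use(data_by_type: Dict[str, List[float]],
                        preset_difference: float = 0.0) -> Dict[str, List[float]]:
    """Keep only the types whose difference exceeds the preset value.

    ASSUMPTION: retention means strictly greater than preset_difference.
    """
    return {
        name: values
        for name, values in data_by_type.items()
        if values and median_absolute_difference(values) > preset_difference
    }
```

For instance, a type whose values are all identical has a difference of 0 and is dropped when the preset difference is 0, since it carries no information that distinguishes users.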
S130, for each type of behavior data, screening out the behavior data to be applied from the behavior data to be used based on the prediction result of the current type and the corresponding actual result.
The actual result may be risk level information of the user corresponding to the current behavior data, and it should be noted that the actual result may be generated before the behavior data is read, and may be risk level information determined by a manual determination method. The behavior data to be applied can be understood as behavior data obtained after screening.
Specifically, after the prediction result corresponding to the behavior data to be used is obtained, the behavior data to be applied is screened from the behavior data to be used based on the prediction result and the pre-stored actual result. For example, for each type of behavior data to be used, the number of prediction results that match the actual results may be counted, the matching rate may be determined from that number, and the behavior data to be applied may then be determined according to the matching rate between the prediction results and the actual results.
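The matching-rate idea can be sketched as follows. The function names and the `min_rate` parameter are hypothetical; the description does not give a concrete matching-rate threshold, so any value passed here is an assumption.

```python
from typing import Dict, List, Sequence


def matching_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of users whose predicted risk level equals the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


def screen_by_matching_rate(predictions_by_type: Dict[str, Sequence[int]],
                            actual: Sequence[int],
                            min_rate: float) -> List[str]:
    """Return the names of the types whose matching rate reaches min_rate.

    ASSUMPTION: min_rate is an illustrative threshold, not from the source.
    """
    return [
        name
        for name, predicted in predictions_by_type.items()
        if matching_rate(predicted, actual) >= min_rate
    ]
```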
On the basis of the above technical solution, the screening of behavior data to be applied from the behavior data to be used based on the prediction result of the current type and the corresponding actual result for each behavior data to be used includes: determining a working characteristic curve corresponding to the current type based on the prediction result of the current type and the corresponding actual result; and determining consistency parameters corresponding to the current type based on the working characteristic curve, and screening the behavior data to be used based on the consistency parameters and preset parameter thresholds to obtain the behavior data to be applied.
The working characteristic curve may be a receiver operating characteristic (ROC) curve, which is the line connecting the points plotted with the false-alarm probability P(y|N) obtained under different decision criteria as the abscissa and the hit probability P(y|SN) as the ordinate. The consistency parameter may be a parameter obtained based on the working characteristic curve and is used to characterize the degree of match between the current type of behavior data and the prediction model. The preset parameter threshold may be understood as a preset parameter value.
Specifically, a working characteristic curve corresponding to the current type is determined according to the prediction result of the current type and the corresponding actual result, a consistency parameter corresponding to the current type is determined based on the working characteristic curve, and then the behavior data to be used is filtered again according to a preset parameter threshold and the consistency parameter to obtain the behavior data to be applied. For example, a parameter threshold value may be set to 0.58 in advance, and the consistency parameter corresponding to each behavior data to be used is compared with the preset parameter threshold value, and the behavior data to be used with the consistency parameter greater than 0.58 is reserved as the behavior data to be applied.
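One possible reading of this step is sketched below. It assumes the consistency parameter is the area under the ROC curve (AUC), which the description does not state explicitly, and it simplifies the risk level to a binary label (1 for at risk, 0 otherwise). The function names are hypothetical. The AUC is computed through its standard equivalence with the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve for binary labels (1 positive, 0 negative)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def screen_by_consistency(results_by_type: Dict[str, Tuple[Sequence[int], Sequence[float]]],
                          threshold: float = 0.58) -> List[str]:
    """Keep the types whose AUC exceeds the preset parameter threshold.

    ASSUMPTION: AUC stands in for the consistency parameter; 0.58 is the
    example threshold from the text.
    """
    return [
        name
        for name, (labels, scores) in results_by_type.items()
        if roc_auc(labels, scores) > threshold
    ]
```

An AUC of 0.5 corresponds to a prediction no better than chance, so a threshold slightly above 0.5 keeps only the types whose prediction model shows at least some discriminative ability.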
S140, determining target behavior data based on the behavior data to be applied corresponding to each type of behavior data.
The target behavior data may be behavior data obtained by screening from behavior data to be applied.
Specifically, the user can determine the final target behavior data from the behavior data to be applied corresponding to each type of behavior data. For example, the user can decide according to whether the current type of behavior data to be applied is related to risk assessment, deleting data that is unrelated to risk assessment and retaining data that is related. Alternatively, each type of data can be input into a random forest model, the data can be ranked by importance based on the characteristics of the random forest model to obtain an importance ranking result, and the data with the highest importance can then be selected as the target behavior data according to the ranking result.
On the basis of the above technical solution, the determining target behavior data based on the to-be-applied behavior data corresponding to each type of behavior data includes: determining a correlation coefficient between behavior data to be applied corresponding to each type of behavior data; and screening the behavior data to be applied corresponding to each type of behavior data based on the correlation coefficient and a preset correlation threshold to obtain the target behavior data.
The correlation coefficient may be used to characterize the degree of correlation between types of data; for example, if the correlation coefficient between the data of type A and the data of type B is 1, the two are perfectly linearly correlated. The preset correlation threshold may be a preset correlation coefficient value.
Specifically, for the behavior data to be applied corresponding to each type of behavior data, the correlation coefficients between different types of behavior data to be applied may be determined, and the behavior data to be applied is filtered according to the correlation coefficients and the preset correlation threshold to obtain the target behavior data. For example, if the preset correlation threshold is 0.9 and the correlation coefficient between the behavior data to be applied of type A and that of type B is 0.91, the two types provide approximately the same predictive value to the model, so only one of them needs to be retained.
According to the technical scheme of the embodiment of the disclosure, the target behavior data is obtained by calculating the correlation coefficient between different types of behavior data to be applied and filtering the behavior data to be applied based on the preset correlation threshold and the correlation coefficient, so that highly correlated behavior data to be applied can be removed, and the dimensionality of the data is further reduced.
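The correlation-based filtering could look roughly like the sketch below. It assumes the Pearson coefficient is meant (the description only says "correlation coefficient"), treats strong negative correlation as redundant too by comparing absolute values, and keeps whichever type of a correlated pair appears first. These choices and the function names are assumptions made for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)


def drop_highly_correlated(data_by_type: Dict[str, Sequence[float]],
                           threshold: float = 0.9) -> List[str]:
    """Return the type names kept after removing near-duplicate types.

    ASSUMPTION: of two types whose |r| exceeds the threshold, the one
    seen first is kept; 0.9 is the example threshold from the text.
    """
    kept: List[str] = []
    for name, values in data_by_type.items():
        if all(abs(pearson(values, data_by_type[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```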
S150, training the risk assessment model to be trained based on the target behavior data to obtain a target risk assessment model, and determining the risk level of the corresponding user based on the target risk assessment model.
The risk assessment model to be trained can be understood as a model which is not trained, and the corresponding target risk assessment model can be a risk assessment model obtained after the risk assessment model to be trained is trained.
Specifically, the target behavior data may be used as samples and input into the risk assessment model to be trained to obtain the actual risk level output for the target behavior data, and the parameters of the model are then corrected according to the actual risk level and the corresponding theoretical risk level. It should be noted that the theoretical risk level may be the risk level information determined before the behavior data is read. For example, whether training is complete can be determined according to the matching rate between the actual results and the theoretical results: if the matching rate reaches a preset threshold, training is determined to be complete; otherwise, training continues until a risk assessment model that meets the user's requirements is obtained.
On the basis of the technical scheme, the training of the risk assessment model to be trained based on the target behavior data to obtain the target risk assessment model comprises the following steps: dividing the target behavior data into training data and testing data according to a preset proportion, and training the risk assessment model to be trained based on the testing data to obtain the target risk assessment model; and inputting the test data into the target risk assessment model to obtain a test result, and assessing the performance of the target risk assessment model based on the test result.
The preset ratio may be a preset division ratio. Training data may be understood as data for training a model, and correspondingly, test data may be understood as data for testing a model. The test result may be a predicted result of inputting the test data into the target risk assessment model.
Specifically, the target behavior data is divided according to a preset proportion to obtain training data and test data, and the risk assessment model to be trained is trained on the training data to obtain a target risk assessment model. After the target risk assessment model is obtained, the test data can be input into it to obtain a corresponding test result, and the performance of the target risk assessment model is assessed according to the test result. For example, the preset ratio may be 4:1, that is, the target behavior data is divided into 5 parts, of which four parts are training data and one part is test data.
On the basis of the above technical solution, the inputting the test data into the target risk assessment model to obtain a test result, and assessing the performance of the target risk assessment model based on the test result includes: determining a confusion matrix according to the test result, and determining a grading attribute corresponding to the target risk assessment model based on the confusion matrix; determining whether the target risk assessment model meets a performance index based on the scoring attributes and preset scoring attributes; and if the target risk assessment model does not meet the performance index, continuing to train the target risk assessment model.
The confusion matrix may be a matrix used for evaluating the prediction accuracy of the model, and it should be noted that each column of the confusion matrix represents a prediction category, and the total number of each column represents the number of data predicted as the category; each row represents a true attribution category of data, and the total number of data in each row represents the number of data instances for that category. The scoring attributes may be understood as performance scores of the target risk assessment model. The preset scoring attribute may be a preset scoring threshold. The performance index can be understood as the performance that the user needs the target risk assessment model to achieve.
Specifically, the test results corresponding to the test data are counted to build the confusion matrix, a scoring attribute corresponding to the target risk assessment model is determined from the confusion matrix, and whether the model meets the performance index of the user is determined by comparing the scoring attribute with the preset scoring attribute; if it does not, training of the model continues. For example, the outcomes may be defined as follows. True positive (TP): the model predicts a positive sample as positive, that is, a positive class is predicted as the positive class. False positive (FP): the model predicts a negative sample as positive, that is, a negative class is predicted as the positive class. False negative (FN): the model predicts a positive sample as negative, that is, a positive class is predicted as the negative class. True negative (TN): the model predicts a negative sample as negative, that is, a negative class is predicted as the negative class. The precision and recall of the model are then calculated, the F1 score is calculated from the precision and recall, and the F1 score is used as the scoring attribute of the model.
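The text names the precision, recall and F1 score without writing them out. Assuming the standard definitions are intended, they can be expressed in terms of the TP, FP and FN counts as:

```latex
\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}
           {\mathrm{Precision} + \mathrm{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
\]
```

Note that TN does not appear in any of the three quantities, so the F1 score is insensitive to the number of correctly rejected negative samples.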
On the basis of the above technical solutions, the technical solution provided by this embodiment can be further described with reference to fig. 2, as shown in fig. 2:
Reading behavior data and establishing quantitative indexes: specifically, the behavior data corresponding to each user and pre-stored in the target data source is read, the pre-stored treatment data and diagnosis and treatment data are read, and the treatment data and the diagnosis and treatment data are analyzed to obtain the corresponding quantitative indexes.
Screening by median absolute deviation: specifically, the median absolute deviation corresponding to each type of behavior data is calculated in order to retain the features that carry predictive information. It should be noted that the median absolute deviation describes how much a feature varies across the samples, and a median absolute deviation of 0 indicates that the feature takes nearly the same value across the behavior data and therefore cannot provide information to the prediction model; the behavior data whose median absolute deviation is 0 is thus deleted to obtain the behavior data to be used.
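A minimal sketch of this screening step is given below, assuming the usual definition of the median absolute deviation, MAD = median(|x_i - median(x)|). The function names and the example feature names are hypothetical.

```python
from statistics import median

def median_absolute_deviation(values):
    """MAD = median of the absolute deviations from the median."""
    m = median(values)
    return median([abs(v - m) for v in values])

def filter_by_mad(features, min_mad=0.0):
    """Keep only the types whose MAD is strictly greater than min_mad."""
    return {
        name: values
        for name, values in features.items()
        if median_absolute_deviation(values) > min_mad
    }

features = {
    "visits_per_month": [0, 1, 1, 2, 3, 5, 8],
    "has_account": [1, 1, 1, 1, 1, 1, 0],  # almost constant across users
}
kept = filter_by_mad(features)
print(sorted(kept))  # ['visits_per_month']
```

The second feature is removed even though one user differs from the rest: a MAD of 0 only requires that more than half of the values coincide with the median.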
Screening by consistency parameter: specifically, the different types of behavior data to be used that were retained in the previous step are each input into the prediction model corresponding to their data type to obtain corresponding prediction results, a receiver operating characteristic (ROC) curve is drawn from the prediction results and the real results, and the consistency parameter is calculated from the ROC curve. Because the consistency parameter describes the predictive ability of a feature, a feature with a low consistency parameter has poor predictive ability and thus little predictive value; the data whose consistency parameter is lower than 0.58 is therefore deleted to obtain the behavior data to be applied.
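The following sketch illustrates one way this screening could be carried out. It assumes that the consistency parameter is the area under the ROC curve (AUC) and, for simplicity, that the real result is binary; the text itself does not state either point, and the names `auc` and `filter_by_auc` are hypothetical.

```python
def auc(y_true, y_score):
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen positive sample gets a
    higher score than a randomly chosen negative one (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def filter_by_auc(predictions, y_true, threshold=0.58):
    """Return the types whose AUC is not lower than the threshold."""
    return [name for name, scores in predictions.items()
            if auc(y_true, scores) >= threshold]

y_true = [1, 1, 1, 0, 0, 0]
predictions = {  # hypothetical per-type prediction results
    "type_a": [0.9, 0.8, 0.4, 0.5, 0.3, 0.2],
    "type_b": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],
}
print(filter_by_auc(predictions, y_true))  # ['type_a']
```

In this example type A reaches an AUC of 8/9 and is kept, while type B sits at 0.5 (no better than chance) and is deleted.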
Deleting highly correlated behavior data: specifically, on the basis of the result of the previous screening, the correlation coefficient between each pair of types of behavior data to be applied is calculated, and the highly correlated behavior data is removed to further reduce the dimensionality of the behavior data. For a feature pair whose correlation coefficient is greater than or equal to 0.90, the behavior data with the higher predictive value can be retained and the other deleted, thereby obtaining the target behavior data.
Data division: specifically, the target behavior data is randomly divided into training data and test data in a ratio of 4:1, and the training data is then used for optimizing the model parameters. Before the model is trained, 25% of the training data may also be randomly selected to form a validation set that guides the selection of the hyper-parameters of the model.
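The division described in this step can be sketched as follows. The helper name `split`, the fixed random seeds and the integer stand-in samples are assumptions made only for illustration.

```python
import random

def split(samples, ratio, seed=0):
    """Shuffle a copy of samples and cut it so the first part holds `ratio` of them."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]

samples = list(range(100))                    # stand-in for target behavior data
train, test = split(samples, 0.8)             # 4:1 division
fit, validation = split(train, 0.75, seed=1)  # hold out 25% of the training data
print(len(fit), len(validation), len(test))   # 60 20 20
```

With 100 samples this leaves 60 for fitting, 20 for hyper-parameter selection and 20 for the final test, and no sample appears in more than one part.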
Constructing the model: specifically, in the diagnosis model construction stage, a random forest classifier is adopted to construct a multi-classification model. A random forest (RF) is a classifier that trains on and predicts samples using a number of decision trees. Intuitively, each decision tree is a classifier, so for one input sample, N trees produce N classification results; the RF aggregates the classification votes of all decision trees and designates the category with the most votes as the final output. The advantage of RF is that it can be applied to datasets with a large sample size and a high variable dimension while reducing the risk of over-fitting. In addition, RF can also evaluate the importance of individual predictor variables for the classification task.
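The voting step can be illustrated with the short sketch below. It shows only how N classification results are combined into one output; the three hand-written "trees" and the field names `visits` and `age` are hypothetical stand-ins, whereas in a real random forest each tree would be learned from the training data.

```python
from collections import Counter

def forest_predict(trees, sample):
    """Return the class that receives the most votes from the trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained decision trees: each maps a sample to a class.
trees = [
    lambda s: 2 if s["visits"] > 3 else 0,
    lambda s: 2 if s["age"] > 60 else 1,
    lambda s: 1 if s["visits"] > 3 else 0,
]

sample = {"visits": 4, "age": 70}
print(forest_predict(trees, sample))  # votes are 2, 2, 1 -> prints 2
```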
Testing the model: specifically, the model and its hyper-parameters are tuned using the training data, and the model is tested using the test data. A confusion matrix is computed over the 5 classes, with a correct prediction recorded as 1 and otherwise as 0; the numbers of true positives, false negatives and false positives are then counted, and finally the F1 score is calculated to evaluate the performance of the model.
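A possible form of this test computation is sketched below. The confusion matrix uses rows for the true class and columns for the predicted class; averaging the per-class F1 scores (macro averaging) is an assumption, since the text does not say how the 5 classes are combined into a single score.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    n = len(matrix)
    scores = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # true class c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp    # predicted c, true class otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n

y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 4, 0, 1, 2, 4, 3]  # classes 3 and 4 confused once each
matrix = confusion_matrix(y_true, y_pred, 5)
print(macro_f1(matrix))  # 0.8
```

Classes 0 to 2 score 1.0 and classes 3 and 4 score 0.5 each, which gives the mean of 0.8.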
According to the technical scheme of this embodiment, the behavior data corresponding to each user and stored in the target data source is obtained; for each type of behavior data, the behavior data to be used is determined from the current type of behavior data and processed based on the prediction model corresponding to the current type to obtain a prediction result; for each type of behavior data, the behavior data to be applied is then screened out from the behavior data to be used based on the prediction result of the current type and the corresponding actual result; the target behavior data is determined based on the behavior data to be applied corresponding to each type of behavior data; and finally the risk assessment model to be trained is trained based on the target behavior data to obtain the target risk assessment model, and the risk level of the corresponding user is determined based on the target risk assessment model. This technical scheme improves the accuracy of risk level assessment and thereby improves user experience.
Example two
Fig. 3 is a block diagram of a risk level assessment apparatus according to an embodiment of the present disclosure. The device comprises: a data acquisition module 310, a prediction result acquisition module 320, a screening module 330, a target behavior data determination module 340, and an evaluation module 350.
The data acquisition module 310 is configured to obtain behavior data corresponding to each user and stored in a target data source; wherein the behavior data comprises basic behavior data and operational behavior data;
the prediction result acquisition module 320 is configured to determine, for each type of behavior data, behavior data to be used from the current type of behavior data, and process the behavior data to be used based on a prediction model corresponding to the current type to obtain a prediction result;
the screening module 330 is configured to, for each type of behavior data, screen out behavior data to be applied from the behavior data to be used based on the prediction result of the current type and the corresponding actual result;
the target behavior data determining module 340 is configured to determine target behavior data based on to-be-applied behavior data corresponding to each type of behavior data;
and the evaluation module 350 is configured to train the risk evaluation model to be trained based on the target behavior data to obtain a target risk evaluation model, and determine the risk level of the corresponding user based on the target risk evaluation model.
On the basis of the above technical solution, the apparatus further comprises:
the risk grade information determining module is used for reading pre-stored treatment data and diagnosis and treatment data, and analyzing the treatment data and the diagnosis and treatment data to obtain at least one quantitative result; and determining corresponding risk grade information based on the at least one quantitative result and the treatment data and the diagnosis and treatment data corresponding to each user, and storing the risk grade information.
On the basis of the above technical solution, the prediction result obtaining module includes:
the to-be-used behavior data acquisition unit is used for determining median information corresponding to each type of behavior data and determining a median absolute difference corresponding to each type of behavior data based on the median information; and filtering the behavior data of each type according to the median absolute difference and preset difference information to obtain the behavior data to be used.
On the basis of the technical scheme, the screening module is specifically used for determining a receiver operating characteristic (ROC) curve corresponding to the current type based on the prediction result of the current type and the corresponding actual result; and determining the consistency parameter corresponding to the current type based on the ROC curve, and screening the behavior data to be used based on the consistency parameter and a preset parameter threshold to obtain the behavior data to be applied.
On the basis of the technical scheme, the target behavior data determining module is used for determining correlation coefficients between behavior data to be applied corresponding to each type of behavior data; and screening the behavior data to be applied corresponding to each type of behavior data based on the correlation coefficient and a preset correlation threshold to obtain the target behavior data.
On the basis of the above technical solution, the evaluation module further comprises:
the data dividing unit is used for dividing the target behavior data into training data and test data according to a preset proportion, and training the risk assessment model to be trained based on the training data to obtain the target risk assessment model;
and the test unit is used for inputting the test data into the target risk assessment model to obtain a test result, and assessing the performance of the target risk assessment model based on the test result.
On the basis of the technical scheme, the test unit is specifically configured to determine a confusion matrix according to the test result, and determine a scoring attribute corresponding to the target risk assessment model based on the confusion matrix; determining whether the target risk assessment model meets a performance index based on the scoring attributes and preset scoring attributes; and if the target risk assessment model does not meet the performance index, continuing to train the target risk assessment model.
According to the technical scheme of this embodiment, the behavior data corresponding to each user and stored in the target data source is obtained; for each type of behavior data, the behavior data to be used is determined from the current type of behavior data and processed based on the prediction model corresponding to the current type to obtain a prediction result; for each type of behavior data, the behavior data to be applied is then screened out from the behavior data to be used based on the prediction result of the current type and the corresponding actual result; the target behavior data is determined based on the behavior data to be applied corresponding to each type of behavior data; and finally the risk assessment model to be trained is trained based on the target behavior data to obtain the target risk assessment model, and the risk level of the corresponding user is determined based on the target risk assessment model. This technical scheme improves the accuracy of risk level assessment and thereby improves user experience.
The risk level assessment device provided by the embodiment of the invention can execute the risk level assessment method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the embodiments of the present disclosure.
Example three
FIG. 4 shows a schematic block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 4, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 may also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as the risk level assessment method.
In some embodiments, the risk level assessment method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When loaded into RAM 13 and executed by processor 11, the computer program may perform one or more of the steps of the risk level assessment method described above. Alternatively, in other embodiments, the processor 11 may be configured to perform the risk level assessment method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired result of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for risk level assessment, comprising:
behavior data corresponding to each user and stored in a target data source are obtained; wherein the behavior data comprises basic behavior data and operational behavior data;
for each type of behavior data, determining behavior data to be used from the current type of behavior data, and processing the behavior data to be used based on a prediction model corresponding to the current type to obtain a prediction result;
for each behavior data to be used, screening out the behavior data to be applied from the behavior data to be used based on the prediction result of the current type and the corresponding actual result;
determining target behavior data based on the behavior data to be applied corresponding to each type of behavior data;
and training the risk assessment model to be trained based on the target behavior data to obtain a target risk assessment model, and determining the risk level of the corresponding user based on the target risk assessment model.
2. The method of claim 1, further comprising, prior to obtaining the behavior data corresponding to each user stored in the target data source:
reading pre-stored treatment data and diagnosis and treatment data, and analyzing the treatment data and the diagnosis and treatment data to obtain at least one quantitative result;
and determining corresponding risk grade information based on the at least one quantitative result and the treatment data and the diagnosis and treatment data corresponding to each user, and storing the risk grade information.
3. The method according to claim 1, wherein for each type of behavior data, determining the behavior data to be used from the behavior data of the current type comprises:
determining median information corresponding to the various types of behavior data, and determining a median absolute difference corresponding to the various types of behavior data based on the median information;
and filtering the behavior data of each type according to the median absolute difference and preset difference information to obtain the behavior data to be used.
4. The method according to claim 1, wherein for each type of behavior data, screening out behavior data to be applied from the behavior data to be used based on the current type of predicted result and the corresponding actual result, comprises:
determining a receiver operating characteristic curve (the working characteristic curve) corresponding to the current type based on the prediction result of the current type and the corresponding actual result;
and determining consistency parameters corresponding to the current type based on the working characteristic curve, and screening the behavior data to be used based on the consistency parameters and preset parameter thresholds to obtain the behavior data to be applied.
5. The method according to claim 1, wherein the determining the target behavior data based on the to-be-applied behavior data corresponding to each type of behavior data comprises:
determining a correlation coefficient between behavior data to be applied corresponding to each type of behavior data;
and screening the behavior data to be applied corresponding to each type of behavior data based on the correlation coefficient and a preset correlation threshold to obtain the target behavior data.
6. The method according to claim 1, wherein the training a risk assessment model to be trained based on the target behavior data to obtain a target risk assessment model comprises:
dividing the target behavior data into training data and test data according to a preset proportion, and training the risk assessment model to be trained based on the training data to obtain the target risk assessment model;
and inputting the test data into the target risk assessment model to obtain a test result, and assessing the performance of the target risk assessment model based on the test result.
7. The method of claim 6, wherein inputting the test data into the target risk assessment model results in a test result, and assessing performance of the target risk assessment model based on the test result comprises:
determining a confusion matrix according to the test result, and determining a grading attribute corresponding to the target risk assessment model based on the confusion matrix;
determining whether the target risk assessment model meets a performance index based on the scoring attributes and preset scoring attributes;
and if the target risk assessment model does not meet the performance index, continuing to train the target risk assessment model.
8. A risk level assessment apparatus, comprising:
the data acquisition module is used for acquiring behavior data which are stored in a target data source and correspond to each user; wherein the behavior data comprises basic behavior data and operational behavior data;
the prediction result acquisition module is used for determining behavior data to be used from the current type of behavior data for each type of behavior data and processing the behavior data to be used based on a prediction model corresponding to the current type to obtain a prediction result;
the screening module is used for screening the behavior data to be applied from the behavior data to be used based on the prediction result of the current type and the corresponding actual result for each type of behavior data;
the target behavior data determining module is used for determining target behavior data based on the to-be-applied behavior data corresponding to each type of behavior data;
and the evaluation module is used for training the risk evaluation model to be trained based on the target behavior data to obtain a target risk evaluation model, and determining the risk level of the corresponding user based on the target risk evaluation model.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the risk level assessment method of any one of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions for causing a processor to, when executed, implement the risk level assessment method of any one of claims 1-7.
CN202211062537.1A 2022-09-01 2022-09-01 Risk level evaluation method and device, electronic equipment and storage medium Pending CN115376691A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211062537.1A CN115376691A (en) 2022-09-01 2022-09-01 Risk level evaluation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211062537.1A CN115376691A (en) 2022-09-01 2022-09-01 Risk level evaluation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115376691A (en) 2022-11-22

Family

ID=84070613

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202211062537.1A Pending CN115376691A (en) 2022-09-01 2022-09-01 Risk level evaluation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115376691A (en)

Similar Documents

Publication Title
CN107040397B (en) Service parameter acquisition method and device
CN109242135B (en) Model operation method, device and business server
US20190180379A1 (en) Life insurance system with fully automated underwriting process for real-time underwriting and risk adjustment, and corresponding method thereof
CN116955092B (en) Multimedia system monitoring method and system based on data analysis
CN113837596B (en) Fault determination method and device, electronic equipment and storage medium
US11481707B2 (en) Risk prediction system and operation method thereof
CN111144941A (en) Merchant score generation method, device, equipment and readable storage medium
WO2023029065A1 (en) Method and apparatus for evaluating data set quality, computer device, and storage medium
KR102195629B1 (en) Method for selecting workers based on capability of work in crowdsourcing based projects for artificial intelligence training data generation
CN114049197A (en) Data processing method, model building device and electronic equipment
CN109685255A (en) Method and apparatus for predicting customer churn
CN112101572A (en) Model optimization method, device, equipment and medium
CN111275338A (en) Method, device, equipment and storage medium for judging enterprise fraud behaviors
CN111340540B (en) Advertisement recommendation model monitoring method, advertisement recommendation method and advertisement recommendation model monitoring device
TWI677830B (en) Method and device for detecting key variables in a model
CN116739742A (en) Monitoring method, device, equipment and storage medium of credit risk control model
CN113919432A (en) Classification model construction method, data classification method and device
CN109711450A (en) Power grid forecast failure collection prediction method, device, electronic equipment and storage medium
CN110704614B (en) Information processing method and device for predicting user group type in application
CN117593115A (en) Feature value determining method, device, equipment and medium of credit risk assessment model
CN115376691A (en) Risk level evaluation method and device, electronic equipment and storage medium
CN115630708A (en) Model updating method and device, electronic equipment, storage medium and product
CN115481694A (en) Data enhancement method, device, equipment and storage medium for training sample set
CN114936204A (en) Feature screening method and device, storage medium and electronic equipment
CN111815442B (en) Link prediction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination