CN111048207B

CN111048207B - Plasma donor evaluation method and system

Info

Publication number: CN111048207B
Application number: CN201911374472.2A
Authority: CN
Inventors: 杨智钧 (Yang Zhijun); 杨佑禄 (Yang Youlu); 白永明 (Bai Yongming)
Original assignee: Sichuan Jiuba Village Information Technology Co ltd
Current assignee: Sichuan Jiuba Village Information Technology Co ltd
Priority date: 2019-12-27
Filing date: 2019-12-27
Publication date: 2023-06-16
Anticipated expiration: 2039-12-27
Also published as: CN111048207A

Abstract

The invention discloses a plasma donor evaluation method in the technical field of data processing and analysis. The method comprises the following steps: extracting a plurality of groups of data sources; performing preliminary data preprocessing on each group of data sources to obtain the input features of each group of data; performing correlation analysis to calculate correlation coefficients between different input features; calculating, according to the obtained correlation coefficients, the weight of each input feature in each group of data by the analytic hierarchy process; normalizing each input feature and computing a weighted sum of the input features in each group of data; and automatically clustering the set of weighted sums with a machine learning algorithm to grade the plasma donor corresponding to each group of data. In this way the market potential of each plasma donor is inferred from the donor information entered into the plasma station's business system, so that plasma stations and biological companies can take different, targeted promotion measures for different plasma donors.

Description

Plasma donor evaluation method and system

Technical Field

The invention belongs to the technical field of data processing and analysis, relates to correlation analysis, the analytic hierarchy process, cluster analysis, the Python programming language, data feature engineering, machine learning and related fields, and particularly relates to a plasma donor evaluation method and system.

Background

Every day a plasma station serves a large number of new, fixed, young and older plasma donors, who come from various places, differ in sex and live at different distances from the station. Most plasma stations have fully automated the plasma collection process, which has greatly improved the efficiency of their business workflows. However, the volume of plasma collected in China currently shows a downward trend and, measured against the self-sufficiency standard set by the World Health Organization, is insufficient. For comparison, the population of the United States is only about one quarter of China's, yet US plasma collection in 2017 was 2.5 times China's, so there is considerable room for China to increase plasma collection.

At present, to expand their donor base, publicize their services and identify advertising target groups, plasma stations and biological companies judge whether a donor is a high-quality donor or an ordinary donor from past experience or from one or two criteria. Such judgments are frequently wrong: donor grading involves more than two dimensions, a decision maker can hardly analyze many dimensions together, and the judgment lacks support from large amounts of data.

It is therefore necessary to grade plasma donors from multiple dimensions based on large amounts of data. No method currently on the market grades plasma donors according to their characteristics, so plasma station publicity is undirected and its cost increases.

Disclosure of Invention

In view of the above problems in the prior art, an object of the present invention is to provide a plasma donor evaluation method and system that infer the market potential of each plasma donor from the donor information entered into the plasma station's business system, so that plasma stations and biological companies can take different, targeted promotion measures for different plasma donors.

The technical solution adopted by the invention is as follows: a plasma donor evaluation method, comprising:

extracting a plurality of groups of data sources;

performing preliminary data preprocessing on each group of data sources to obtain the input features of each group of data;

performing correlation analysis to calculate correlation coefficients between different input features;

calculating, according to the obtained correlation coefficients, the weight of each input feature in each group of data by the analytic hierarchy process;

normalizing each input feature, and computing a weighted sum of the input features in each group of data;

automatically clustering the set of weighted sums with a machine learning algorithm;

and grading the plasma donor corresponding to each group of data by taking the output of the automatic clustering as the grading basis.

Further, the input features of each group of data include donation frequency, potential donation time, donor type and sex.

Further, donor types are divided into fixed plasma donors and non-fixed plasma donors; fixed plasma donors are labelled 1 and non-fixed plasma donors are labelled 0.

Further, the sex labels for male and female are defined according to the historical ratio of the numbers of donations made by men and women.

Further, in the correlation analysis, donation frequency is taken as the main factor, and the correlation coefficients between donation frequency and each of potential donation time, donor type and sex are calculated.

Further, the analytic hierarchy process includes:

(a) Establishing a judgment matrix according to the correlation coefficients and the input features;

(b) Analyzing the consistency of the judgment matrix: if the consistency requirement is not met, reconstructing the judgment matrix; if it is met, calculating the weight of each input feature.

The analytic hierarchy process (AHP) is a systematic method that treats a complex multi-objective decision problem as a system: it decomposes the overall goal into several objectives or criteria, further decomposes these into multiple levels of indicators (or criteria and constraints), computes the single-level ranking (weights) and the total ranking of each level by fuzzy quantification of qualitative indicators, and uses these rankings for multi-objective, multi-scheme optimization decisions.

Further, the automatic clustering includes:

(1) Taking a result set obtained by respectively carrying out weighted summation on each input characteristic as a sample of a machine learning algorithm;

(2) Randomly generating N clustering centers;

(3) Dividing the samples into N clusters according to the distance between each sample and the centroid of each cluster;

(4) Judging whether any cluster has changed: if so, readjusting the cluster centers; if not, outputting the cluster centers and using them as the basis for division.

Further, the machine learning algorithm employs a K-Means clustering algorithm.

The invention also discloses a plasma donor evaluation system, comprising a data source acquisition module, a data preprocessing module, a data analysis module, an analytic hierarchy process module and a main operation program module that are communicatively connected in sequence, wherein the data source acquisition module is used for acquiring a plurality of groups of data sources;

the data preprocessing module is used for performing preliminary data preprocessing on each group of data sources and obtaining the input features of each group of data;

the data analysis module is used for calculating correlation coefficients between different input features;

the analytic hierarchy process module is used for calculating the weight of each input feature in each group of data and obtaining the set of weighted sums;

the main operation program module is used for calling a machine learning algorithm, automatically clustering the set of weighted sums and grading the plasma donor corresponding to each group of data.

Further, the system also comprises a storage module for storing the grading results of the plasma donors in real time.

The beneficial effects of the invention are as follows:

1. With the plasma donor evaluation method and system provided by the invention, the correlation coefficients between the input features of plasma donors are analyzed, which provides a basis for constructing the judgment matrix of the analytic hierarchy process and avoids subjective guesswork; each input feature is weighted by the analytic hierarchy process, which avoids one-sided subjective assumptions; and cluster analysis is performed by a machine learning algorithm whose cluster centers serve as the standard for grading plasma donors. The grading is updated once a day, so donor grades are refreshed dynamically every day and different promotion can be carried out for different donors. The whole process is fully automated and based on data analysis and processing. With the method and system, plasma donors can be graded reasonably, plasma stations or biological companies can expand their donor base in a targeted way, and the daily changes in each donor grade can be tracked.

Drawings

FIG. 1 is a block diagram of the workflow of the plasma donor evaluation method provided by the present invention;

FIG. 2 is a schematic diagram of the workflow of the K-Means clustering algorithm in the plasma donor evaluation method provided by the present invention;

FIG. 3 is a system architecture diagram of the plasma donor evaluation system provided by the present invention.

Detailed Description

Embodiments of the present application are described in detail below, and examples of them are illustrated in the accompanying drawings, in which the same or similar reference numerals denote the same or similar modules, or modules having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting it. On the contrary, the embodiments of the present application include all alternatives, modifications and equivalents that fall within the spirit and scope of the appended claims.

Example 1

This embodiment discloses a plasma donor evaluation method by which plasma donors can be graded reasonably, plasma stations or biological companies can expand their donor base in a targeted way, and the daily changes in each donor grade can be tracked. As shown in FIG. 1, in a specific application the method comprises the following steps:

1. Data source acquisition: raw data are extracted from the database in a targeted manner to obtain a plurality of groups of data sources; part of the data is shown in Table 1 below:

name	sex	age	doctime	regdate	donatetimes	donortype
siberian cocklebur good	1	43	29/8/2019	23/9/2019	2	1
To snow	1	31	8/10/2019	8/10/2019	1	3
Wang Dadong	1	27	26/8/2019	26/8/2019	1	3
Lin Xiao	2	27	26/8/2019	26/8/2019	1	3
Wang Aiguo	1	55	26/8/2019	26/8/2019	1	3
Li Liangui	1	31	26/8/2019	26/8/2019	1	3
Li Jianguo	1	55	26/8/2019	27/8/2019	1	3
Wang Xiaoxiao	2	27	26/8/2019	26/8/2019	1	3
Lin Qian	2	23	26/8/2019	26/8/2019	1	3
Page Liang Hui	1	41	26/8/2019	26/8/2019	1	3
Cheng Qian diving	2	39	26/8/2019	26/8/2019	1	3
Li Liangui	1	31	26/8/2019	27/8/2019	2	1
Page Liang Hui	1	41	26/8/2019	27/8/2019	2	1
Is of the type of being able to stop the flow of yang	2	36	29/8/2019	23/9/2019	1	3
Wang Dadong	1	27	26/8/2019	27/8/2019	2	1

(sex: 1 = male, 2 = female; age in years; donatetimes: number of donations; donortype: 1 = fixed donor, other codes = non-fixed donor, matching the labels in Table 2.)

In this step, an SQL query selects the raw data from the database in a targeted manner; the result is stored in a result table and saved as a CSV file.
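The extraction step can be sketched in Python with the standard-library `sqlite3` and `csv` modules. This is a minimal sketch under assumptions: the source does not name its database engine, table or query, so the table name `donor_record` and its columns (copied from the Table 1 headers) are hypothetical, and an in-memory SQLite database stands in for the plasma station's database.

```python
import csv
import io
import sqlite3

# Hypothetical query: table name and columns are assumptions modelled on Table 1.
QUERY = """
SELECT name, sex, age, doctime, regdate, donatetimes, donortype
FROM donor_record
"""


def export_to_csv(conn, query, out):
    """Run `query` on `conn` and write the result set, with a header row, to `out`."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])  # column names
    writer.writerows(cursor)  # one CSV line per result row
    return out


# Demo: an in-memory database stands in for the plasma station's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE donor_record (name TEXT, sex INTEGER, age INTEGER, doctime TEXT,"
    " regdate TEXT, donatetimes INTEGER, donortype INTEGER)"
)
conn.execute(
    "INSERT INTO donor_record VALUES ('Wang Dadong', 1, 27, '26/8/2019', '26/8/2019', 1, 3)"
)
buffer = export_to_csv(conn, QUERY, io.StringIO())
print(buffer.getvalue())
```

In practice `out` would be a file opened with `open(path, "w", newline="")`; a `StringIO` buffer keeps the demo self-contained.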

2. Preliminary data preprocessing: preliminary preprocessing is performed on each group of data sources to obtain the input features of each group of data. In this embodiment, the input features of each group of data include donation frequency, potential donation time, donor type and sex, where donation frequency is determined from the number of donations recorded in the data source.

Potential donation time is calculated as follows: donor ages fall in the range [18, 55], and what matters is each donor's potential future donation time. If $x_1, x_2, \ldots, x_n$ denote the ages of $n$ plasma donors, their potential donation times are $55 - x_1, 55 - x_2, \ldots, 55 - x_n$.

Donor types are divided into fixed and non-fixed plasma donors, keeping the original field's meaning (whether the donor is a fixed donor): 1 denotes a fixed plasma donor, whose contribution to the donor grade is higher, and 0 denotes a non-fixed plasma donor. The label of a fixed donor is therefore defined as 1 and that of a non-fixed donor as 0.

Sex is processed as follows: from historical data, the numbers of donations made by men and women are counted (shown as a pie chart), giving a male:female ratio of 0.348:0.652. The male label is therefore defined as 0.348 and the female label as 0.652; because the data are dynamic, these labels are recomputed as the data change, reflecting the real-time nature of the data.

The data after preliminary preprocessing are shown in Table 2 below:

frq	future_time	after_deal_donortype	after_deal_sex
0.01886792	13	1	0.348
0.01515152	25	0	0.348
0.00917431	29	0	0.348
0.00917431	29	0	0.652
0.00917431	1	0	0.348
0.00917431	25	0	0.348
0.00917431	1	0	0.348
0.00917431	29	0	0.652
0.00917431	33	0	0.652
0.00917431	15	0	0.348
0.00917431	17	0	0.652
0.01834862	25	1	0.348
0.01834862	15	1	0.348
0.00943396	20	0	0.652
0.01834862	29	1	0.348

(frq: donation frequency; future_time: potential donation time in years; after_deal_donortype: donor type label, 1 = fixed; after_deal_sex: sex label, 0.348 = male, 0.652 = female. Rows are in the same order as Table 1.)
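The preprocessing rules can be sketched as a single Python function that maps one Table 1 record to the four Table 2 features. Several details are assumptions, not statements from the source: the codes 1 = male, 2 = female and donortype 1 = fixed are inferred by matching Table 1 against Table 2; the text does not define frq precisely, but the sample values equal donatetimes divided by the number of days from doctime to 13 December 2019 (for example 2/106 for the first row), so that reading is used here with a hypothetical `REFERENCE_DATE`. The sketch follows the stated formula 55 - age; note that the sample future_time values in Table 2 are one larger (age 43 gives 13), so the exact offset is a convention to confirm.

```python
from datetime import date, datetime

MAX_DONOR_AGE = 55                # upper end of the eligible age range [18, 55]
SEX_LABEL = {1: 0.348, 2: 0.652}  # historical male:female ratio; codes 1 = male, 2 = female (inferred)
# Assumption: frq = donations per day since `doctime`, counted up to this reference
# date. The source does not state this; it is the reading that matches Table 2.
REFERENCE_DATE = date(2019, 12, 13)


def preprocess(row, reference_date=REFERENCE_DATE):
    """Map one raw record (Table 1 fields) to the four Table 2 input features."""
    first_date = datetime.strptime(row["doctime"], "%d/%m/%Y").date()
    days = max((reference_date - first_date).days, 1)  # guard against same-day records
    frq = row["donatetimes"] / days                     # donation frequency (assumed definition)
    future_time = MAX_DONOR_AGE - row["age"]            # potential donation time, as in the text
    donor_type = 1 if row["donortype"] == 1 else 0      # 1 = fixed donor, 0 = non-fixed donor
    sex = SEX_LABEL[row["sex"]]                         # ratio-based sex label
    return frq, future_time, donor_type, sex


# First record of Table 1.
record = {"sex": 1, "age": 43, "doctime": "29/8/2019", "donatetimes": 2, "donortype": 1}
print(preprocess(record))
```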

3. Correlation analysis: correlation coefficients between different input features are calculated. In this embodiment, donation frequency is the most direct criterion for grading plasma donors (the higher a donor's donation frequency, the better the donor and the higher the grade), so the correlation between each of the other input features and donation frequency is calculated:

1) With a male:female donor ratio of 0.348:0.652, the correlation coefficient between sex and donation frequency is 0.0023566340412670616;

2) The correlation coefficient between fixed-donor status and donation frequency is 0.6550817781382937;

3) The correlation coefficient between potential donation time and donation frequency is 0.06690809987447563.

In this step, the correlation coefficients can be calculated with the CORREL function in Excel, taking any two columns of data as input.
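Excel's CORREL returns the Pearson product-moment correlation coefficient. A dependency-free Python equivalent is sketched below; the function name `correl` is illustrative, and the sample columns are only the first five rows of Table 2, so the printed value will not match the coefficients above, which were computed on the full data set.

```python
import math


def correl(xs, ys):
    """Pearson correlation coefficient of two equal-length series (what Excel's CORREL computes)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


# Donation frequency as the main factor, correlated with the fixed-donor label
# (first five rows of Table 2 only).
frq = [0.01886792, 0.01515152, 0.00917431, 0.00917431, 0.00917431]
fixed = [1, 0, 0, 0, 0]
print(correl(fixed, frq))
```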

4. Analytic hierarchy process: according to the obtained correlation coefficients, the weight of each input feature in each group of data is calculated by the analytic hierarchy process.

The analytic hierarchy process decomposes the decision problem into a hierarchy ordered as overall goal, sub-goals at each level and evaluation criteria, down to the specific alternative schemes. It then obtains the priority weight of each element of a level with respect to an element of the level above by solving for the eigenvector of the judgment matrix, and finally aggregates, level by level using weighted sums, the final weight of each alternative with respect to the overall goal; the alternative with the largest final weight is the optimal scheme.

The analytic hierarchy process is well suited to target systems whose evaluation indicators are hierarchical and interleaved and whose target values are difficult to describe quantitatively. In this embodiment it is applied in the following steps:

(a) Constructing the judgment matrix: using the correlation coefficients obtained in step 3, whether the donor is a fixed donor, the male:female ratio of donations (sex) and potential donation time are taken as influencing factors; potential donation time is an objective factor and is placed first among them. The judgment matrix method is an improvement on the relative comparison method and is also an empirical scoring method: all indicators are listed to form an N × N square matrix, the indicators are compared and scored pairwise, and the scores of each indicator are finally summed and normalized. Construction of the judgment matrix is described in the existing literature and is not detailed here; the judgment matrix of this embodiment is as follows:

(b) Analyzing the consistency of the judgment matrix: if the consistency requirement is not met, the judgment matrix is reconstructed; if it is met, the weight of each input feature is calculated as follows:

(b1) Eigenvalues (real parts): [4.16540439, -0.1115912, -0.02690659, -0.02690659]

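(b2) Eigenvectors (real parts; each column is the eigenvector of the corresponding eigenvalue in (b1)):

$$
\begin{bmatrix}
0.79200584 & -0.85345804 & 0.68048194 & 0.68048194 \\
0.28327861 & -0.13188751 & -0.36522767 & -0.36522767 \\
0.53691597 & 0.50279034 & 0.12210932 & 0.12210932 \\
0.06481678 & 0.03764203 & 0.01696366 & 0.01696366
\end{bmatrix}
$$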

(b3) The maximum eigenvalue is $\lambda_{\max} = 4.165404389255812$.

The corresponding eigenvector is:

[0.79200584, 0.28327861, 0.53691597, 0.06481678]

(b4) Computing the consistency index CI:

$$CI = \frac{\lambda_{\max} - n}{n - 1}$$

where $\lambda_{\max}$ is the maximum eigenvalue and $n$ is the number of features (the order of the judgment matrix).

To compare the consistency of judgment matrices of different orders, the average random consistency index RI is introduced; the RI value used in the consistency test of this judgment matrix is 0.9.

The consistency ratio CR is then calculated:

$$CR = \frac{CI}{RI}$$

If CR < 0.1, the judgment matrix is considered to have satisfactory consistency.

The CR value of the judgment matrix in this example is 0.06126088490956005, so the consistency test is passed.

The resulting weight vector of the indicators (the eigenvector from (b3) normalized to sum to 1) is:

[0.4722705521131915, 0.16891813164079006, 0.32016128036588176, 0.03865003588013668]
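The eigenvector method of steps (b1) to (b4) can be sketched in plain Python. This is a minimal sketch, not the embodiment's code: the function names are illustrative, the principal eigenvector is found by power iteration (a standard alternative to a full eigendecomposition for positive reciprocal matrices), and because the embodiment's judgment matrix is not reproduced in the text, the 3 × 3 matrix in the demo is a hypothetical, perfectly consistent one. The last lines recompute CR and the weights from the $\lambda_{\max}$ and eigenvector reported above.

```python
def principal_eigen(matrix, tol=1e-12, max_iter=1000):
    """Return (lambda_max, weights) of a positive reciprocal judgment matrix.

    Power iteration: repeatedly multiply by the matrix and rescale so the vector
    sums to 1; the limit is the principal eigenvector, i.e. the AHP weights.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(max_iter):
        aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(aw)
        new_w = [v / total for v in aw]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    # Since sum(w) == 1 and A·w = lambda_max·w, lambda_max equals sum(A·w).
    lambda_max = sum(sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n))
    return lambda_max, w


def consistency_ratio(lambda_max, n, ri):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    ci = (lambda_max - n) / (n - 1)
    return ci / ri


def normalise(vector):
    """Scale a vector so its entries sum to 1."""
    total = sum(vector)
    return [v / total for v in vector]


# Hypothetical, perfectly consistent 3x3 judgment matrix (a_ij = w_i / w_j).
demo = [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]]
lam, weights = principal_eigen(demo)
print(lam, weights, consistency_ratio(lam, 3, 0.58))  # 0.58: commonly tabulated RI for n = 3

# Values reported in (b3) and (b4) for the embodiment's 4x4 matrix.
cr = consistency_ratio(4.165404389255812, 4, 0.9)
w = normalise([0.79200584, 0.28327861, 0.53691597, 0.06481678])
print(cr, w)
```

Recomputing from the rounded eigenvector reproduces the reported weights to about seven decimal places.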

5. Normalization and weighted summation: each input feature is normalized, and a weighted sum of the input features is computed for each group of data, as follows:

Normalization brings data of different orders of magnitude to the same scale, eliminating the influence of magnitude, as shown in Table 3 below:

Based on the normalized table, each feature in each row is multiplied by its corresponding weight, because each input feature contributes differently to the grading of plasma donors: using the weights obtained in step 4, each column of Table 2 (after normalization) is multiplied by its weight, and the products in each row are summed. Part of the resulting data is shown in Table 4:
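Normalization and weighted summation can be sketched as follows. Two assumptions are made because the text does not specify them: min-max scaling to [0, 1] is used as the normalization, and the step-4 weights are applied to the Table 2 columns in their listed order (frq, future_time, after_deal_donortype, after_deal_sex); the source does not state which weight belongs to which feature.

```python
# Weights from step 4. Assumption: they follow the Table 2 column order
# (frq, future_time, after_deal_donortype, after_deal_sex).
WEIGHTS = [0.4722705521131915, 0.16891813164079006, 0.32016128036588176, 0.03865003588013668]


def min_max(column):
    """Rescale one feature column to [0, 1] (assumed normalization method)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)  # a constant column carries no ranking information
    return [(v - lo) / (hi - lo) for v in column]


def weighted_scores(rows, weights=WEIGHTS):
    """Normalize each feature column, then return one weighted sum per row (donor)."""
    columns = [min_max(list(col)) for col in zip(*rows)]
    return [sum(w * v for w, v in zip(weights, row)) for row in zip(*columns)]


# First three rows of Table 2.
table2 = [
    (0.01886792, 13, 1, 0.348),
    (0.01515152, 25, 0, 0.348),
    (0.00917431, 29, 0, 0.348),
]
print(weighted_scores(table2))
```

Because the weights sum to 1, every score lies in [0, 1], which keeps the one-dimensional clustering input on a fixed scale.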

6. Cluster analysis: the set of weighted sums is automatically clustered by a machine learning algorithm. In this embodiment, as shown in FIG. 2, this comprises the following steps:

(1) Taking the set of weighted sums as the samples of the machine learning algorithm;

(2) Randomly generating four cluster centers and taking their centroids;

(3) Dividing the samples into four clusters according to the distance between each sample and the centroid of each cluster center. K-Means is a clustering algorithm in unsupervised learning: following the similarity principle, data objects with high similarity are placed in the same cluster, and data objects with high dissimilarity are placed in different clusters;

(4) Judging whether any cluster has changed: if so, readjusting the cluster centers; if not, outputting the cluster centers. Through this step, the grading of plasma donors can be updated in real time, with the cluster centers as the grading basis.
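Steps (1) to (4) describe standard K-Means on one-dimensional samples (the weighted sums). A minimal pure-Python sketch follows. The function names, the seed, and drawing the initial centers from the samples themselves are illustrative assumptions, as is the grade mapping (the highest center gets grade A), which the text does not state; a library implementation such as scikit-learn's `KMeans` could be used instead.

```python
import random


def kmeans_1d(samples, k=4, seed=0, max_iter=100):
    """Cluster 1-D samples into k clusters; return (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)  # (2) random initial centers, drawn from the samples
    labels = None
    for _ in range(max_iter):
        # (3) assign every sample to the nearest center
        new_labels = [min(range(k), key=lambda j: abs(x - centers[j])) for x in samples]
        if new_labels == labels:  # (4) no cluster changed: stop and output the centers
            break
        labels = new_labels
        # (4) otherwise move each center to the mean of its cluster
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:  # keep the old center if a cluster became empty
                centers[j] = sum(members) / len(members)
    return centers, labels


def grade(centers, labels, letters="ABCD"):
    """Map clusters to grades, assuming a higher center means a better grade."""
    order = sorted(range(len(centers)), key=lambda j: centers[j], reverse=True)
    letter_of = {j: letters[rank] for rank, j in enumerate(order)}
    return [letter_of[lab] for lab in labels]


scores = [0.05, 0.08, 0.10, 0.35, 0.40, 0.62, 0.66, 0.90, 0.95]  # hypothetical weighted sums
centers, labels = kmeans_1d(scores, k=4, seed=1)
print(list(zip(scores, grade(centers, labels))))
```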

7. Grading: four classes are defined by the K-Means clustering algorithm of unsupervised machine learning. The cluster centers from step 6 are extracted as the division basis, the plasma donors corresponding to the groups of data are divided into four grades A, B, C and D, and the grade of each plasma donor is obtained. The division result for part of the data is shown in Table 5 below:

Example 2

On the basis of the plasma donor evaluation method provided in Example 1, and as shown in FIG. 3, this embodiment also discloses a plasma donor evaluation system. The system comprises a data source acquisition module, a data preprocessing module, a data analysis module, an analytic hierarchy process module and a main operation program module that are communicatively connected in sequence, and the main operation program module is connected to a storage module. The data source acquisition module is used for acquiring a plurality of groups of data sources from the database in a targeted manner;

the data preprocessing module is used for performing preliminary data preprocessing on each group of data sources and obtaining the input features of each group of data, the input features comprising donation frequency, potential donation time, donor type and sex;

the data analysis module is used for calculating the correlation coefficients between different input features; in this embodiment, donation frequency is taken as the main factor, and the correlation coefficients between donation frequency and each of potential donation time, donor type and sex are calculated;

the analytic hierarchy process module is used for calculating the weight of each input feature in each group of data, computing the weighted sum of the input features in each group of data, and finally obtaining the set of weighted sums;

the main operation program module is used for calling a machine learning algorithm, namely the K-Means algorithm of unsupervised learning, which automatically clusters the set of weighted sums into four classes; the cluster centers output by K-Means are extracted and used to grade the plasma donors corresponding to the groups of data, which in this embodiment are divided into grades A, B, C and D.

The storage module is used for storing the grading results of the plasma donors in real time, so that the donors' grades can be checked in real time and corresponding promotion and publicity can be carried out.
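The storage module's role (keeping each day's grades so that changes in a donor's grade can be tracked) can be sketched with SQLite from the Python standard library. The table `donor_grade`, its columns, the donor identifier and the function names are all hypothetical; the source does not describe the storage schema.

```python
import sqlite3
from datetime import date


def store_grades(conn, grades, run_date):
    """Save one grading run; `grades` maps a donor identifier to a grade letter."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS donor_grade ("
        "donor TEXT, grade TEXT, run_date TEXT, PRIMARY KEY (donor, run_date))"
    )
    # Re-running the same day overwrites that day's grade instead of duplicating it.
    conn.executemany(
        "INSERT OR REPLACE INTO donor_grade VALUES (?, ?, ?)",
        [(donor, g, run_date.isoformat()) for donor, g in grades.items()],
    )
    conn.commit()


def grade_history(conn, donor):
    """Return [(run_date, grade), ...] for one donor, oldest first."""
    return conn.execute(
        "SELECT run_date, grade FROM donor_grade WHERE donor = ? ORDER BY run_date",
        (donor,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
store_grades(conn, {"donor-001": "B"}, date(2019, 12, 12))  # hypothetical donor ID
store_grades(conn, {"donor-001": "A"}, date(2019, 12, 13))
print(grade_history(conn, "donor-001"))
```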

Example 3

On the basis of the plasma donor evaluation method provided in Example 1, this embodiment provides a readable storage medium storing one or more programs that can be executed by one or more processors to implement the plasma donor evaluation method described in Example 1, thereby grading plasma donors and enabling promotion targeted to each grade.

It should be noted that in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "plurality" means at least two.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.

The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application, and that changes, modifications, substitutions and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims

1. A plasma donor evaluation method, the method comprising:

extracting a plurality of groups of data sources;

grading the plasma donors corresponding to each group of data by taking the output of the automatic clustering as the grading basis;

the analytic hierarchy process comprises:

(b) analyzing the consistency of the judgment matrix: if the consistency requirement is not met, reconstructing the judgment matrix; if it is met, calculating the weight of each input feature;

the consistency of the judgment matrix is analyzed as follows:

the consistency ratio CR is calculated as:

$$CR = \frac{CI}{RI};$$

$$CI = \frac{\lambda_{\max} - n}{n - 1};$$

wherein CR is the consistency ratio, CI is the consistency index, RI is the average random consistency index, $\lambda_{\max}$ is the maximum eigenvalue, and $n$ is the number of features;

if CR < 0.1, the consistency is considered satisfactory; otherwise, it is considered unsatisfactory;

the automatic clustering includes:

(2) Randomly generating N clustering centers;

(4) judging whether any cluster has changed: if so, readjusting the cluster centers; if not, outputting the cluster centers and using them as the division basis;

the input features of each group of data comprise donation frequency, potential donation time, donor type and sex;

donation frequency is taken as the main factor, and the correlation coefficients between donation frequency and each of potential donation time, donor type and sex are calculated.

2. The method according to claim 1, wherein the plasma donor types are classified into fixed plasma donors and non-fixed plasma donors, the definition label for fixed plasma donors is 1, and the definition label for non-fixed plasma donors is 0.

3. The method according to claim 1, wherein the label of each sex is defined according to the ratio of the numbers of plasma donations made by men and women.

4. The plasma donor evaluation method of claim 1, wherein the machine learning algorithm employs a K-Means clustering algorithm.

5. A plasma donor evaluation system, characterized by comprising a data source acquisition module, a data preprocessing module, a data analysis module, an analytic hierarchy process module and a main operation program module that are communicatively connected in sequence, wherein the data source acquisition module is used for acquiring a plurality of groups of data sources;

the main operation program module is used for calling a machine learning algorithm, automatically clustering the set of weighted sums and grading the plasma donors corresponding to each group of data;

the analytic hierarchy process module performs the following steps:

the consistency ratio CR is calculated as:

$$CR = \frac{CI}{RI};$$

$$CI = \frac{\lambda_{\max} - n}{n - 1};$$

the main operation program module executes the following steps when carrying out automatic clustering:

(2) Randomly generating N clustering centers;

6. The plasma donor evaluation system according to claim 5, further comprising a storage module for storing the grading results of the plasma donors in real time.