CN111243677B - Time control method and system based on cell quality assurance - Google Patents
- Publication number
- CN111243677B (application number CN202010016064.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- operation data
- rsd
- clustering
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/30—Unsupervised data analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Abstract
The invention provides a time control method and a time control system based on cell quality assurance. The method comprises: monitoring actual operation data $c$ of an operation step while an experimenter performs cell culture; comparing the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2 > c_1$; when $c$ is within $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, outputting a prompt to continue with the next operation step, together with a prompt to keep monitoring the actual operation data of the other operation steps and to compare it with the corresponding empirical operation data range $[c_1, c_2]$, and highlighting the actual operation data when $c_2 < c < 1.3c_2$; and when $c \geq 1.3c_2$, outputting a prompt to terminate the experiment. The method and system control the operation time of experimenters according to the acquired empirical operation data range, thereby improving the comprehensive index of the cultured cells and ensuring cell quality.
Description
Technical Field
The invention belongs to the technical field of arrangement or management of medical care resources or facilities, and particularly relates to a time control method and a time control system based on cell quality assurance.
Background
In recent years, cell therapy has become a focus of research. Cell preparations, including immune cell preparations and stem cell preparations, are steadily entering clinical research as novel medicines. Unlike conventional biological products, however, cell products have their own particularities in production process and quality control. The production of a cell product comprises cell culture and formulation of the cell preparation, and the cell culture process has a major influence on the quality of the preparation. Cell culture comprises steps such as sample storage, recovery, cell collection, separation, purification, passage and cryopreservation. The stability of the preparation is affected by the individual process steps, their parameters, and the way operators carry them out. Provided that laboratory conditions meet the relevant GMP (good manufacturing practice) requirements, the training of experimenters is reinforced at regular intervals in accordance with those requirements in order to ensure consistent and stable operation, and the time each experimenter spends on each operation step must be recorded during the experiment. Under the same environment, cells cultured by the same experimenter with different operation durations differ in their comprehensive indexes, and also in consistency and stability. In the prior art, a single specific value is prescribed for the operation duration in order to standardize the experimenters' operation; however, operation duration and proficiency differ between experimenters, so operating to a fixed duration does not help ensure cell quality.
Therefore, a time control method is urgently needed that obtains an empirical operation data range by collecting historical operation data from the cell culture process and then compares actual operation data with that range, so as to ensure cell quality.
Disclosure of Invention
In order to solve the technical problems, the invention provides a time control method and system based on cell quality assurance.
One technical scheme of the invention provides a time control method based on cell quality assurance, which comprises the following steps:
monitoring actual operation data c of an operation step in the process of cell culture by an experimenter;
comparing the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2 > c_1$;
when $c$ is within $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, outputting a prompt to continue with the next operation step, together with a prompt to keep monitoring the actual operation data of the other operation steps and to compare it with the corresponding empirical operation data range $[c_1, c_2]$, and highlighting the actual operation data when $c_2 < c < 1.3c_2$; and when $c \geq 1.3c_2$, outputting a prompt to terminate the experiment.
Another aspect of the present invention provides a time control system based on cell quality assurance, the time control system comprising:
a timing monitoring module configured to monitor actual operation data $c$ of an operation step while an experimenter performs cell culture;
a comparison module configured to compare the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2 > c_1$;
an output module configured to: when $c$ is within $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, output a prompt to continue with the next operation step, together with a prompt to keep monitoring the actual operation data of the other operation steps and to compare it with the corresponding empirical operation data range $[c_1, c_2]$, and highlight the actual operation data when $c_2 < c < 1.3c_2$; and when $c \geq 1.3c_2$, output a prompt to terminate the experiment.
In the time control method and system based on cell quality assurance provided by the invention, the historical cell culture data of experimenters is collected, processed, clustered and mined, and a corresponding empirical operation data range is then selected for every step of cell culture for each experimenter according to that experimenter's index. The actual operation data collected for a given step is compared with the corresponding empirical operation range. When the actual operation data is within the range or only slightly exceeds it, the experiment continues, and data that slightly exceeds the range is highlighted; when the actual operation data severely exceeds the empirical operation range, a prompt to terminate the experiment is output. The operation time of experimenters is thereby controlled and the comprehensive index of the cultured cells is improved.
Drawings
FIG. 1 is a flow chart of a method of time control based on cell quality assurance;
FIG. 2 is a flow chart of an example of a method of time control based on cell quality assurance;
FIG. 3 is a flowchart of the method for acquiring the corresponding empirical operation data range $[c_1, c_2]$;
FIG. 4 is a flow chart of a method of pre-processing missing data;
FIG. 5 is a flow chart of a method of preprocessing exception data;
FIG. 6 is a flow chart of mining association rules using the Apriori algorithm;
FIG. 7 is a block diagram of a cell quality assurance-based time control system;
FIG. 8 is a block diagram of a structure of an empirical operation data range acquisition module.
The steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Although a logical order is shown in the flow diagrams, in some cases, the steps described may be performed in an order different than here.
Detailed Description
Since the method of the present invention is described as being implemented in a computer system, the computer system may be provided in a processor of a server or a client. For example, the methods described herein may be implemented as software executable with control logic that is executed by a CPU in a server. The functionality described herein may be implemented as a set of program instructions stored in a non-transitory tangible computer readable medium. When implemented in this manner, the computer program comprises a set of instructions which, when executed by a computer, cause the computer to perform a method capable of carrying out the functions described above. Programmable logic may be temporarily or permanently installed in a non-transitory tangible computer-readable medium, such as a read-only memory chip, computer memory, disk, or other storage medium. In addition to being implemented in software, the logic described herein may be embodied using discrete components, integrated circuits, programmable logic for use in conjunction with a programmable logic device such as a field programmable gate array, FPGA, or microprocessor, or any other device including any combination thereof. All such implementations are within the scope of the present invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, one embodiment of the present invention provides a time control method based on cell quality assurance, including:
(1) Monitoring actual operation data c of an operation step in the process of cell culture by an experimenter;
(2) Comparing the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2 > c_1$;
(3) When $c$ is within $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, outputting a prompt to continue with the next operation step, together with a prompt to keep monitoring the actual operation data of the other operation steps and to compare it with the corresponding empirical operation data range $[c_1, c_2]$, and highlighting the actual operation data when $c_2 < c < 1.3c_2$; and when $c \geq 1.3c_2$, outputting a prompt to terminate the experiment.
As shown in fig. 2, the above comparison process is illustrated:
A timer is set according to the empirical operation data range $[c_1, c_2]$ corresponding to the operation step. Suppose the operation step is step one and its empirical operation data range $[c_1, c_2]$ is [20 min, 25 min]. The timer is set to 25 min and starts counting down when the experimenter begins step one. If the experimenter has finished step one by the time the countdown ends, a prompt to continue with step two is issued; a timer is then set according to the operation data range $[c_1, c_2]$ corresponding to step two, monitoring and comparison continue, and steps (1) and (2) are repeated. If the experimenter has not finished when the countdown ends, recording continues until the operation is finished and the total time is taken. If the actual operation time is 28 min < 32.5 min (that is, $1.3 \times 25$ min), it exceeds the range but not severely: a prompt to continue with step two is issued, a timer is set according to the operation data range $[c_1, c_2]$ corresponding to step two, monitoring and comparison continue, and the 28 min value is highlighted. If the actual operation time is 35 min > 32.5 min, it severely exceeds the range and a prompt to terminate the experiment is issued. In this way the operation time of experimenters is reasonably controlled and the quality of the cultured cells is ensured.
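The decision rule in steps (1) to (3) and the 20 to 25 min example can be sketched in Python as follows. This is a minimal illustration only: the names `Decision`, `check_step` and `SEVERE_FACTOR` are hypothetical and do not appear in the patent; only the thresholds $c_1$, $c_2$ and the factor 1.3 come from the text.

```python
from dataclasses import dataclass

SEVERE_FACTOR = 1.3  # multiplier on c2 taken from the text


@dataclass
class Decision:
    action: str      # "continue" or "terminate"
    highlight: bool  # True when c2 < c < 1.3 * c2


def check_step(c: float, c1: float, c2: float) -> Decision:
    """Compare an actual operation duration c with the range [c1, c2]."""
    if c2 <= c1:
        raise ValueError("c2 must be greater than c1")
    if c >= SEVERE_FACTOR * c2:
        # Severely beyond the range: stop the experiment.
        return Decision("terminate", False)
    # Below the range, inside it, or only slightly above it: carry on,
    # highlighting the value only when it is above c2.
    return Decision("continue", c > c2)


if __name__ == "__main__":
    for minutes in (22, 28, 35):
        print(minutes, check_step(minutes, 20, 25))
```

With the range [20 min, 25 min], 22 min continues without highlighting, 28 min continues with highlighting, and 35 min terminates, matching the example above.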
As shown in FIG. 3, in some preferred embodiments, the present invention further provides a method for acquiring the corresponding empirical operation data range $[c_1, c_2]$, comprising the following steps:
10) Collecting operation data during the cell culture process;
The collected operation data is historical operation data of experimenters in the cell culture process. Each record of operation data comprises the index value of an experimenter and the operation duration of each step of the cell culture process. The index value of an experimenter is the ratio of the experimenter's operating age to actual age. Because the operating age and the actual age of an experimenter influence the cell culture process to different degrees, both need to be collected along with the operation data so that the index value can be obtained; this makes it possible to control the time used by different experimenters accurately and to select a reasonable operation duration for each of them, thereby ensuring the quality of the cultured cells;
20) Preprocessing missing data and abnormal data in the collected operation data;
missing data exists in real data acquired in the experimental process, and the missing data needs to be processed in order to improve the accuracy of subsequent data processing.
As shown in fig. 4, in some preferred embodiments, the method for preprocessing the missing data comprises the following steps:
210) Calculating the weight of each item of operation data;
The specific method for calculating the weight of each item of operation data is as follows:
The operation data is first assigned weights subjectively, and the weights are then updated by self-learning through a Bayesian network; see the journal article "Research on weight self-learning method based on Bayesian network" by Huwenbin et al. The weight of the operation data is calculated and used to decide how the missing data is to be handled, which improves the accuracy of operation data processing.
211) Counting the proportion $\delta$ of missing data in each record, $\delta = Z_1 / Z$, where $Z_1$ is the number of missing data items and $Z$ is the total number of data items in the record;
212) Comparing the proportion $\delta$ with a proportion threshold $\delta_1$; when $\delta \geq \delta_1$, deleting the record. Research determined that $\delta_1 = 0.35$: when the proportion of missing data exceeds 0.35, filling the missing data affects the accuracy of subsequent processing and reduces data processing efficiency by more than 9%; records in which missing items make up more than 0.35 (35%) of all items therefore need to be deleted;
213) When $\delta < \delta_1$, filling the missing data using the operation data, corresponding to the missing data, from records whose experimenter index is the same as that of the record.
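Steps 211) to 213) can be illustrated with a short Python sketch. It assumes, purely for illustration, that a record is a dictionary whose missing items are stored as `None`; the field names and the helper names `missing_ratio` and `triage` are hypothetical. Only the ratio $\delta = Z_1 / Z$ and the threshold 0.35 come from the text.

```python
MISSING_RATIO_THRESHOLD = 0.35  # delta_1 from the text


def missing_ratio(record: dict) -> float:
    """Return delta = Z1 / Z for one record (None marks a missing item)."""
    total = len(record)
    if total == 0:
        raise ValueError("record has no items")
    missing = sum(1 for value in record.values() if value is None)
    return missing / total


def triage(records):
    """Split records into those kept for filling and those deleted."""
    kept, dropped = [], []
    for record in records:
        if missing_ratio(record) >= MISSING_RATIO_THRESHOLD:
            dropped.append(record)  # step 212: too much is missing
        else:
            kept.append(record)     # step 213: fill the gaps later
    return kept, dropped


if __name__ == "__main__":
    records = [
        {"index": 0.2, "recovery": 12, "collection": None, "passage": 30, "freezing": 8},
        {"index": 0.2, "recovery": None, "collection": None, "passage": 31, "freezing": 9},
    ]
    kept, dropped = triage(records)
    print(len(kept), len(dropped))
```

The first record has one missing item out of five ($\delta = 0.2$) and is kept; the second has two out of five ($\delta = 0.4$) and is deleted.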
In some preferred embodiments, the padding data of the missing data provided by the present invention is calculated according to the following formula:
wherein $n$ denotes the number of records whose experimenter index value is the same as that of the record containing the missing data; $x_i$ denotes the operation data corresponding to the missing data in the $i$-th record; $\bar{x}_n$ denotes the average of the operation data corresponding to the missing data within the $n$ records; the 1 in $n+1$ stands for the one record containing the missing data; and $\bar{x}_{n+1}$ denotes the average of the operation data corresponding to the missing data within the $n$ records together with the filled operation data $x_{n+1}$;
the following illustrates the padding process for missing data:
Taking the cell collection operation duration as an example, the cell collection operation durations obtained from the $n = 5$ records whose experimenter index is the same as that of the record with the missing data are 9 min, 10 min, 16 min and 17 min, respectively. From these 5 data, the filling data for the item missing the cell collection operation duration is calculated as follows:
Calculate the standard deviation: SD = 3.56
Calculate $RSD_5 = 0.278$
Calculate $x_6 \approx 17.8$; the missing data in the 6th record is therefore filled with 17.8.
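The filling formula itself did not survive in this text, but the numbers in the worked example are consistent with one reading: $x_{n+1}$ is chosen so that the relative standard deviation (RSD, the sample standard deviation divided by the mean) of the $n+1$ values equals the RSD of the original $n$ values. The Python sketch below checks that reading. Two things are assumptions rather than statements from the source: the RSD-preserving rule, and the fifth duration of 12 min, which is inferred because it reproduces SD = 3.56 and $RSD_5 = 0.278$. The function names are illustrative.

```python
import math
import statistics


def rsd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def rsd_preserving_fill(values):
    """Return both x such that rsd(values + [x]) == rsd(values).

    With S = sum(values), Q = sum of squares, r = rsd(values) and n values,
    the condition reduces to the quadratic
        (1 - k) x^2 - 2 k S x + (Q - k S^2) = 0,
    where k = 1/(n+1) + n r^2 / (n+1)^2.
    """
    n = len(values)
    s = sum(values)
    q = sum(v * v for v in values)
    k = 1 / (n + 1) + n * rsd(values) ** 2 / (n + 1) ** 2
    a, b, c = 1 - k, -2 * k * s, q - k * s * s
    root = math.sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


if __name__ == "__main__":
    durations = [9, 10, 12, 16, 17]  # 12 is an assumed fifth value
    print(round(statistics.stdev(durations), 2), round(rsd(durations), 3))
    low, high = rsd_preserving_fill(durations)
    print(low, high)
```

Under these assumptions the larger root lies close to the 17.8 reported in the example; a smaller root also satisfies the same condition, so the source formula presumably selects the upper one.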
Filling missing data in this way ensures that the added data does not affect the precision of the existing data, which markedly improves the accuracy of data processing.
Abnormal data may exist in the real data acquired in the experimental process, and the abnormal data needs to be processed, so that the accuracy of subsequent data processing is improved.
As shown in fig. 5, in some preferred embodiments, the present application further provides a method for preprocessing exception data, the method comprising the steps of:
220) Clustering all the records according to the experimenter index using the K-Means clustering algorithm to obtain L clusters;
The K-Means clustering algorithm, also called k-means clustering, comprises the following steps:
(1) First, a number of classes is selected and their center points are initialized at random. Each center point is a vector of the same length as each data point vector.
(2) The distance from each data point to every center point is calculated, and the data point is assigned to the class of the center point it is closest to.
(3) The center of the points in each class is calculated and taken as the new center point.
(4) The above steps are repeated until the center of each class changes little from one iteration to the next.
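Steps (1) to (4) can be sketched for one-dimensional data, such as experimenter index values, in plain Python. This is a generic k-means illustration rather than the patent's implementation; the function name `kmeans_1d`, the seeded random initialization and the convergence tolerance are all assumptions made for the example.

```python
import random


def kmeans_1d(values, k, seed=0, max_iter=100, tol=1e-9):
    """Cluster scalar values into k groups; return (centers, labels)."""
    rng = random.Random(seed)
    # Step (1): pick k distinct data points as the initial centers.
    centers = rng.sample(list(values), k)
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Step (2): assign every point to its nearest center.
        labels = [
            min(range(k), key=lambda j: abs(v - centers[j])) for v in values
        ]
        # Step (3): recompute each center as the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        # Step (4): stop once the centers barely move.
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels


if __name__ == "__main__":
    index_values = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]
    print(kmeans_1d(index_values, k=2))
```

Because the initial centers are random, k-means can settle in a local optimum on less clearly separated data; running it with several seeds and keeping the best result is a common remedy.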
221) Calculating the RSD (relative standard deviation) of all operation data of the corresponding item in each cluster and comparing it with a threshold $RSD_1$; when $RSD \geq RSD_1$, judging that abnormal data exists;
222) When abnormal data exists, calculating the average value of all operation data of the corresponding item in each cluster, and then calculating $RSD_t$, the RSD of all operation data within a distance $t$ of the average value;
223) When $RSD_t < RSD_1$, calculating $RSD_{t+a}$, the RSD of all operation data within a distance $t + a$ of the average value, with $a > 0$, and stopping once $RSD_{t+a} = RSD_1$; when $RSD_t > RSD_1$, calculating $RSD_{t-a}$, the RSD of all operation data within a distance $t - a$ of the average value, and stopping once $RSD_{t-a} = RSD_1$;
224) Judging whether the weight of the abnormal data lying outside the distance $t$ or $t \pm a$ is large; if so, deleting the corresponding item; if not, correcting the abnormal data of the corresponding record using $RSD_1$.
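One way to read steps 221) to 223) is as a search for the widest window around the mean whose points still have an RSD within the threshold, with everything outside that window treated as candidate abnormal data. The Python sketch below implements that simplified reading only. It is an assumption about the intended procedure: it grows the window in fixed steps and stops when the threshold would be exceeded, instead of testing for exact equality, and it omits the weight-based handling in step 224). The names `rsd` and `split_by_rsd` are hypothetical.

```python
import statistics


def rsd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def split_by_rsd(values, rsd_limit, step):
    """Return (inliers, outliers) for one item within one cluster.

    The window half-width grows by `step` from the mean of all values and
    the last width whose inside points have RSD <= rsd_limit is kept.
    """
    center = statistics.mean(values)
    reach = max(abs(v - center) for v in values)
    accepted = 0.0
    t = step
    while t <= reach + step:
        inside = [v for v in values if abs(v - center) <= t]
        # RSD needs at least two points; smaller windows are accepted as is.
        if len(inside) >= 2 and rsd(inside) > rsd_limit:
            break
        accepted = t
        t += step
    inliers = [v for v in values if abs(v - center) <= accepted]
    outliers = [v for v in values if abs(v - center) > accepted]
    return inliers, outliers


if __name__ == "__main__":
    durations = [10, 11, 9, 10, 12, 30]
    print(rsd(durations) >= 0.15)  # step 221: is abnormal data present?
    print(split_by_rsd(durations, rsd_limit=0.15, step=1))
```

In this example the whole set exceeds the threshold, and the window search isolates the 30 min value as the candidate abnormal item.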
30) Clustering the preprocessed operation data to form L clusters;
Because the experimenter index value varies widely, and the comprehensive cell index generally improves as the experimenter index value increases, all records are clustered according to the experimenter index using the K-Means clustering algorithm to obtain L clusters.
Because even a slight change in operation duration has a noticeable effect on the cell indexes, each item of operation data within each cluster is first clustered using the K-Means clustering algorithm to form $g_{eb}$ sub-clusters, where $g_{eb}$ denotes the number of sub-clusters corresponding to the $b$-th item of operation data of the $e$-th cluster, with $e = 1, \dots, L$; the duration range of each item of operation data in each sub-cluster is then determined using the equal-width discretization method;
The equal-width discretization of the $b$-th item of operation data of the $e$-th cluster is carried out as follows:
For the division width, $C_{eb\max}$ denotes the maximum value of the $b$-th item of operation data of the $e$-th cluster; $C_{eb\min}$ denotes the minimum value of the $b$-th item of operation data of the $e$-th cluster; and $\bar{C}_{eb}$ denotes the average value of the $b$-th item of operation data of the $e$-th cluster.
By using the method, the indexes of the experimenters and the operation data are clustered respectively, so that errors caused by subjective classification are reduced, and the accuracy of subsequent data processing is improved.
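The exact division-width formula is not legible in this text. As a point of reference only, the following Python sketch shows the standard form of equal-width discretization, in which the width is the span between the minimum and maximum divided by a chosen number of bins. The bin count and the function name `equal_width_bins` are assumptions, and the patent's own width, which also involves the average value, may differ.

```python
def equal_width_bins(values, bins):
    """Split [min(values), max(values)] into `bins` equal-width intervals."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; nothing to split")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    edges[-1] = hi  # guard against floating-point drift on the last edge
    return [(edges[i], edges[i + 1]) for i in range(bins)]


if __name__ == "__main__":
    # Durations in minutes for one operation item within one sub-cluster.
    print(equal_width_bins([20, 22, 25, 30], bins=2))
```

Each resulting interval can then serve as a candidate duration range for the item in question.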
40) Performing association rule mining on the operation data in each cluster to form a frequent item set;
Association rule: an implication expression of the form $X \rightarrow Y$, where $X$ and $Y$ are disjoint item sets, that is, $X \cap Y = \emptyset$. The strength of an association rule can be measured by its support and confidence.
Support: for an item set $X$, let $\mathrm{count}(X)$ be the number of item sets in the set $D$ that contain $X$, and let $|D|$ be the total number of item sets in $D$; the support of item set $X$ is $\mathrm{support}(X) = \mathrm{count}(X) / |D|$.
The support of an association rule $R: X \rightarrow Y$ is based on $\mathrm{count}(X \cup Y)$, the number of item sets in $D$ that contain both $X$ and $Y$, namely $\mathrm{support}(X \rightarrow Y) = \mathrm{count}(X \cup Y) / |D|$.
Confidence expresses the probability that one data item appears given that another has appeared, that is, a conditional probability. The confidence of the association rule $R$ is the ratio of the number of item sets containing both $X$ and $Y$ to the number containing $X$, namely $\mathrm{confidence}(X \rightarrow Y) = \mathrm{count}(X \cup Y) / \mathrm{count}(X)$.
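Support and confidence as defined above can be computed directly from a list of item sets. The Python sketch below uses the standard definitions; the item labels are invented for illustration (in the patent's setting an item might be a discretized duration range for one operation step), and the function names are not from the source.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X -> Y: support(X and Y) / support(X)."""
    both = frozenset(x) | frozenset(y)
    return support(both, transactions) / support(x, transactions)


if __name__ == "__main__":
    # Hypothetical item labels standing in for discretized operation data.
    D = [
        {"recovery_short", "collection_mid"},
        {"recovery_short", "passage_long"},
        {"recovery_short", "collection_mid", "passage_long"},
        {"collection_mid"},
    ]
    print(support({"recovery_short"}, D))
    print(support({"recovery_short", "collection_mid"}, D))
    print(confidence({"recovery_short"}, {"collection_mid"}, D))
```

Here three of the four item sets contain `recovery_short` and two contain both items, so the rule `recovery_short -> collection_mid` has support 0.5 and confidence 2/3.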
The Apriori algorithm is a representative algorithm for association rule mining.
The Apriori algorithm comprises the following specific operation steps:
Input: a data set D and a support threshold α;
Output: the largest frequent k-item sets;
1) Scan the whole data set and take every item that appears as a candidate frequent 1-item set; let k = 1, the frequent 0-item set being the empty set.
2) Mine the frequent k-item sets:
a) scan the data to calculate the support of each candidate frequent k-item set;
b) remove the candidate frequent k-item sets whose support is below the threshold to obtain the frequent k-item sets. If the resulting collection is empty, return the collection of frequent (k-1)-item sets as the result and end the algorithm. If it contains only one item set, return it as the result and end the algorithm;
c) from the frequent k-item sets, generate the candidate frequent (k+1)-item sets by joining;
3) Let k = k + 1 and go to step 2).
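The steps above can be sketched compactly in Python. This is an illustrative version that follows the listed steps, including the early return when only one frequent item set remains; it leaves out the subset-pruning optimization found in many Apriori implementations, keeps item sets whose support is at least the threshold, and uses a function name and data layout that are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return the largest frequent item sets as a set of frozensets."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def frequent(candidates):
        # Steps 2a and 2b: keep candidates whose support meets the threshold.
        return {
            c for c in candidates
            if sum(1 for t in transactions if c <= t) / total >= min_support
        }

    # Step 1: every item that appears is a candidate 1-item set.
    candidates = {frozenset([item]) for t in transactions for item in t}
    previous = set()  # the frequent 0-item collection is empty
    while True:
        current = frequent(candidates)
        if not current:
            return previous  # nothing frequent at size k: return size k-1
        if len(current) == 1:
            return current   # a single frequent set cannot be joined further
        k = len(next(iter(current)))
        # Step 2c: join pairs of frequent k-item sets into (k+1)-item sets.
        candidates = {
            a | b for a, b in combinations(current, 2) if len(a | b) == k + 1
        }
        previous = current   # step 3: move on to k + 1


if __name__ == "__main__":
    D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    print(apriori(D, min_support=0.5))
```

In this example every pair appears in three of the five item sets, while the triple appears in only two, so the three pairs are returned as the largest frequent item sets.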
As shown in fig. 6, the association rule is mined by Apriori algorithm in the present invention, and the specific method is as follows:
410) Taking the operation data in each cluster as a candidate set (association rule mining therefore has to be performed on L candidate sets), calculating the support of each item of operation data in the candidate set, and removing the items whose support is below the support threshold to obtain the frequent 1-item sets;
411) Joining the frequent 1-item sets to obtain the candidate 2-item sets, and selecting the 2-item sets whose support is greater than the support threshold to form the frequent 2-item sets; and so on, until the frequent k-item sets are empty, whereupon the collection of frequent (k-1)-item sets is returned as the frequent item set;
50) Calculating the confidence of each subset in the frequent item set, the frequent item sets whose confidence is greater than a threshold forming strong association rules;
60) Determining, according to the strong association rules, the empirical operation data range $[c_1, c_2]$ corresponding to the cell culture step performed by the experimenter.
Wherein the step 60) comprises the following steps:
judging the number of strong association rules and, when there is more than one, selecting as the strong association rule the one whose record has the highest comprehensive cell index;
selecting the strong association rule corresponding to the experimenter index of the experimenter who is about to perform the experiment, and taking it as the empirical operation data range $[c_1, c_2]$ for that experimenter's cell culture step.
In the method provided by the invention, the historical cell culture data of experimenters is collected, processed, clustered and mined, and a corresponding empirical operation data range is selected for every step of cell culture for each experimenter according to that experimenter's index. The actual operation data collected for a given step is then compared with the corresponding empirical operation range. When the operation data is within the range or only slightly exceeds it, the experiment continues, and data that slightly exceeds the range is highlighted; when the operation data severely exceeds the empirical operation range, a prompt to terminate the experiment is output. The operation time of experimenters is thereby controlled and the comprehensive index of the cultured cells is improved.
As shown in fig. 7, another embodiment of the present invention provides a cell quality assurance-based time control system, including:
a timing monitoring module 1 configured to monitor actual operation data $c$ of an operation step while an experimenter performs cell culture;
a comparison module 2 configured to compare the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2 > c_1$;
an output module 3 configured to: when $c$ is within $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, output a prompt to continue with the next operation step, together with a prompt to keep monitoring the actual operation data of the other operation steps and to compare it with the corresponding empirical operation data range $[c_1, c_2]$, and highlight the actual operation data when $c_2 < c < 1.3c_2$; and when $c \geq 1.3c_2$, output a prompt to terminate the experiment.
By controlling the operation time of experimenters, the system improves the comprehensive indexes and quality of the cultured cells.
As shown in fig. 8, in some preferred embodiments, the time control system further comprises an empirical operation data range acquisition module, the empirical operation data range acquisition module comprising:
a data acquisition sub-module 10 configured to acquire operational data during a cell culture process;
The collected operation data is historical operation data of experimenters in the cell culture process. Each record of operation data comprises the index value of an experimenter and the operation duration of each step of cell culture. The index value of an experimenter is the ratio of the experimenter's operating age to actual age. Because the operating age and the actual age of an experimenter influence the cell culture process to different degrees, both need to be collected along with the operation data so that the index value can be obtained; this makes it possible to control the time used by different experimenters accurately and to select a reasonable operation duration for each of them, thereby ensuring the quality of the cultured cells;
a data preprocessing sub-module 20 configured to preprocess missing data and abnormal data within the acquired operation data;
missing data exists in real data acquired in the experimental process, and the missing data needs to be processed in order to improve the accuracy of subsequent data processing.
In some preferred embodiments, the method for preprocessing the missing data comprises the following steps:
calculating the weight occupied by each operation data;
the specific method for calculating the weight occupied by each operation data comprises the following steps:
The operation data is first assigned weights subjectively, and the weights are then updated by self-learning through a Bayesian network; see the journal article "Research on weight self-learning method based on Bayesian network" by Huwenbin et al. The weight of the operation data is calculated and used to decide how the missing data is to be handled, which improves the accuracy of operation data processing.
The proportion $\delta$ of missing data in each record is counted, $\delta = Z_1 / Z$, where $Z_1$ is the number of missing data items and $Z$ is the total number of data items in the record;
the proportion $\delta$ is compared with a proportion threshold $\delta_1$; when $\delta \geq \delta_1$, the record is deleted. Research determined that $\delta_1 = 0.35$: when the proportion of missing data exceeds 0.35, filling the missing data affects the accuracy of subsequent processing and reduces data processing efficiency by more than 9%; records in which missing items make up more than 0.35 (35%) of all items therefore need to be deleted;
when $\delta < \delta_1$, the missing data is filled using the operation data, corresponding to the missing data, from records whose experimenter index is the same as that of the record.
In some preferred embodiments, the padding data $x_{n+1}$ of the missing data provided by the present invention is calculated according to the following formula, which keeps the relative standard deviation (RSD) unchanged when the padding data is added:
$$\frac{1}{\bar{x}_{n}}\sqrt{\frac{\sum_{i=1}^{n}\left(x_i-\bar{x}_{n}\right)^2}{n-1}} = \frac{1}{\bar{x}_{n+1}}\sqrt{\frac{\sum_{i=1}^{n+1}\left(x_i-\bar{x}_{n+1}\right)^2}{n}}$$
wherein $n$ represents the number of records having the same experimenter index value as the record containing the missing data; $x_i$ represents the operation data corresponding to the missing data in the i-th record; $\bar{x}_{n}$ represents the average of the operation data corresponding to the missing data within the $n$ records; the 1 in $n+1$ represents the 1 record containing missing data; $\bar{x}_{n+1}$ represents the average of the operation data corresponding to the missing data in the $n$ records together with the padding data $x_{n+1}$;
the following example illustrates the padding process for missing data:
taking the cell collection operation duration as an example, the cell collection operation durations are obtained from the n = 5 records whose experimenter index is the same as that of the record with missing data; they are 9 min, 10 min, 16 min and 17 min, respectively; from these 5 data, the padding data for the record lacking the cell collection operation duration is calculated as follows:
calculate the standard deviation SD = 3.56;
calculate $RSD_5 = 0.278$;
calculate $x_6 = 17.8$; the missing data in the 6th record are therefore filled with 17.8.
Filling missing data by this method ensures that the added data do not affect the precision of the existing data, which in turn significantly improves the accuracy of data processing.
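The worked example can be reproduced numerically under two assumptions that are not stated explicitly in the text: that the padding value is the one that leaves the RSD (sample standard deviation divided by the mean) unchanged, and that the fifth duration, which the text does not list, is 12 min (chosen here only because it reproduces the stated SD = 3.56 and RSD = 0.278). The function names are hypothetical. Setting the RSD of the n + 1 values equal to the RSD of the n values gives a quadratic in the padding value, so there are two candidates; the value reported in the text is close to the larger one.

```python
import math
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return stdev(values) / mean(values)


def rsd_preserving_fill(values):
    """Return both x such that rsd(values + [x]) equals rsd(values).

    With S = sum(x_i), Q = sum(x_i^2), m = n + 1 and r = rsd(values), the
    condition reduces to
    (m^2 - k) x^2 - 2 k S x + (m^2 Q - k S^2) = 0, where k = m + n r^2.
    """
    n = len(values)
    m = n + 1
    s = sum(values)
    q = sum(v * v for v in values)
    k = m + n * rsd(values) ** 2
    a = m * m - k
    b = -2.0 * k * s
    c = m * m * q - k * s * s
    root = math.sqrt(max(0.0, b * b - 4.0 * a * c))
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


# 12.0 is an assumed value; the text lists only four of the five durations.
durations = [9.0, 10.0, 12.0, 16.0, 17.0]
low, high = rsd_preserving_fill(durations)
print(round(stdev(durations), 2), round(rsd(durations), 3))
print(low, high)
```

The small gap between the larger candidate and the reported 17.8 comes from rounding the RSD to three decimals before solving.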
Abnormal data may also exist in the real data acquired during the experiment and need to be processed to improve the accuracy of subsequent data processing.
In some preferred embodiments, the present application further provides a method for preprocessing abnormal data, the method comprising the following steps:
clustering all the records by using a K-Means clustering algorithm according to the indexes of the experimenters to obtain L clustering clusters;
the K-Means clustering algorithm, also called k-means clustering, comprises the following steps:
(1) First, a number of classes is selected and their respective center points are randomly initialized. Each center point is a vector of the same length as each data point vector.
(2) The distance from each data point to each center point is calculated, and each data point is assigned to the class whose center point is closest to it.
(3) The mean of the data points in each class is calculated as the new center point of that class.
(4) Steps (2) and (3) are repeated until the center of each class changes little between iterations.
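Steps (1) to (4) can be sketched for one-dimensional data such as experimenter index values. This is a minimal illustration rather than the patent's implementation; the function name `kmeans_1d`, the tolerance, the iteration cap and the fixed random seed are all assumptions.

```python
import random


def kmeans_1d(points, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster scalar values into k classes; returns (centers, clusters)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: random initial center points
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # Step 2: assign each point to the class with the closest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Step 3: the mean of each class becomes its new center
        # (an empty class keeps its previous center).
        new_centers = [
            sum(members) / len(members) if members else centers[j]
            for j, members in enumerate(clusters)
        ]
        # Step 4: stop once the centers barely move between iterations.
        shift = max(abs(old - new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, clusters


index_values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, clusters = kmeans_1d(index_values, k=2)
print(sorted(centers))
```

In practice a library implementation with multiple restarts would usually be preferred, since the result of a single run depends on the random initialization.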
calculating the RSD of all operation data of the corresponding item in each cluster and comparing it with a threshold $RSD_1$; when $RSD \geq RSD_1$, abnormal data are judged to exist;
when abnormal data exist, calculating the average value of all operation data of the corresponding item in each cluster, and then calculating $RSD_t$, the RSD of all operation data within a distance $t$ of the average value;
when $RSD_t < RSD_1$, calculating $RSD_{t+a}$, the RSD of all operation data within a distance $t + a$ of the average value, where $a > 0$, and stopping the calculation when $RSD_{t+a} = RSD_1$; when $RSD_t > RSD_1$, calculating $RSD_{t-a}$, the RSD of all operation data within a distance $t - a$ of the average value, and stopping the calculation when $RSD_{t-a} = RSD_1$;
judging whether the weight of the abnormal data that are not within the distance $t$ or $t \pm a$ is large; if so, deleting the corresponding item; if not, correcting the abnormal data of the corresponding record using $RSD_1$.
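The window search described above can be sketched as follows. Several points are assumptions made for illustration: RSD is taken as the sample standard deviation divided by the mean, the search stops at the last distance whose window still has RSD below $RSD_1$ (exact equality rarely occurs with discrete data), and the weight-based choice between deleting and correcting is left out. The function name `flag_abnormal` is hypothetical.

```python
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return stdev(values) / mean(values)


def flag_abnormal(values, rsd_1, t, a):
    """Return the values lying outside the accepted distance from the mean."""
    if a <= 0:
        raise ValueError("step a must be positive")
    if rsd(values) < rsd_1:
        return []  # RSD below the threshold: no abnormal data
    center = mean(values)

    def window_rsd(radius):
        inside = [v for v in values if abs(v - center) <= radius]
        return rsd(inside) if len(inside) >= 2 else 0.0

    radius = t
    if window_rsd(radius) < rsd_1:
        # Widen by a while the wider window still stays below RSD_1.
        while window_rsd(radius + a) < rsd_1:
            radius += a
    else:
        # Narrow by a until the window drops below RSD_1.
        while radius > 0 and window_rsd(radius) >= rsd_1:
            radius -= a
    return [v for v in values if abs(v - center) > radius]


durations = [10.0, 11.0, 9.0, 10.0, 12.0, 30.0]
print(flag_abnormal(durations, rsd_1=0.2, t=2.0, a=1.0))
```

The widening loop always terminates because the full data set has RSD at or above the threshold whenever the search is entered.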
A data clustering sub-module 30 configured to cluster the preprocessed operation data to form L clusters;
because the experimenter index value varies widely, and the comprehensive cell index generally improves as the experimenter index value increases, all records are clustered according to the experimenter index using the K-Means clustering algorithm to obtain L clusters.
Since even a slight change in operation duration has an obvious influence on the cell indexes, each item of operation data within each cluster is first clustered using the K-Means clustering algorithm to form $g_{eb}$ clustering sub-clusters, where $g_{eb}$ represents the number of clustering sub-clusters corresponding to the b-th item of operation data of the e-th cluster, $e = 1, \dots, L$; the duration range of each item of operation data in each clustering sub-cluster is then determined using an equal-width discretization method;
the equal-width discretization of the b-th item of operation data of the e-th cluster is performed as follows:
the dividing width is determined from $C_{eb\max}$, $C_{eb\min}$ and $\bar{C}_{eb}$, wherein $C_{eb\max}$ represents the maximum value of the b-th item of operation data of the e-th cluster; $C_{eb\min}$ represents the minimum value of the b-th item of operation data of the e-th cluster; and $\bar{C}_{eb}$ represents the average value of the b-th item of operation data of the e-th cluster.
By using the method, the indexes of the experimenters and the operation data are clustered respectively, so that errors caused by subjective classification are reduced, and the accuracy of subsequent data processing is improved.
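For orientation, generic equal-width discretization splits the span between the minimum and maximum into intervals of identical width. The sketch below uses the textbook width, (max - min) / number of intervals, which is an assumption: the patent's own dividing-width formula also refers to the average value and is not reproduced here. The function name `equal_width_bins` is hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Split [min(values), max(values)] into n_bins intervals of equal width."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # textbook width, assumed for illustration
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]


durations = [9.0, 10.0, 12.0, 16.0, 17.0]  # illustrative operation durations
for lower, upper in equal_width_bins(durations, n_bins=4):
    print(lower, upper)
```

Each resulting interval is a candidate duration range for one item of operation data.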
An association rule mining sub-module 40 configured to perform association rule mining on the operation data in each cluster formed through clustering processing to form a frequent item set;
association rule: an implication expression of the form $X \rightarrow Y$, where $X$ and $Y$ are disjoint item sets, i.e. $X \cap Y = \varnothing$. The strength of an association rule can be measured by its support and its confidence.
Support: for an item set $X$, let $count(X)$ be the number of item sets in the set $D$ that contain $X$, and let $|D|$ be the total number of item sets in $D$; the support of item set $X$ is then:
$$support(X) = \frac{count(X)}{|D|}$$
the support of an association rule $R: X \rightarrow Y$ is the proportion of item sets in $D$ that contain both $X$ and $Y$, where $count(X \cup Y)$ is the number of such item sets; namely:
$$support(X \rightarrow Y) = \frac{count(X \cup Y)}{|D|}$$
the confidence represents the probability that one item of data appears given that another has appeared, i.e. a conditional probability. The confidence of the association rule $R$ is the ratio of the number of item sets containing both $X$ and $Y$ to the number containing $X$, i.e.:
$$confidence(X \rightarrow Y) = \frac{support(X \rightarrow Y)}{support(X)} = \frac{count(X \cup Y)}{count(X)}$$
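Support and confidence can be computed directly from their definitions. The sketch below is illustrative only; the transactions and the labels such as `"A"` are made-up placeholders, and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X -> Y: support(X union Y) / support(X)."""
    support_x = support(x, transactions)
    if support_x == 0:
        raise ValueError("X never occurs, so the confidence is undefined")
    return support(frozenset(x) | frozenset(y), transactions) / support_x


transactions = [
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
    frozenset({"A", "B", "C"}),
    frozenset({"B"}),
]
print(support({"A"}, transactions))            # A occurs in 3 of 4 transactions
print(confidence({"A"}, {"B"}, transactions))  # 2 of those 3 also contain B
```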
the Apriori algorithm is a representative algorithm for association rule mining.
The Apriori algorithm comprises the following specific operation steps:
input: a data set D and a support threshold $\alpha$;
output: the largest frequent k-item sets;
1) Scan the whole data set to obtain all data that appear, as the candidate frequent 1-item sets; set k = 1, with the frequent 0-item set being the empty set.
2) Mine the frequent k-item sets.
a) Scan the data to calculate the support of each candidate frequent k-item set;
b) Remove the item sets whose support is lower than the threshold from the candidate frequent k-item sets to obtain the frequent k-item sets. If the resulting set of frequent k-item sets is empty, return the set of frequent (k-1)-item sets as the result and end the algorithm. If the resulting set contains only one item set, return the set of frequent k-item sets as the result and end the algorithm;
c) Based on the frequent k-item sets, generate the candidate frequent (k+1)-item sets by joining;
3) Let k = k + 1 and return to step 2).
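The steps above can be sketched compactly. This is a simplified illustration, assuming transactions are sets of hashable labels and that an item set is frequent when its support is at least the threshold; it omits the subset-pruning optimization that full Apriori implementations usually apply, and the function name `apriori` is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return the largest frequent item sets, following steps 1) to 3)."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / total

    # Step 1: every item that appears is a candidate frequent 1-item set.
    candidates = {frozenset([item]) for t in transactions for item in t}
    previous = set()  # the frequent 0-item set is empty
    while True:
        # Steps 2a and 2b: keep candidates that reach the support threshold.
        frequent = {c for c in candidates if support(c) >= min_support}
        if not frequent:
            return previous  # fall back to the frequent (k-1)-item sets
        if len(frequent) == 1:
            return frequent
        # Step 2c: join frequent k-item sets into candidate (k+1)-item sets.
        k = len(next(iter(frequent)))
        candidates = {a | b for a, b in combinations(frequent, 2)
                      if len(a | b) == k + 1}
        previous = frequent  # step 3: continue with k + 1


data = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
print(apriori(data, min_support=0.5))
```

In this example every pair reaches the threshold but the triple does not, so the pairs are returned.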
The invention uses the Apriori algorithm to mine association rules; the specific method is as follows:
the operation data in each cluster are taken as a candidate set, so that L candidate sets in total undergo association rule mining; the support of each item of operation data in the candidate set is calculated, and items whose support is smaller than the support threshold are removed to obtain the frequent 1-item sets;
the frequent 1-item sets are joined to obtain the candidate 2-item sets, and the 2-item sets whose support is larger than the support threshold form the frequent 2-item sets; these steps are repeated until the set of frequent k-item sets is empty, at which point the set of frequent (k-1)-item sets is returned as the frequent item set;
a strong association rule forming sub-module 50 configured to calculate a confidence level for each subset in the frequent item set, the frequent item set having a confidence level greater than a threshold forming a strong association rule;
an empirical operation data range acquisition submodule 60 configured to determine, according to the strong association rules, the empirical operation data range $[c_1, c_2]$ corresponding to the cell culture step performed by the person to be tested.
The system provided by the invention determines the empirical operation data range of experimenters from historical operation data and then compares the actual operation data with that range, thereby controlling the operation time of the experimenters and improving the comprehensive index of the cultured cells.
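The comparison rule recited in claim 1 (continue while c < 1.3c_2, highlight when c_2 < c < 1.3c_2, terminate when c ≥ 1.3c_2) can be sketched as a small decision function. The function name `check_step` and the returned labels are hypothetical and serve only as an illustration.

```python
def check_step(c, c1, c2):
    """Classify an actual operation value c against the range [c1, c2]."""
    if c2 <= c1:
        raise ValueError("c2 must be greater than c1")
    if c >= 1.3 * c2:
        return "terminate"            # prompt to terminate the experiment
    if c > c2:
        return "continue_highlight"   # continue, but highlight this value
    return "continue"                 # inside the range, or below c1


# Illustrative values in minutes with an empirical range of [8, 12].
for actual in (5.0, 10.0, 14.0, 16.0):
    print(actual, check_step(actual, 8.0, 12.0))
```

Checking the termination condition first keeps the three outcomes mutually exclusive.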
The above embodiments express only several embodiments of the present invention; their description is specific and detailed but is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of protection of the present invention. Therefore, the scope of protection of this patent shall be subject to the appended claims.
Claims (8)
1. A time control method based on cell quality assurance is characterized by comprising the following steps:
monitoring actual operation data c of an operation step in the process of cell culture by an experimenter;
comparing the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2$ is greater than $c_1$;
when $c$ is within $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, outputting a prompt to continue with the next operation step, and outputting a prompt to continue monitoring the actual operation data of the other operation steps and comparing them with the corresponding empirical operation data range $[c_1, c_2]$; and highlighting the actual operation data for which $c_2 < c < 1.3c_2$; when $c \geq 1.3c_2$, outputting a prompt to terminate the experiment;
the method for acquiring the corresponding empirical operation data range $[c_1, c_2]$ comprises the following steps:
collecting operation data in the cell culture process, wherein the operation data comprises an index value of an experimenter and operation time of each step;
preprocessing missing data and abnormal data in the acquired operation data;
clustering the preprocessed operation data to form L clustering clusters;
performing association rule mining on the operation data in each cluster to form a frequent item set;
calculating the confidence coefficient of each subset in the frequent item set, wherein the frequent item set with the confidence coefficient larger than a threshold value forms a strong association rule;
determining the empirical operation data range $[c_1, c_2]$ corresponding to the cell culture step of the person to be tested according to the strong association rule.
2. The cell quality assurance-based time control method of claim 1, wherein the method of preprocessing the missing data comprises the steps of:
calculating the weight occupied by each operation data;
computing the proportion $\delta$ of missing data in each record as $\delta = Z_1 / Z$, wherein $Z_1$ is the number of missing data items in the record and $Z$ is the total number of data items in the record;
comparing the proportion $\delta$ with a proportion threshold $\delta_1$, and deleting the record when $\delta \geq \delta_1$;
when $\delta < \delta_1$, filling the missing data using the operation data that correspond to the missing data in the records having the same experimenter index as the record.
3. The cell quality assurance-based time control method of claim 2, wherein the padding data $x_{n+1}$ of the missing data is calculated according to the following formula:
$$\frac{1}{\bar{x}_{n}}\sqrt{\frac{\sum_{i=1}^{n}\left(x_i-\bar{x}_{n}\right)^2}{n-1}} = \frac{1}{\bar{x}_{n+1}}\sqrt{\frac{\sum_{i=1}^{n+1}\left(x_i-\bar{x}_{n+1}\right)^2}{n}}$$
wherein $n$ represents the number of records having the same experimenter index value as the record containing the missing data; $x_i$ represents the operation data corresponding to the missing data in the i-th record; $\bar{x}_{n}$ represents the average of the operation data corresponding to the missing data within the $n$ records; the 1 in $n+1$ represents the 1 record containing missing data; $\bar{x}_{n+1}$ represents the average of the operation data corresponding to the missing data in the $n$ records together with the padding data $x_{n+1}$.
4. The cell quality assurance-based time control method according to claim 1, wherein the method of preprocessing the abnormal data comprises the steps of:
clustering all the records by using a K-Means clustering algorithm according to the indexes of the experimenters to obtain L clustering clusters;
calculating the RSD of all operation data of the corresponding item in each cluster and comparing it with a threshold $RSD_1$; when $RSD \geq RSD_1$, judging that abnormal data exist;
when abnormal data exist, calculating the average value of all operation data of the corresponding item in each cluster, and then calculating $RSD_t$, the RSD of all operation data within a distance $t$ of the average value;
when $RSD_t < RSD_1$, calculating $RSD_{t+a}$, the RSD of all operation data within a distance $t + a$ of the average value, where $a > 0$, and stopping the calculation when $RSD_{t+a} = RSD_1$; when $RSD_t > RSD_1$, calculating $RSD_{t-a}$, the RSD of all operation data within a distance $t - a$ of the average value, and stopping the calculation when $RSD_{t-a} = RSD_1$;
judging whether the weight of the abnormal data that are not within the distance $t$ or $t \pm a$ is large; if so, deleting the corresponding item; if not, correcting the abnormal data of the corresponding record using $RSD_1$.
5. The cell quality assurance-based time control method of claim 1, wherein clustering the preprocessed operational data includes clustering the experimenter indicators within the operational data using a K-Means clustering algorithm.
6. The cell quality assurance-based time control method of claim 1, wherein the clustering the preprocessed operation data comprises clustering the operation durations of the respective steps as follows:
clustering each item of operation data in each cluster using a K-Means clustering algorithm to form $g_{eb}$ clustering sub-clusters, where $g_{eb}$ represents the number of clustering sub-clusters corresponding to the b-th item of operation data of the e-th cluster, $e = 1, \dots, L$; determining the duration range of each item of operation data in each clustering sub-cluster using an equal-width discretization method; the equal-width discretization of the b-th item of operation data of the e-th cluster is performed as follows:
7. The cell quality assurance-based time control method according to claim 1, wherein the association rule mining of the operation data within each cluster is association rule mining using Apriori algorithm.
8. A time control system based on cell quality assurance, the time control system comprising:
a timing monitoring module configured to monitor actual operation data $c$ of an operation step in the process of cell culture by an experimenter;
a comparison module configured to compare the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2$ is greater than $c_1$;
an output module configured to: when $c$ is within $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, output a prompt to continue with the next operation step, and output a prompt to continue monitoring the actual operation data of the other operation steps and comparing them with the corresponding empirical operation data range $[c_1, c_2]$; highlight the actual operation data for which $c_2 < c < 1.3c_2$; and, when $c \geq 1.3c_2$, output a prompt to terminate the experiment;
the time control system further comprises an empirical operation data range acquisition module, the empirical operation data range acquisition module comprising:
a data acquisition submodule configured to acquire operation data in a cell culture process, the operation data including an experimenter index value and operation durations of each step;
a data preprocessing sub-module configured to preprocess missing data and abnormal data within the obtained operation data;
the data clustering submodule is configured to perform clustering processing on the preprocessed operation data to form L clustering clusters;
the association rule mining submodule is configured to mine association rules for the operation data in each cluster to form a frequent item set;
a strong association rule forming sub-module configured to calculate a confidence of each subset in the frequent item set, the frequent item set having a confidence greater than a threshold forming a strong association rule;
an empirical operation data range acquisition submodule configured to determine, according to the strong association rule, the empirical operation data range $[c_1, c_2]$ corresponding to the cell culture step performed by the person to be tested.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010016064.6A CN111243677B (en) | 2020-01-07 | 2020-01-07 | Time control method and system based on cell quality assurance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111243677A CN111243677A (en) | 2020-06-05 |
CN111243677B true CN111243677B (en) | 2023-04-14 |
Family
ID=70877603
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010016064.6A Active CN111243677B (en) | 2020-01-07 | 2020-01-07 | Time control method and system based on cell quality assurance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111243677B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006230333A (en) * | 2005-02-28 | 2006-09-07 | Hitachi Medical Corp | Flow site meter, method for analyzing cell, cell-analyzing method, method for setting sensitivity of fluorescent light detector and method for setting standard gate in positive rate-judging method |
CN101675339A (en) * | 2007-04-16 | 2010-03-17 | 动量制药公司 | The method that relates to cell surface glycosylation |
EP3404090A1 (en) * | 2017-05-15 | 2018-11-21 | Eppendorf AG | Incubator, system and method for monitored cell growth |
WO2019018684A1 (en) * | 2017-07-21 | 2019-01-24 | The Board Of Trustees Of The Leland Stanford Junior University | Systems and methods for analyzing mixed cell populations |
WO2019051130A1 (en) * | 2017-09-06 | 2019-03-14 | uBiome, Inc. | Nasal-related characterization associated with the nose microbiome |
WO2019099716A1 (en) * | 2017-11-16 | 2019-05-23 | The United States Of America, As Represented By The Secretary, Department Of Health And Human Services | Clustering methods using a grand canonical ensemble |
CN110023759A (en) * | 2016-09-19 | 2019-07-16 | 血液学有限公司 | For using system, method and the product of multidimensional analysis detection abnormal cell |
JPWO2018092321A1 (en) * | 2016-11-21 | 2019-10-10 | オリンパス株式会社 | Method for analyzing reprogramming of somatic cells and method for creating quality evaluation criteria for iPS cells using the same |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100153082A1 (en) * | 2008-09-05 | 2010-06-17 | Newman Richard D | Systems and methods for cell-centric simulation of biological events and cell based-models produced therefrom |
WO2017075636A2 (en) * | 2015-10-28 | 2017-05-04 | Chiscan Holdings, Llc | Methods of cross correlation of biofield scans to enome database, genome database, blood test, and phenotype data |
Non-Patent Citations (1)
Title |
---|
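Preliminary construction of a quality identification system for human mesenchymal stem cells; Peng Dongxiu (彭冬秀); China Master's Theses Full-text Database; 2018-03-12 (No. 12); full text * |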
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |