CN111243677B - Time control method and system based on cell quality assurance - Google Patents

Info

Publication number
CN111243677B
CN111243677B CN202010016064.6A CN202010016064A CN111243677B CN 111243677 B CN111243677 B CN 111243677B CN 202010016064 A CN202010016064 A CN 202010016064A CN 111243677 B CN111243677 B CN 111243677B
Authority
CN
China
Prior art keywords
data
operation data
rsd
clustering
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010016064.6A
Other languages
Chinese (zh)
Other versions
CN111243677A (en)
Inventor
曹毓琳
杨光
滕睿頔
杨蕊
刘鹏宇
白志惠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tangyihuikang Biomedical Technology Co ltd
Original Assignee
Beijing Tangyihuikang Biomedical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tangyihuikang Biomedical Technology Co ltd
Priority to CN202010016064.6A
Publication of CN111243677A
Application granted
Publication of CN111243677B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30 Unsupervised data analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention provides a time control method and system based on cell quality assurance. The method comprises: monitoring actual operation data c of an operation step while an experimenter performs cell culture; comparing the actual operation data c with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2 > c_1$; when c lies within $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, outputting a prompt to continue to the next operation step, together with a prompt that actual operation data will continue to be monitored in the other operation steps and compared with the corresponding empirical operation data range $[c_1, c_2]$, and highlighting the actual operation data when $c_2 < c < 1.3c_2$; and when $c \ge 1.3c_2$, outputting a prompt to terminate the experiment. The method and system control the operation time of experimenters according to the acquired empirical operation data range, thereby improving the comprehensive index of the cultured cells and ensuring cell quality.

Description

Time control method and system based on cell quality assurance
Technical Field
The invention belongs to the technical field of arrangement or management of medical care resources or facilities, and particularly relates to a time control method and a time control system based on cell quality assurance.
Background
In recent years, cell therapy has become a focus of research. Cell preparations, including immune cell preparations and stem cell preparations, are steadily entering the clinical research stage as novel medicines. Unlike conventional biological products, however, cell products are unique in their production process and quality control. The production process of a cell product comprises cell culture and cell preparation configuration, and the cell culture process has an important influence on the quality of the cell preparation. Cell culture production includes sample storage, recovery, cell collection, separation, purification, passage, cryopreservation and similar steps. The stability of the preparation is affected by the various production process steps, their parameters, and the actions of operators during preparation. Provided that laboratory conditions meet GMP (good manufacturing practice) requirements, and in order to ensure that experimenters work consistently and stably, training of experimenters is reinforced at regular intervals according to GMP requirements, and the time each experimenter spends on each operation step must be recorded during the experiment. Cells cultured by the same experimenter in the same environment but with different operation durations differ in their comprehensive indexes, and also in their consistency and stability. In the prior art, a specific value is prescribed for the operation duration in order to standardize the operation of experimenters; however, operation duration and proficiency differ between experimenters, so a fixed operation duration does not help to ensure cell quality.
Therefore, how to obtain an empirical operation data range by collecting historical operation data from the cell culture process, and then compare actual operation data with that range so as to realize a time control method that ensures cell quality, is a problem that urgently needs to be solved.
Disclosure of Invention
In order to solve the technical problems, the invention provides a time control method and system based on cell quality assurance.
One technical scheme of the invention provides a time control method based on cell quality assurance, which comprises the following steps:
monitoring actual operation data c of an operation step in the process of cell culture by an experimenter;
comparing the actual operation data c with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2$ is greater than $c_1$;
when c is within the range $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, outputting a prompt to continue to the next operation step, and outputting a prompt that actual operation data will continue to be monitored in the other operation steps and compared with the corresponding empirical operation data range $[c_1, c_2]$; highlighting the actual operation data when $c_2 < c < 1.3c_2$; and when $c \ge 1.3c_2$, outputting a prompt to terminate the experiment.
Another aspect of the present invention provides a time control system based on cell quality assurance, the time control system comprising:
a timing monitoring module: the timing monitoring module is configured to monitor actual operation data c of a certain link in the process of cell culture performed by an experimenter;
a comparison module configured to compare the actual operation data c with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2$ is greater than $c_1$;
an output module configured to: when c is within the range $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, output a prompt to continue to the next operation step, and output a prompt that actual operation data will continue to be monitored in the other operation steps and compared with the corresponding empirical operation data range $[c_1, c_2]$; highlight the actual operation data when $c_2 < c < 1.3c_2$; and when $c \ge 1.3c_2$, output a prompt to terminate the experiment.
According to the time control method and system based on cell quality assurance provided by the invention, historical cell culture data of experimenters are collected, processed, clustered and mined, and a corresponding empirical operation data range is then selected for each cell culture step of each experimenter according to that experimenter's index. The actual operation data collected for a given step are compared with the corresponding empirical operation range: when the data are within the range or slightly exceed it, the experiment continues and data that slightly exceed the range are highlighted; when the data seriously exceed the range, a prompt to stop the experiment is output. The operation time of experimenters is thereby controlled and the comprehensive index of the cultured cells is improved.
Drawings
FIG. 1 is a flow chart of a method of time control based on cell quality assurance;
FIG. 2 is a flow chart of an example of a method of time control based on cell quality assurance;
FIG. 3 is a flowchart of the method for acquiring the corresponding empirical operation data range $[c_1, c_2]$;
FIG. 4 is a flow chart of a method of pre-processing missing data;
FIG. 5 is a flow chart of a method of preprocessing exception data;
FIG. 6 is a flow chart of mining association rules using the Apriori algorithm;
FIG. 7 is a block diagram of a cell quality assurance-based time control system;
FIG. 8 is a block diagram of a structure of an empirical operation data range acquisition module.
The steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions. Although a logical order is shown in the flow diagrams, in some cases, the steps described may be performed in an order different than here.
Detailed Description
Since the method of the present invention is described as being implemented in a computer system, the computer system may be provided in a processor of a server or a client. For example, the methods described herein may be implemented as software executable with control logic that is executed by a CPU in a server. The functionality described herein may be implemented as a set of program instructions stored in a non-transitory tangible computer readable medium. When implemented in this manner, the computer program comprises a set of instructions which, when executed by a computer, cause the computer to perform a method capable of carrying out the functions described above. Programmable logic may be temporarily or permanently installed in a non-transitory tangible computer-readable medium, such as a read-only memory chip, computer memory, disk, or other storage medium. In addition to being implemented in software, the logic described herein may be embodied using discrete components, integrated circuits, programmable logic for use in conjunction with a programmable logic device such as a field programmable gate array, FPGA, or microprocessor, or any other device including any combination thereof. All such implementations are within the scope of the present invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, one embodiment of the present invention provides a time control method based on cell quality assurance, including:
(1) Monitoring actual operation data c of an operation step in the process of cell culture by an experimenter;
(2) Comparing the actual operation data c with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2$ is greater than $c_1$;
(3) When c is within the range $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, outputting a prompt to continue to the next operation step, and outputting a prompt that actual operation data will continue to be monitored in the other operation steps and compared with the corresponding empirical operation data range $[c_1, c_2]$; highlighting the actual operation data when $c_2 < c < 1.3c_2$; and when $c \ge 1.3c_2$, outputting a prompt to terminate the experiment.
As shown in fig. 2, the above comparison process is illustrated:
A timer is set according to the empirical operation data range $[c_1, c_2]$ of each operation step. Taking step one as an example, the empirical operation data range $[c_1, c_2]$ is set to [20 min, 25 min] and the timer is set to 25 min. When the experimenter begins step one, the timer starts counting down. If the experimenter has finished step one when the timer expires, a prompt to continue with step two is issued, a timer is set according to the operation data range $[c_1, c_2]$ corresponding to step two, and monitoring and comparison continue by repeating steps (1) and (2). If the experimenter has not finished when the timer expires, recording continues until the operation ends and the total time is counted. If the actual operation time is 28 min < 32.5 min (that is, 1.3 × 25 min), the time exceeds the range but not seriously: a prompt to continue with step two is issued, a timer is set according to the operation data range $[c_1, c_2]$ corresponding to step two, monitoring and comparison continue, and the value 28 min is highlighted. If the actual operation time is 35 min > 32.5 min, the range is seriously exceeded and a prompt to stop the experiment is issued. In this way the operation time of experimenters is reasonably controlled and the quality of the cultured cells is ensured.
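The comparison logic of steps (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the names `Action` and `check_duration` are hypothetical, and the factor 1.3 is exposed as a parameter only for readability.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "continue to the next operation step"
    CONTINUE_HIGHLIGHT = "continue to the next operation step (duration highlighted)"
    TERMINATE = "terminate the experiment"


def check_duration(c: float, c1: float, c2: float, factor: float = 1.3) -> Action:
    """Compare an actual duration c with the empirical range [c1, c2]."""
    if c2 <= c1:
        raise ValueError("c2 must be greater than c1")
    if c >= factor * c2:      # seriously beyond the range
        return Action.TERMINATE
    if c > c2:                # c2 < c < 1.3 * c2: slightly beyond the range
        return Action.CONTINUE_HIGHLIGHT
    return Action.CONTINUE    # c < c1, or c1 <= c <= c2


if __name__ == "__main__":
    # Durations in minutes, using the [20 min, 25 min] range of the example.
    for minutes in (22, 28, 35):
        print(minutes, check_duration(minutes, 20, 25).name)
```

With the [20 min, 25 min] range, 22 min falls inside the range, 28 min is continued but highlighted, and 35 min triggers termination, matching the worked example.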
As shown in FIG. 3, in some preferred embodiments, the invention further provides a method for acquiring the corresponding empirical operation data range $[c_1, c_2]$, comprising the following steps:
10) Collecting operation data during the cell culture process;
the collected operation data are historical operation data of experimenters during cell culture. Each record comprises the index value of an experimenter and the operation duration of each step of the cell culture process. The index value of an experimenter is the ratio of the experimenter's operating age to actual age. Because operating age and actual age influence the cell culture process to different degrees, and in order to ensure the quality of cultured cells, accurately control the time used by different experimenters during culture, and reasonably select operation durations for different experimenters, both the operating age and the actual age of each experimenter must be collected along with the operation data so that the index value can be obtained;
20) Preprocessing missing data and abnormal data in the collected operation data;
real data collected during experiments contain missing data, which must be processed to improve the accuracy of subsequent data processing.
As shown in fig. 4, in some preferred embodiments, the method for preprocessing the missing data comprises the following steps:
210) Calculating the weight of each item of operation data;
the specific method for calculating the weight of each item of operation data is as follows:
the operation data are first assigned weights subjectively, and the weights are then updated by self-learning through a Bayesian network; see the journal article "Research on weight self-learning method based on Bayesian network" by Huwenbin et al. Calculating the weights of the operation data and deciding how to handle missing data according to those weights improves the accuracy of operation data processing.
211) Counting the proportion δ of missing data in each record:
$\delta = Z_1 / Z$
where $Z_1$ is the number of missing data items and $Z$ is the total number of data items in the record;
212) Comparing the proportion δ with a proportion threshold $\delta_1$, and deleting the record when $\delta \ge \delta_1$. Research determined $\delta_1 = 0.35$: when the proportion of missing data exceeds 0.35, filling the missing data impairs the accuracy of subsequent processing and reduces data processing efficiency by more than 9 percent; records in which the proportion of missing data items exceeds 0.35 therefore need to be deleted;
213) When $\delta < \delta_1$, the record is retained and the missing data are filled using the operation data, corresponding to the missing data, from records whose experimenter index is the same as that of the record.
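Steps 211) to 213) amount to a simple screening rule, sketched below in Python. The sketch assumes that the proportion is the plain ratio of missing items to total items in a record and that a missing item is represented by `None`; the function names are hypothetical.

```python
from typing import Optional, Sequence

DELTA_1 = 0.35  # proportion threshold stated in the text


def missing_proportion(record: Sequence[Optional[float]]) -> float:
    """delta = Z1 / Z: missing items over total items in one record."""
    if not record:
        raise ValueError("record must contain at least one item")
    z1 = sum(1 for value in record if value is None)
    return z1 / len(record)


def keep_record(record: Sequence[Optional[float]], threshold: float = DELTA_1) -> bool:
    """A record is deleted when delta >= threshold, otherwise kept for filling."""
    return missing_proportion(record) < threshold


if __name__ == "__main__":
    # Hypothetical records: an index value followed by six step durations (min).
    sparse = [0.25, 9.0, None, 16.0, None, None, 12.0]   # 3 of 7 missing
    usable = [0.25, 9.0, None, 16.0, 17.0, None, 12.0]   # 2 of 7 missing
    print(keep_record(sparse), keep_record(usable))
```

A record with three of seven items missing (δ ≈ 0.43) is dropped, while one with two of seven missing (δ ≈ 0.29) is kept and passed on to the filling step.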
In some preferred embodiments, the fill value for the missing data is calculated according to the following formula:
[formula image not reproduced in the source text]
where n represents the number of records whose experimenter index value is the same as that of the record containing the missing data; $x_i$ represents the operation data, corresponding to the missing data, in the i-th record; $\bar{x}_n$ represents the average of the operation data corresponding to the missing data within the n records; the 1 in n+1 represents the one record containing the missing data; and $\bar{x}_{n+1}$ represents the average of the operation data corresponding to the missing data in the n records together with the fill value $x_{n+1}$;
The filling process for missing data is illustrated below:
taking the cell collection operation duration as an example, the cell collection operation durations in the n = 5 records whose experimenter index is the same as that of the record with missing data are obtained as 9 min, 10 min, 16 min and 17 min. From these 5 data, the fill value for the record lacking the cell collection operation duration is calculated as follows:
calculating the mean value $\bar{x}_5$ [formula image not reproduced in the source text];
calculating the standard deviation SD = 3.56;
calculating $RSD_5$ = 0.278;
calculating the average value $\bar{x}_6$ of the 6 records, including the missing data to be filled [formula images not reproduced in the source text];
calculating $x_6 \approx 17.8$, so the missing data in record 6 is filled with 17.8 min.
Filling missing data in this way ensures that the added fill value does not affect the precision of the existing data, which significantly improves the accuracy of data processing.
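Because the fill formula survives only as an image reference, the following Python sketch rests on two assumptions that should be read as such. First, it assumes the fill value is chosen so that the relative standard deviation (sample SD divided by mean) of the n + 1 values equals that of the original n values, which is one reading of the statement that the fill value does not affect the precision of the existing data. Second, the example lists only four of the five durations; a fifth value of 12 min is assumed here because it reproduces the stated SD = 3.56 and RSD = 0.278. The function names are hypothetical.

```python
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return stdev(values) / mean(values)


def fill_preserving_rsd(values, tol=1e-9):
    """Find x > mean(values) such that rsd(values + [x]) == rsd(values).

    ASSUMPTION: the RSD-preserving criterion is inferred, not taken from
    the (unreproduced) formula in the source.
    """
    values = list(values)
    target = rsd(values)
    lo = mean(values)                      # adding the mean itself lowers the RSD
    hi = 2 * lo
    while rsd(values + [hi]) < target:     # widen until the RSD overshoots
        hi *= 2
    while hi - lo > tol:                   # plain bisection on the fill value
        mid = (lo + hi) / 2
        if rsd(values + [mid]) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    durations = [9, 10, 12, 16, 17]        # 12 is an assumed fifth value
    print(round(stdev(durations), 2), round(rsd(durations), 3))
    print(round(fill_preserving_rsd(durations), 2))
```

Under these assumptions the fill value comes out at roughly 17.9 min, close to the 17.8 quoted in the example; the small difference may come from rounding in the original calculation.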
Abnormal data may exist in the real data acquired in the experimental process, and the abnormal data needs to be processed, so that the accuracy of subsequent data processing is improved.
As shown in fig. 5, in some preferred embodiments, the present application further provides a method for preprocessing exception data, the method comprising the steps of:
220) Clustering all the records according to the experimenter index using the K-Means clustering algorithm to obtain L clusters;
the K-Means clustering algorithm, also called k-means clustering, comprises the following steps:
(1) First, a number of classes is chosen and their center points are randomly initialized. Each center point is a vector of the same length as each data point vector.
(2) The distance from each data point to each center point is calculated, and the data point is assigned to the class whose center point is closest.
(3) The mean of the data points in each class is calculated and taken as the new center point.
(4) The above steps are repeated until the center of each class changes little between iterations.
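The four steps above can be sketched for scalar data such as experimenter index values. This is a minimal pure-Python illustration; the function name, the seed, the tolerance and the sample index values are all assumptions made for the example.

```python
import random


def k_means_1d(values, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster scalar values into k groups; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(list(values), k)           # (1) random initial centers
    labels = [0] * len(values)
    for _ in range(max_iter):
        # (2) assign each point to its nearest center
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # (3) recompute each center as the mean of its members
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        # (4) stop once the centers barely move
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, labels


if __name__ == "__main__":
    # Hypothetical experimenter index values (operating age / actual age).
    index_values = [0.10, 0.12, 0.11, 0.40, 0.42, 0.41]
    centers, labels = k_means_1d(index_values, k=2)
    print(sorted(round(c, 2) for c in centers), labels)
```

For real records a library implementation would normally be preferred; the sketch only makes the assign-and-update loop explicit.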
221) Calculating the RSD (relative standard deviation) of all operation data of the corresponding item in each cluster and comparing it with a threshold $RSD_1$; when $RSD \ge RSD_1$, determining that abnormal data exist;
222) When abnormal data exist, calculating the average value of all operation data of the corresponding item in each cluster, and then calculating $RSD_t$, the RSD of all operation data within a distance t of the average value;
223) When $RSD_t < RSD_1$, calculating $RSD_{t+a}$, the RSD of all operation data within a distance t + a of the average value, with a > 0, until the calculated $RSD_{t+a} = RSD_1$, then stopping the calculation; when $RSD_t > RSD_1$, calculating $RSD_{t-a}$, the RSD of all operation data within a distance t - a of the average value, until the calculated $RSD_{t-a} = RSD_1$, then stopping the calculation;
224) Judging whether the weight of the abnormal data lying outside the distance t or t ± a is large; if so, deleting the corresponding item; if not, using $RSD_1$ to correct the abnormal data of the corresponding record.
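One possible reading of steps 221) to 223) is sketched below. It is an approximation under stated assumptions: with a finite set of durations the RSD rarely equals the threshold exactly, so the window around the mean is widened one data point at a time and stops just before the RSD would reach the threshold. The threshold 0.15 and the sample durations are hypothetical, and the weight-based decision of step 224) is left out.

```python
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return stdev(values) / mean(values)


def flag_abnormal(values, rsd_1):
    """Return the values lying outside the widest window around the mean
    whose members keep the RSD below rsd_1 (empty list if nothing is flagged)."""
    values = list(values)
    if rsd(values) < rsd_1:
        return []                        # step 221: no abnormal data
    center = mean(values)                # step 222: cluster mean
    by_distance = sorted(values, key=lambda v: abs(v - center))
    kept = by_distance[:2]               # smallest window with a defined SD
    for v in by_distance[2:]:            # step 223: widen the window stepwise
        if rsd(kept + [v]) >= rsd_1:
            break
        kept.append(v)
    return by_distance[len(kept):]       # candidates for step 224


if __name__ == "__main__":
    # Hypothetical cell collection durations (min) within one cluster.
    print(flag_abnormal([10, 11, 10, 12, 11, 30], rsd_1=0.15))
```

The values returned are only candidates; whether each is deleted or corrected would still depend on its weight, as step 224) describes.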
30) Clustering the preprocessed operation data to form L clusters;
because the index value of experimenters varies widely, and the comprehensive cell index generally improves as the index value of the experimenter increases, all records are clustered according to the experimenter index using the K-Means clustering algorithm to obtain L clusters.
Since even a slight change in operation duration has an obvious influence on the cell indexes, each item of operation data within each cluster is first clustered using the K-Means clustering algorithm to form $g_{eb}$ clustering sub-clusters, where $g_{eb}$ represents the number of clustering sub-clusters corresponding to the b-th item of operation data of the e-th cluster; the duration range of each item of operation data of each clustering sub-cluster is then determined using an equal-width discretization method;
the equal-width discretization method for the b-th item of operation data of the e-th cluster is as follows:
dividing width: [formula image not reproduced in the source text]
where $C_{eb\max}$ represents the maximum value of the b-th item of operation data of the e-th cluster; $C_{eb\min}$ represents the minimum value of the b-th item of operation data of the e-th cluster; and $\bar{C}_{eb}$ represents the average value of the b-th item of operation data of the e-th cluster.
By using the method, the indexes of the experimenters and the operation data are clustered respectively, so that errors caused by subjective classification are reduced, and the accuracy of subsequent data processing is improved.
40) Performing association rule mining on the operation data in each cluster to form a frequent item set;
Association rule: an implication expression of the form $X \rightarrow Y$, where X and Y are disjoint item sets, namely:
$X \cap Y = \varnothing$
The strength of an association rule can be measured by its support and confidence.
Support: for an item set X, let $\mathrm{count}(X)$ be the number of item sets in the set D that contain X, and let $|D|$ be the total number of item sets in D; the support of item set X is:
$\mathrm{support}(X) = \mathrm{count}(X) / |D|$
For an association rule R: $X \rightarrow Y$, the support of R is determined by the number $\mathrm{count}(X \cup Y)$ of item sets in D that contain both X and Y, namely:
$\mathrm{support}(X \rightarrow Y) = \mathrm{count}(X \cup Y) / |D|$
Confidence represents the probability that one item of data appears given that another has appeared, that is, a conditional probability. The confidence of the association rule R is the ratio of the number of item sets containing both X and Y to the number containing X, namely:
$\mathrm{confidence}(X \rightarrow Y) = \mathrm{count}(X \cup Y) / \mathrm{count}(X)$
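The support and confidence definitions can be checked numerically with a short Python sketch. It uses the standard textbook definitions; the function names and the item labels in the example are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= frozenset(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """count(X u Y) / count(X) for the rule X -> Y (X and Y disjoint)."""
    x, y = frozenset(x), frozenset(y)
    if x & y:
        raise ValueError("X and Y must be disjoint")
    return support(x | y, transactions) / support(x, transactions)


if __name__ == "__main__":
    # Hypothetical discretized duration labels, one set per record.
    records = [
        {"collect:short", "passage:short"},
        {"collect:short", "passage:short", "freeze:long"},
        {"collect:short"},
        {"passage:short", "freeze:long"},
    ]
    print(support({"collect:short", "passage:short"}, records))
    print(confidence({"collect:short"}, {"passage:short"}, records))
```

In this toy data set the pair appears in two of four records, and two of the three records containing the first label also contain the second.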
The Apriori algorithm is a representative algorithm for association rule mining.
The specific steps of the Apriori algorithm are as follows:
Input: a data set D and a support threshold α;
Output: the largest frequent k-item sets;
1) Scan the whole data set to obtain all data that appear, as the candidate frequent 1-item sets; set k = 1; the frequent 0-item set is the empty set.
2) Mine the frequent k-item sets:
a) Scan the data to calculate the support of each candidate frequent k-item set;
b) Remove the item sets whose support is below the threshold from the candidate frequent k-item sets to obtain the frequent k-item sets. If the resulting frequent k-item sets are empty, return the frequent (k-1)-item sets as the result and end the algorithm. If there is only one frequent k-item set, return it as the result and end the algorithm;
c) Based on the frequent k-item sets, generate the candidate frequent (k+1)-item sets by joining;
3) Let k = k + 1 and go to step 2.
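The steps listed above can be sketched compactly in Python. This is a minimal, unoptimized illustration of the level-wise search (no subset-based candidate pruning); the function name and the sample transactions are assumptions, and an item set is treated as frequent when its support is at least the threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return the largest non-empty level of frequent item sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # 2a) + 2b): support scan, then drop candidates below the threshold
        return {c for c in candidates
                if sum(1 for t in transactions if c <= t) / n >= min_support}

    # 1) candidate 1-item sets are all items that appear in the data
    current = frequent({frozenset([item]) for t in transactions for item in t})
    previous = set()
    k = 1
    while current:
        if len(current) == 1:            # a single frequent k-item set: done
            return current
        previous = current
        # 2c) join frequent k-item sets into candidate (k+1)-item sets
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k + 1}
        k += 1                           # 3) move on to the next level
        current = frequent(candidates)
    return previous                      # level k is empty: return level k-1


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    print(apriori(data, min_support=0.6))
```

With a threshold of 0.6 the three pairs are frequent but the triple is not, so the pairs are returned; lowering the threshold to 0.4 makes the triple the single largest frequent item set.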
As shown in FIG. 6, the invention mines association rules using the Apriori algorithm, specifically as follows:
410) Taking the operation data in each cluster as a candidate set, so that association rule mining is performed on L candidate sets; calculating the support of each item of operation data in the candidate set, and removing items whose support is below the support threshold to obtain the frequent 1-item sets;
411) Joining the frequent 1-item sets to obtain the candidate 2-item sets, and finding the 2-item sets whose support is greater than the support threshold to form the frequent 2-item sets; and so on, until the frequent k-item sets are empty, whereupon the frequent (k-1)-item sets are returned directly as the frequent item set;
50) Calculating the confidence of each subset in the frequent item set, the frequent item sets whose confidence is greater than a threshold forming strong association rules;
60) According to the strong association rules, determining the empirical operation data range $[c_1, c_2]$ corresponding to the cell culture step performed by the experimenter.
Step 60) comprises the following steps:
determining the number of strong association rules and, when there is more than one, selecting the one with the highest comprehensive cell index as the strong association rule;
selecting the strong association rule corresponding to the experimenter index of the experimenter who is to perform the experiment as the empirical operation data range $[c_1, c_2]$ for that experimenter's cell culture step.
According to the method provided by the invention, through collecting, processing, clustering and mining the cell culture historical data of experimenters, corresponding experience operation data ranges are selected for all steps of cell culture according to the indexes of the experimenters aiming at different experimenters, then the collected actual operation data of a certain step of the experimenters is compared with the corresponding experience operation ranges, when the operation data is within the range or slightly exceeds the range, the result is highlighted, and when the operation data seriously exceeds the experience operation range, a prompt symbol for stopping the experiment is output, so that the operation time of the experimenters is controlled, and the comprehensive indexes of cultured cells are improved.
As shown in fig. 7, another embodiment of the present invention provides a cell quality assurance-based time control system, including:
a timing monitoring module 1 configured to monitor actual operation data c of a certain link while an experimenter performs cell culture;
a comparison module 2 configured to compare the actual operation data c with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2$ is greater than $c_1$;
an output module 3 configured to: when c is within the range $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, output a prompt to continue to the next operation step, and output a prompt that actual operation data will continue to be monitored in the other operation steps and compared with the corresponding empirical operation data range $[c_1, c_2]$; highlight the actual operation data when $c_2 < c < 1.3c_2$; and when $c \ge 1.3c_2$, output a prompt to terminate the experiment.
By controlling the operation time of experimenters, the system improves the comprehensive index and quality of the cultured cells.
As shown in fig. 8, in some preferred embodiments, the time control system further comprises an empirical operation data range acquisition module, the empirical operation data range acquisition module comprising:
a data acquisition sub-module 10 configured to acquire operational data during a cell culture process;
the collected operation data are historical operation data of experimenters during cell culture. Each record comprises the index value of an experimenter and the operation duration of each step of cell culture. The index value of an experimenter is the ratio of the experimenter's operating age to actual age. Because operating age and actual age influence the cell culture process to different degrees, and in order to ensure the quality of cultured cells, accurately control the time used by different experimenters during culture, and reasonably select operation durations for different experimenters, both the operating age and the actual age of each experimenter must be collected along with the operation data so that the index value can be obtained;
a data preprocessing sub-module 20 configured to preprocess missing data and abnormal data within the acquired operation data;
missing data exists in real data acquired in the experimental process, and the missing data needs to be processed in order to improve the accuracy of subsequent data processing.
In some preferred embodiments, the method for preprocessing the missing data comprises the following steps:
calculating the weight of each item of operation data;
the specific method for calculating the weight of each item of operation data is as follows:
first, weights are subjectively assigned to the operation data, and the weights are then updated by self-learning through a Bayesian network; see the journal article "Research on weight self-learning method based on Bayesian network" by Huwenbin et al. The weight of each item of operation data is calculated, and how to process the missing data is decided according to the weight, which improves the accuracy of the operation data processing.
The proportion $\delta$ of missing data in each record is calculated:
$$\delta = \frac{Z_1}{Z}$$
wherein $Z_1$ is the number of missing data items in the record and $Z$ is the total number of data items in the record;
the proportion $\delta$ is compared with a proportion threshold $\delta_1$; when $\delta \geq \delta_1$, the record is deleted; research determined $\delta_1 = 0.35$; when the proportion of missing data reaches 0.35, filling the missing data would impair the accuracy of subsequent processing and reduce data processing efficiency by more than 9 percent; therefore, records in which the proportion of missing data items is 0.35 or more need to be deleted;
when $\delta < \delta_1$, the missing data is filled using the corresponding operation data from records whose experimenter index is the same as that of the record containing the missing data.
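The delete-or-fill decision can be illustrated with a short sketch. It assumes that a record is a list of values and that a missing item is represented by `None`; both the representation and the function names are hypothetical. The threshold 0.35 is the value given in the description.

```python
MISSING = None  # hypothetical marker for a missing data item


def missing_proportion(record):
    """Proportion of missing items in one record: Z1 / Z."""
    z1 = sum(1 for value in record if value is MISSING)
    return z1 / len(record)


def should_delete(record, delta_1=0.35):
    """A record is deleted when its missing proportion reaches the threshold."""
    return missing_proportion(record) >= delta_1


if __name__ == "__main__":
    # Illustrative records: experimenter index followed by step durations (min).
    mostly_complete = [0.2, 9, MISSING, 12, 30]
    sparse = [0.2, MISSING, MISSING, 12]
    print(missing_proportion(mostly_complete), should_delete(mostly_complete))
    print(missing_proportion(sparse), should_delete(sparse))
```

Records that are not deleted go on to the filling step described next.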
In some preferred embodiments, the fill data for the missing data provided by the present invention is calculated according to the following formula:
Figure BDA0002358492160000132
wherein $n$ represents the number of records whose experimenter index value is the same as that of the record containing the missing data; $x_i$ represents the operation data corresponding to the missing data in the $i$-th record; $\bar{x}_n$ represents the average of the operation data corresponding to the missing data within the $n$ records; the 1 in $n+1$ represents the 1 record containing the missing data; $\bar{x}_{n+1}$ represents the average of the operation data corresponding to the missing data in the $n$ records together with the fill data $x_{n+1}$;
the following illustrates the padding process of missing data:
taking the cell collection operation duration as an example, the cell collection operation durations in the $n = 5$ records whose experimenter index is the same as that of the record with missing data are obtained, being 9 min, 10 min, 16 min and 17 min, respectively; based on these 5 data, the fill data for the record lacking the cell collection operation duration is calculated as follows:
calculating the mean value
Figure BDA0002358492160000141
Calculating the standard deviation: SD = 3.56
Calculating the relative standard deviation: $RSD_5 = 0.278$
Calculating the average value of the 6 records, including the missing data to be filled:
Figure BDA0002358492160000142
Figure BDA0002358492160000143
Solving gives $x_6 = 17.8$, so the missing data in the 6th record is filled with 17.8.
Filling missing data by this method ensures that the added fill data does not affect the precision of the existing data, which significantly improves the accuracy of data processing.
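The worked example can be reproduced numerically under two assumptions that are not stated in the text. First, the example lists only four of the five durations; the value 12 min is assumed here for the fifth because it reproduces the stated SD = 3.56 and $RSD_5 = 0.278$. Second, the fill value is assumed to be the $x_{n+1}$ for which the RSD of the $n+1$ values equals the RSD of the $n$ existing values, which is one reading of the statement that the fill data does not affect the precision of the existing data. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return stdev(values) / mean(values)


def rsd_preserving_fill(values):
    """Return both x such that rsd(values + [x]) == rsd(values).

    Assumed condition (not confirmed by the source):
        var_{n+1} = r^2 * mean_{n+1}^2,  with r = rsd(values).
    Expanding the sample variance gives a quadratic in x.
    """
    n = len(values)
    s = sum(values)
    q = sum(v * v for v in values)
    r2 = rsd(values) ** 2
    m = n * (1.0 / (n * (n + 1)) + r2 / (n + 1) ** 2)
    # (1 - m) x^2 - 2 m s x + (q - m s^2) = 0
    a, b, c = 1.0 - m, -2.0 * m * s, q - m * s * s
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real fill value preserves the RSD")
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))


if __name__ == "__main__":
    durations = [9, 10, 12, 16, 17]  # 12 is an assumed fifth value
    print(round(stdev(durations), 2), round(rsd(durations), 3))
    print(rsd_preserving_fill(durations))
```

Under these assumptions the larger root is close to the 17.8 reported in the example; the quadratic also has a smaller root, and the text does not say how the choice between them is made.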
Abnormal data may also exist in the real data acquired during experiments and needs to be processed in order to improve the accuracy of subsequent data processing.
In some preferred embodiments, the present application further provides a method for preprocessing abnormal data, the method comprising the following steps:
clustering all the records according to the experimenter indexes by using a K-Means clustering algorithm to obtain L clusters;
the K-Means clustering algorithm, also called the k-means method, comprises the following steps:
(1) First, a number of classes is selected and their respective center points are randomly initialized. Each center point is a vector of the same length as each data-point vector.
(2) The distance from each data point to each center point is calculated, and the data point is assigned to the class whose center point is nearest.
(3) The mean of the data points in each class is calculated as the new center point.
(4) The above steps are repeated until the center of each class changes little between iterations.
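Steps (1) to (4) can be sketched for one-dimensional data such as experimenter index values. This is an illustrative pure-Python version, not the implementation used in the patent; the function name, the seed, the tolerance and the sample index values are all assumptions made for the example.

```python
import random


def kmeans_1d(values, k, seed=0, tol=1e-6, max_iter=100):
    """Cluster scalar values into k classes with the basic k-means loop."""
    rng = random.Random(seed)
    # (1) Randomly initialize the center points from the data.
    centers = rng.sample(values, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        # (2) Assign each data point to the class with the nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        # (3) The mean of each class becomes its new center
        #     (an empty class keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        # (4) Stop once the centers barely move.
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, clusters


if __name__ == "__main__":
    index_values = [0.10, 0.12, 0.11, 0.40, 0.42, 0.41]  # illustrative
    centers, clusters = kmeans_1d(index_values, k=2)
    print(sorted(centers))
```

The number of classes is supplied by the caller here; the description does not state how L is chosen.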
calculating the RSD (relative standard deviation) of all operation data of the corresponding item in each cluster and comparing it with a threshold $RSD_1$; when $RSD \geq RSD_1$, abnormal data is judged to exist;
when abnormal data exists, calculating the average value of all operation data of the corresponding item in each cluster, and then calculating $RSD_t$, the RSD of all operation data within a distance $t$ of the average value;
when $RSD_t < RSD_1$, calculating $RSD_{t+a}$, the RSD of all operation data within a distance $t+a$ of the average value, with $a > 0$, and stopping the calculation when $RSD_{t+a} = RSD_1$; when $RSD_t > RSD_1$, calculating $RSD_{t-a}$, the RSD of all operation data within a distance $t-a$ of the average value, and stopping the calculation when $RSD_{t-a} = RSD_1$;
judging whether the weight of the abnormal data lying outside the distance $t$ or $t \pm a$ is large; if so, deleting the corresponding item; if not, correcting the abnormal data of the record using $RSD_1$.
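The RSD test and the search for the distance around the mean can be sketched as below. Several points are assumptions made for illustration: the threshold value 0.2, the starting distance, the step size, and the stopping rule. Because exact equality with $RSD_1$ is rarely reached on discrete data, the sketch stops at the first step where the windowed RSD crosses the threshold. The function names are hypothetical.

```python
from statistics import mean, stdev


def rsd(values):
    """Relative standard deviation: sample SD divided by the mean."""
    return stdev(values) / mean(values)


def has_abnormal_data(values, rsd_1):
    """Abnormal data is judged to exist when RSD >= RSD_1."""
    return rsd(values) >= rsd_1


def fit_distance(values, rsd_1, t, a):
    """Step the distance t by a until the windowed RSD crosses rsd_1."""
    center = mean(values)
    max_t = max(abs(v - center) for v in values)

    def window_rsd(d):
        inside = [v for v in values if abs(v - center) <= d]
        return rsd(inside) if len(inside) >= 2 else 0.0

    if window_rsd(t) < rsd_1:
        # Widen the window while the RSD stays within the threshold.
        while t + a <= max_t and window_rsd(t + a) <= rsd_1:
            t += a
    else:
        # Narrow the window until the RSD no longer exceeds the threshold.
        while t - a > 0 and window_rsd(t) > rsd_1:
            t -= a
    return t


if __name__ == "__main__":
    durations = [10, 11, 9, 10, 30]  # illustrative values; 30 is an outlier
    print(has_abnormal_data(durations, rsd_1=0.2))
    t = fit_distance(durations, rsd_1=0.2, t=20, a=1)
    center = mean(durations)
    print(t, [v for v in durations if abs(v - center) > t])
```

Values left outside the final distance are the candidates for deletion or correction; the weight-based choice between the two is not modelled here.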
a data clustering sub-module 30 configured to cluster the preprocessed operation data to form L clusters;
because experimenter index values vary widely, and the comprehensive cell index generally improves as the experimenter index value increases, all records are clustered according to the experimenter indexes by using a K-Means clustering algorithm to obtain L clusters.
Since even a slight change in operation duration has an obvious influence on the cell indexes, each item of operation data in each cluster is first clustered by using a K-Means clustering algorithm to form $g_{eb}$ clustering sub-clusters, where $g_{eb}$ represents the number of clustering sub-clusters corresponding to the $b$-th item of operation data of the $e$-th cluster, $e = 1, \ldots, L$; the duration range of each item of operation data of the clustering sub-clusters is then determined by using an equal-width discretization method;
the specific method of the equal-width discretization of the $b$-th item of operation data of the $e$-th cluster is as follows:
dividing width
Figure BDA0002358492160000151
wherein $C_{ebmax}$ represents the maximum value of the $b$-th item of operation data of the $e$-th cluster; $C_{ebmin}$ represents the minimum value of the $b$-th item of operation data of the $e$-th cluster; and
Figure BDA0002358492160000161
represents the average value of the $b$-th item of operation data of the $e$-th cluster.
By using the method, the indexes of the experimenters and the operation data are clustered respectively, so that errors caused by subjective classification are reduced, and the accuracy of subsequent data processing is improved.
An association rule mining sub-module 40 configured to perform association rule mining on the operation data in each cluster formed through clustering processing to form a frequent item set;
association rule: an implication expression of the form $X \rightarrow Y$, where $X$ and $Y$ are disjoint item sets, i.e.:
$$X \cap Y = \emptyset$$
the strength of an association rule may be measured by its support and its confidence.
Support: for an item set $X$, let $\mathrm{count}(X)$ be the number of item sets in the set $D$ that contain $X$, and let $|D|$ be the total number of item sets in $D$; then the support of item set $X$ is:
$$\mathrm{support}(X) = \frac{\mathrm{count}(X)}{|D|}$$
For an association rule $R$:
$$R: X \rightarrow Y$$
the support of the association rule $R$ is the ratio of the number $\mathrm{count}(X \cup Y)$ of item sets in $D$ that contain both $X$ and $Y$ to $|D|$, namely:
$$\mathrm{support}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{|D|}$$
The confidence represents the probability that one item appears given that another has appeared, i.e. a conditional probability. The confidence of the association rule $R$ is the ratio of the number of item sets containing both $X$ and $Y$ to the number containing $X$, namely:
$$\mathrm{confidence}(X \rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}$$
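The support and confidence measures can be illustrated on a small data set. The sketch below uses Python sets for item sets; the single-letter item labels are hypothetical stand-ins for discretized operation data such as a duration range of one step.

```python
def count(itemset, dataset):
    """Number of item sets in the data set that contain `itemset`."""
    return sum(1 for record in dataset if itemset <= record)


def support(itemset, dataset):
    """Support: share of item sets in the data set containing `itemset`."""
    return count(itemset, dataset) / len(dataset)


def confidence(x, y, dataset):
    """Confidence of the rule X -> Y: count(X and Y together) / count(X)."""
    return count(x | y, dataset) / count(x, dataset)


if __name__ == "__main__":
    # Hypothetical data set D with four records.
    d = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}]
    print(support({"A"}, d))            # share of records containing A
    print(support({"A", "B"}, d))       # share containing both A and B
    print(confidence({"A"}, {"B"}, d))  # of records with A, share also with B
```

A rule is later treated as strong when its confidence exceeds a threshold, which the description leaves unspecified.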
the Apriori algorithm is a representative algorithm for Association rule mining (Association rule mining).
The specific operation steps of the Apriori algorithm are as follows:
Input: a data set D and a support threshold α;
Output: the largest frequent k-item sets;
1) The whole data set is scanned, and every item that appears is taken as a candidate frequent 1-item set; k = 1, and the frequent 0-item set is the empty set.
2) The frequent k-item sets are mined.
a) The data is scanned to calculate the support of each candidate frequent k-item set;
b) The candidate frequent k-item sets whose support is lower than the threshold are removed to obtain the frequent k-item sets. If the resulting collection of frequent k-item sets is empty, the collection of frequent (k-1)-item sets is returned directly as the algorithm result and the algorithm ends. If the resulting collection contains only one frequent k-item set, it is returned directly as the algorithm result and the algorithm ends;
c) Based on the frequent k-item sets, the candidate frequent (k+1)-item sets are generated by joining;
3) Let k = k + 1 and return to step 2.
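The steps listed above can be sketched in a few lines of Python. This is a simplified illustration that follows the listed steps literally, without the subset-pruning optimization often added to Apriori; the function name, the item labels and the threshold value are assumptions made for the example.

```python
from itertools import combinations


def apriori(dataset, min_support):
    """Return the largest frequent item sets, following steps 1) to 3)."""
    n = len(dataset)

    def support(itemset):
        return sum(1 for record in dataset if itemset <= record) / n

    # 1) Every item that appears is a candidate frequent 1-item set.
    items = {item for record in dataset for item in record}
    candidates = [frozenset([item]) for item in items]
    previous = []  # frequent (k-1)-item sets; empty for k = 1
    while True:
        # 2a/2b) Keep the candidates whose support reaches the threshold.
        frequent = [c for c in candidates if support(c) >= min_support]
        if not frequent:
            return previous
        if len(frequent) == 1:
            return frequent
        # 2c) Join frequent k-item sets into candidate (k+1)-item sets.
        k = len(frequent[0])
        candidates = list(
            {a | b for a, b in combinations(frequent, 2) if len(a | b) == k + 1}
        )
        # 3) k = k + 1 and repeat.
        previous = frequent


if __name__ == "__main__":
    d = [frozenset("ABC"), frozenset("AB"), frozenset("AC"), frozenset("B")]
    for itemset in apriori(d, min_support=0.5):
        print(sorted(itemset))
```

In the patent's setting each cluster would supply its own data set, so the routine would be run once per cluster.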
The invention uses the Apriori algorithm to mine association rules, and the specific method is as follows:
the operation data in each cluster is taken as a candidate set, so that association rule mining is performed on L candidate sets; the support of each item of operation data in the candidate set is calculated, and the items whose support is smaller than the support threshold are removed to obtain the frequent 1-item sets;
the frequent 1-item sets are joined to obtain the candidate 2-item sets, and the 2-item sets whose support is greater than the support threshold form the frequent 2-item sets; these steps are repeated until the collection of frequent k-item sets is empty, whereupon the collection of frequent (k-1)-item sets is returned directly as the frequent item set;
a strong association rule forming sub-module 50 configured to calculate a confidence level for each subset in the frequent item set, the frequent item set having a confidence level greater than a threshold forming a strong association rule;
an empirical operation data range acquisition sub-module 60 configured to determine, according to the strong association rule, the empirical operation data range $[c_1, c_2]$ corresponding to the cell culture step performed by the person to be tested.
The system provided by the invention determines the empirical operation data range of experimenters from historical operation data and then compares the actual operation data with that range, thereby controlling the operation time of the experimenters and improving the comprehensive index of the cultured cells.
The above embodiments express only several embodiments of the present invention; their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. A time control method based on cell quality assurance is characterized by comprising the following steps:
monitoring actual operation data c of an operation step in the process of cell culture by an experimenter;
comparing the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2$ is greater than $c_1$;
when $c$ is within the range $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, outputting a prompt to continue with the next operation step, together with a prompt to continue monitoring the actual operation data of the other operation steps and comparing it with the corresponding empirical operation data range $[c_1, c_2]$; highlighting the actual operation data when $c_2 < c < 1.3c_2$; and, when $c \geq 1.3c_2$, outputting a prompt to terminate the experiment;
the method for acquiring the corresponding empirical operation data range $[c_1, c_2]$ comprises the following steps:
collecting operation data in the cell culture process, wherein the operation data comprises an index value of an experimenter and operation time of each step;
preprocessing missing data and abnormal data in the acquired operation data;
clustering the preprocessed operation data to form L clustering clusters;
performing association rule mining on the operation data in each cluster to form a frequent item set;
calculating the confidence coefficient of each subset in the frequent item set, wherein the frequent item set with the confidence coefficient larger than a threshold value forms a strong association rule;
determining, according to the strong association rule, the empirical operation data range $[c_1, c_2]$ corresponding to the cell culture step performed by the person to be tested.
2. The cell quality assurance-based time control method of claim 1, wherein the method of preprocessing the missing data comprises the steps of:
calculating the weight occupied by each operation data;
calculating the proportion $\delta$ of missing data in each record:
$$\delta = \frac{Z_1}{Z}$$
wherein $Z_1$ is the number of missing data items in the record and $Z$ is the total number of data items in the record;
comparing the proportion $\delta$ with a proportion threshold $\delta_1$, and deleting the record when $\delta \geq \delta_1$;
when $\delta < \delta_1$, filling the missing data using the corresponding operation data from records whose experimenter index is the same as that of the record containing the missing data.
3. The cell quality assurance-based time control method of claim 2, wherein the padding data of the missing data is calculated according to the following formula:
Figure FDA0004085315470000022
wherein $n$ represents the number of records whose experimenter index value is the same as that of the record containing the missing data; $x_i$ represents the operation data corresponding to the missing data in the $i$-th record; $\bar{x}_n$ represents the average of the operation data corresponding to the missing data within the $n$ records; the 1 in $n+1$ represents the 1 record containing the missing data; $\bar{x}_{n+1}$ represents the average of the operation data corresponding to the missing data in the $n$ records together with the fill data $x_{n+1}$.
4. The cell quality assurance-based time control method according to claim 1, wherein the method of preprocessing the abnormal data comprises the steps of:
clustering all the records by using a K-Means clustering algorithm according to the indexes of the experimenters to obtain L clustering clusters;
calculating the RSD of all operation data of the corresponding item in each cluster and comparing it with a threshold $RSD_1$; when $RSD \geq RSD_1$, judging that abnormal data exists;
when abnormal data exists, calculating the average value of all operation data of the corresponding item in each cluster, and then calculating $RSD_t$, the RSD of all operation data within a distance $t$ of the average value;
when $RSD_t < RSD_1$, calculating $RSD_{t+a}$, the RSD of all operation data within a distance $t+a$ of the average value, with $a > 0$, and stopping the calculation when $RSD_{t+a} = RSD_1$; when $RSD_t > RSD_1$, calculating $RSD_{t-a}$, the RSD of all operation data within a distance $t-a$ of the average value, and stopping the calculation when $RSD_{t-a} = RSD_1$;
judging whether the weight of the abnormal data lying outside the distance $t$ or $t \pm a$ is large; if so, deleting the corresponding item; if not, correcting the abnormal data of the record using $RSD_1$.
5. The cell quality assurance-based time control method of claim 1, wherein clustering the preprocessed operational data includes clustering the experimenter indicators within the operational data using a K-Means clustering algorithm.
6. The cell quality assurance-based time control method of claim 1, wherein the clustering the preprocessed operation data comprises clustering the operation durations of the respective steps as follows:
clustering each item of operation data in each cluster by using a K-Means clustering algorithm to form $g_{eb}$ clustering sub-clusters, where $g_{eb}$ represents the number of clustering sub-clusters corresponding to the $b$-th item of operation data of the $e$-th cluster, $e = 1, \ldots, L$; determining the duration range of each item of operation data of the clustering sub-clusters by using an equal-width discretization method; the specific method of the equal-width discretization of the $b$-th item of operation data of the $e$-th cluster is as follows:
dividing width
Figure FDA0004085315470000031
wherein $C_{ebmax}$ represents the maximum value of the $b$-th item of operation data of the $e$-th cluster; $C_{ebmin}$ represents the minimum value of the $b$-th item of operation data of the $e$-th cluster; and
Figure FDA0004085315470000032
represents the average value of the $b$-th item of operation data of the $e$-th cluster.
7. The cell quality assurance-based time control method according to claim 1, wherein the association rule mining of the operation data within each cluster is association rule mining using Apriori algorithm.
8. A time control system based on cell quality assurance, the time control system comprising:
timing monitoring module: the timing monitoring module is configured to monitor actual operation data c of a certain link in the process of cell culture of an experimenter;
a comparison module configured to compare the actual operation data $c$ with the corresponding empirical operation data range $[c_1, c_2]$, where $c_2$ is greater than $c_1$;
an output module configured to: when $c$ is within the range $[c_1, c_2]$, or $c < c_1$, or $c_2 < c < 1.3c_2$, output a prompt to continue with the next operation step, together with a prompt to continue monitoring the actual operation data of the other operation steps and comparing it with the corresponding empirical operation data range $[c_1, c_2]$; highlight the actual operation data when $c_2 < c < 1.3c_2$; and, when $c \geq 1.3c_2$, output a prompt to terminate the experiment;
the time control system further comprises an empirical operation data range acquisition module, the empirical operation data range acquisition module comprising:
a data acquisition submodule configured to acquire operation data in a cell culture process, the operation data including an experimenter index value and operation durations of each step;
a data preprocessing sub-module configured to preprocess missing data and abnormal data within the obtained operation data;
the data clustering submodule is configured to perform clustering processing on the preprocessed operation data to form L clustering clusters;
the association rule mining submodule is configured to mine association rules for the operation data in each cluster to form a frequent item set;
a strong association rule forming sub-module configured to calculate a confidence of each subset in the frequent item set, the frequent item set having a confidence greater than a threshold forming a strong association rule;
an empirical operation data range acquisition submodule configured to determine, according to the strong association rule, the empirical operation data range $[c_1, c_2]$ corresponding to the cell culture step performed by the person to be tested.
CN202010016064.6A 2020-01-07 2020-01-07 Time control method and system based on cell quality assurance Active CN111243677B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010016064.6A CN111243677B (en) 2020-01-07 2020-01-07 Time control method and system based on cell quality assurance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010016064.6A CN111243677B (en) 2020-01-07 2020-01-07 Time control method and system based on cell quality assurance

Publications (2)

Publication Number Publication Date
CN111243677A CN111243677A (en) 2020-06-05
CN111243677B true CN111243677B (en) 2023-04-14

Family

ID=70877603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010016064.6A Active CN111243677B (en) 2020-01-07 2020-01-07 Time control method and system based on cell quality assurance

Country Status (1)

Country Link
CN (1) CN111243677B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006230333A (en) * 2005-02-28 2006-09-07 Hitachi Medical Corp Flow site meter, method for analyzing cell, cell-analyzing method, method for setting sensitivity of fluorescent light detector and method for setting standard gate in positive rate-judging method
CN101675339A (en) * 2007-04-16 2010-03-17 动量制药公司 The method that relates to cell surface glycosylation
EP3404090A1 (en) * 2017-05-15 2018-11-21 Eppendorf AG Incubator, system and method for monitored cell growth
WO2019018684A1 (en) * 2017-07-21 2019-01-24 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for analyzing mixed cell populations
WO2019051130A1 (en) * 2017-09-06 2019-03-14 uBiome, Inc. Nasal-related characterization associated with the nose microbiome
WO2019099716A1 (en) * 2017-11-16 2019-05-23 The United States Of America, As Represented By The Secretary, Department Of Health And Human Services Clustering methods using a grand canonical ensemble
CN110023759A (en) * 2016-09-19 2019-07-16 血液学有限公司 For using system, method and the product of multidimensional analysis detection abnormal cell
JPWO2018092321A1 (en) * 2016-11-21 2019-10-10 オリンパス株式会社 Method for analyzing reprogramming of somatic cells and method for creating quality evaluation criteria for iPS cells using the same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100153082A1 (en) * 2008-09-05 2010-06-17 Newman Richard D Systems and methods for cell-centric simulation of biological events and cell based-models produced therefrom
WO2017075636A2 (en) * 2015-10-28 2017-05-04 Chiscan Holdings, Llc Methods of cross correlation of biofield scans to enome database, genome database, blood test, and phenotype data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
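Preliminary construction of a quality identification system for human-derived mesenchymal stem cells; Peng Dongxiu; China Master's Theses Full-text Database; 20180312 (No. 12); full text *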

Also Published As

Publication number Publication date
CN111243677A (en) 2020-06-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant