CN114112980A - Medicine component detection method and system based on data analysis - Google Patents
- Publication number
- CN114112980A, CN202210077693.9A
- Authority
- CN
- China
- Prior art keywords
- functional group
- data
- model
- training
- drug
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N21/00—Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
- G01N21/17—Systems in which incident light is modified in accordance with the properties of the material investigated
- G01N21/25—Colour; Spectral properties, i.e. comparison of effect of material on the light at two or more different wavelengths or wavelength bands
- G01N21/31—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry
- G01N21/35—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light
- G01N21/3581—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using far infrared light; using Terahertz radiation
- G01N21/3586—Investigating relative effect of material at wavelengths characteristic of specific elements or molecules, e.g. atomic absorption spectrometry using infrared light using far infrared light; using Terahertz radiation by Terahertz time domain spectroscopy [THz-TDS]
Abstract
The invention provides a method and a system for detecting drug components based on data analysis. The method comprises: obtaining a first target protein, a target drug to be detected, and a second target protein formed after the target drug acts on the first target protein; extracting functional group data of the target proteins and the target drug by terahertz time-domain spectroscopy; inputting the data into a generation model to generate a result; and correcting the result with a discrimination network. The beneficial effects of the invention are as follows: the target drug is not only subjected to chromatographic analysis, but is also analyzed comprehensively using data obtained after the target drug reacts with the target protein, which improves the accuracy of the analysis and makes the result more credible.
Description
Technical Field
The invention relates to the field of digital medical treatment, in particular to a method and a system for detecting medicine components based on data analysis.
Background
Pharmaceutical analysis is an important branch of analytical chemistry. With the development of pharmaceutical chemistry it has gradually become a relatively independent discipline, and it is widely applied in drug quality control, new drug research, drug metabolism, chiral drug analysis and the like.
At present, drug analysis relies only on chemical or physical analysis of the drug. Traditional chemical analysis methods are complex and costly, while physical analysis methods mainly rely on chromatography and nuclear magnetic resonance. These methods cannot analyze drug components well and have large errors.
Disclosure of Invention
The main object of the invention is to provide a drug component detection method, device, equipment and storage medium based on data analysis, so as to solve the problem that existing physical analysis methods have large errors.
The invention provides a medicine component detection method based on data analysis, which comprises the following steps:
obtaining a first target protein, a target drug to be detected and a second target protein after the target drug acts on the first target protein;
obtaining a first chromatogram of the first target protein, a second chromatogram of the target drug and a third chromatogram of the second target protein by a terahertz time-domain spectroscopy technology;
acquiring corresponding first functional group data from the first chromatogram, acquiring corresponding second functional group data from the second chromatogram, and acquiring corresponding third functional group data from the third chromatogram;
comparing the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;
inputting the second functional group data into a generation model, and inputting the fifth functional group data, the first functional group data, the third functional group data and the fourth functional group data into a discrimination model; wherein the generation model and the discrimination model are formed by synchronously training on different functional group data and the corresponding drug components;
and correcting the result output by the generation model according to the output result of the discrimination model to obtain the drug component output by the generation model.
Further, before the step of inputting the second functional group data into a generative model and inputting the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, the method further includes:
acquiring a functional group training sample set; wherein a set of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data, and fifth functional group training data;
inputting the first functional group training data into an initial generation model to obtain a best predicted value, inputting the drug component into the initial generation model, and performing initial training on the initial generation model according to a first formula to obtain a trained temporary predicted value and an intermediate generation model;
performing vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, inputting the comprehensive training data into an initial discrimination model, and performing initial training on the initial discrimination model according to a second formula to obtain an intermediate discrimination model; wherein the formulas, whose expressions are not reproduced here, involve a set of parameters representing the generation model and a set of parameters representing the discrimination model;
performing secondary training on the intermediate generation model and the intermediate discrimination model according to a third formula, and obtaining the generation model and the discrimination model after the training is finished; wherein the third formula expresses taking, on the premise that the formula is satisfied, the minimum value over the parameter set of the generation model and the maximum value over the parameter set of the discrimination model.
Further, after the step of obtaining the generated model and the discriminant model after the training is completed, the method further includes:
acquiring a functional group detection sample set and real medicine components; wherein a set of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data, and fifth functional group detection data;
inputting the first functional group detection data into the generation model, performing vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, inputting the comprehensive detection data into the discrimination model, and correcting the generation model to obtain a predicted drug component;
obtaining the comprehensive loss value of the generation model and the intermediate discrimination model according to the predicted medicine component and the real medicine component;
judging whether the comprehensive loss value is smaller than a preset loss value or not;
if yes, judging that the generation model and the discrimination model obtained after training meet the training requirements.
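The acceptance check in the steps above can be sketched as follows. This is only an illustration: the source does not define how the comprehensive loss value is computed, so the mean absolute error used here, and the names `comprehensive_loss` and `meets_training_requirement`, are assumptions.

```python
def comprehensive_loss(predicted, actual):
    """Mean absolute error between predicted and real drug component vectors.

    ASSUMPTION: the source does not give the loss formula; mean absolute
    error is used here purely as a stand-in.
    """
    if not predicted or len(predicted) != len(actual):
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def meets_training_requirement(predicted, actual, preset_loss):
    """Return True when the comprehensive loss is below the preset loss value."""
    return comprehensive_loss(predicted, actual) < preset_loss


# Hypothetical component fractions for one detection sample.
predicted_components = [0.5, 0.25, 0.25]
real_components = [0.5, 0.5, 0.0]
accepted = meets_training_requirement(predicted_components, real_components, 0.2)
```

With these hypothetical values the loss is about 0.167, so a preset loss value of 0.2 accepts the models while a stricter value of 0.1 would not.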
Further, before the step of inputting the second functional group data into a generative model and inputting the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, the method further includes:
inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;
inputting each target drug component into the generative model to be used as an output channel of the generative model respectively.
Further, after the step of comparing the third functional group data with the first functional group data to obtain the fourth functional group data with decreased third functional group data and the fifth functional group data with increased third functional group data, the method further comprises:
carrying out weighted average on the fourth functional group data and the fifth functional group data to obtain sixth functional group data;
obtaining a first functional group number from the sixth functional group data;
dividing the first functional group number by the corresponding data in the second functional group data to obtain a binding score of the target drug;
and judging the curative effect of the target drug according to the binding score.
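One possible reading of the binding score steps above is sketched below. The source does not specify the weights, how the first functional group number is derived from the sixth functional group data, or which second functional group data counts as "corresponding", so the equal weights, the summation, and the use of the drug's total count are all assumptions, and `binding_score` is a hypothetical name.

```python
def binding_score(decreased, increased, drug_groups,
                  w_decreased=0.5, w_increased=0.5):
    """Weighted-average the changed groups, then normalize by the drug's groups.

    ASSUMPTIONS: equal weights; the "first functional group number" is taken
    as the sum of the weighted averages; the divisor is the total relative
    count in the drug's (second) functional group data.
    """
    groups = set(decreased) | set(increased)
    # Sixth functional group data: per-group weighted average of the
    # fourth (decreased) and fifth (increased) functional group data.
    sixth = {
        g: w_decreased * decreased.get(g, 0.0) + w_increased * increased.get(g, 0.0)
        for g in groups
    }
    changed_total = sum(sixth.values())
    drug_total = sum(drug_groups.values())
    if drug_total == 0:
        raise ValueError("drug functional group data must not be empty")
    return changed_total / drug_total


# Hypothetical relative counts.
score = binding_score(
    decreased={"thiol": 1.0},
    increased={"thioether": 1.0},
    drug_groups={"acrylamide": 1.0, "amino": 3.0},
)
print(score)  # 0.25
```

Under this reading, a higher score means a larger share of the drug's functional groups took part in chemical binding; how the score maps to a judgment of curative effect is left open by the source.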
The invention also provides a drug component detection system based on data analysis, comprising:
the first acquisition module is used for acquiring a first target protein, a target drug to be detected and a second target protein after the target drug acts on the first target protein;
a second acquisition module, configured to obtain a first chromatogram of the first target protein, a second chromatogram of the target drug and a third chromatogram of the second target protein by a terahertz time-domain spectroscopy technology;
a third acquisition module, configured to acquire corresponding first functional group data from the first chromatogram, corresponding second functional group data from the second chromatogram, and corresponding third functional group data from the third chromatogram;
a comparison module, configured to compare the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;
an input module, configured to input the second functional group data into a generation model, and input the fifth functional group data, the first functional group data, the third functional group data and the fourth functional group data into a discrimination model; wherein the generation model and the discrimination model are formed by synchronously training on different functional group data and the corresponding drug components;
and a correcting module, configured to correct the result output by the generation model according to the output result of the discrimination model to obtain the drug component output by the generation model.
Further, the drug component detection system based on data analysis further comprises:
a training sample set acquisition module, configured to acquire a functional group training sample set; wherein a set of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data, and fifth functional group training data;
a training data input module, configured to input the first functional group training data into an initial generation model to obtain a best predicted value, input the drug component into the initial generation model, and perform initial training on the initial generation model according to a first formula to obtain a trained temporary predicted value and an intermediate generation model;
and to perform vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, input the comprehensive training data into an initial discrimination model, and perform initial training on the initial discrimination model according to a second formula to obtain an intermediate discrimination model; wherein the formulas, whose expressions are not reproduced here, involve a set of parameters representing the generation model and a set of parameters representing the discrimination model;
a secondary training module, configured to perform secondary training on the intermediate generation model and the intermediate discrimination model according to a third formula, and to obtain the generation model and the discrimination model after the training is finished; wherein the third formula expresses taking, on the premise that the formula is satisfied, the minimum value over the parameter set of the generation model and the maximum value over the parameter set of the discrimination model.
Further, the drug component detection system based on data analysis further comprises:
the detection sample set acquisition module is used for acquiring a functional group detection sample set and real medicine components; wherein a set of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data, and fifth functional group detection data;
a detection data input module, configured to input the first functional group detection data into the generation model, perform vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, input the comprehensive detection data into the discrimination model, and correct the generation model to obtain a predicted drug component;
a comprehensive loss value calculation module for obtaining a comprehensive loss value of the generation model and the intermediate discrimination model according to the predicted drug component and the real drug component;
the comprehensive loss value judging module is used for judging whether the comprehensive loss value is smaller than a preset loss value or not;
and a judging module, configured to judge, if so, that the generation model and the discrimination model obtained after the training is finished meet the training requirements.
Further, the drug component detection system based on data analysis further comprises:
the functional group data input module is used for inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;
and the medicine component input module is used for inputting each target medicine component into the generative model and respectively used as an output channel of the generative model.
Further, the drug component detection system based on data analysis further comprises:
the weighted average module is used for carrying out weighted average on the fourth functional group data and the fifth functional group data to obtain sixth functional group data;
a number obtaining module, configured to obtain a first functional group number from the sixth functional group data;
a binding score calculating module, configured to divide the first functional group number by the corresponding data in the second functional group data to obtain a binding score of the target drug;
and a curative effect judging module, configured to judge the curative effect of the target drug according to the binding score.
The beneficial effects of the invention are as follows: data related to the target protein and the target drug are acquired, a result is generated by a generation model, and the result is corrected by a discrimination network. The target drug is thus not only subjected to chromatographic analysis, but is also analyzed comprehensively using data obtained after the target drug reacts with the target protein, which improves the accuracy of the analysis and makes the result more credible.
Drawings
FIG. 1 is a schematic flow chart of a drug component detection method based on data analysis according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of the structure of a drug component detection system based on data analysis according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that all directional indicators (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative position relationship between the components, the motion situation, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly, and the connection may be a direct connection or an indirect connection.
The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.
Referring to FIG. 1, the present invention provides a drug component detection method based on data analysis, comprising:
S1: obtaining a first target protein, a target drug to be detected and a second target protein after the target drug acts on the first target protein;
S2: obtaining a first chromatogram of the first target protein, a second chromatogram of the target drug and a third chromatogram of the second target protein by a terahertz time-domain spectroscopy technology;
S3: acquiring corresponding first functional group data from the first chromatogram, acquiring corresponding second functional group data from the second chromatogram, and acquiring corresponding third functional group data from the third chromatogram;
S4: comparing the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;
S5: inputting the second functional group data into a generation model, and inputting the fifth functional group data, the first functional group data, the third functional group data and the fourth functional group data into a discrimination model; wherein the generation model and the discrimination model are formed by synchronously training on different functional group data and the corresponding drug components;
S6: correcting the result output by the generation model according to the output result of the discrimination model to obtain the drug component output by the generation model.
As described in step S1, a first target protein, a target drug to be detected, and a second target protein after the target drug acts on the first target protein are obtained, where the first target protein, the target drug, and the second target protein can be understood as specific substances and can be obtained directly from a laboratory.
As described in step S2, the first chromatogram of the first target protein, the second chromatogram of the target drug, and the third chromatogram of the second target protein are obtained by terahertz time-domain spectroscopy. Terahertz time-domain spectroscopy can detect physical and chemical information about a material in the terahertz band, so the resulting chromatogram carries more accurate information than an ordinary chromatogram. The chromatogram contains signals of various chemical bonds, and the content of each chemical bond can be judged from the strength of its signal. The measurement may be performed with a terahertz time-domain spectrometer (THz-TDS).
As described in step S3, the first, second and third functional group data are obtained from the first, second and third chromatograms, respectively. Each of the first, second and third functional group data includes the types and numbers of functional groups. The number is a relative number, because the amount of the detected substance is not known. For example, the functional group with the smallest number may be recorded as 1, and the numbers of the remaining functional groups obtained from their ratios in the chromatogram.
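The relative counting described in step S3 can be illustrated with a minimal sketch. It assumes that each functional group has already been reduced to a single positive signal strength; the peak-extraction step, the function name `relative_counts`, and the example values are hypothetical.

```python
def relative_counts(signal_strengths):
    """Scale signal strengths so the weakest detected group counts as 1."""
    # Ignore groups with no signal; they are treated as absent.
    detected = {g: s for g, s in signal_strengths.items() if s > 0}
    if not detected:
        return {}
    smallest = min(detected.values())
    return {g: s / smallest for g, s in detected.items()}


# Hypothetical signal strengths read from one chromatogram.
peaks = {"carbonyl": 0.5, "hydroxyl": 1.0, "amino": 1.5}
print(relative_counts(peaks))  # {'carbonyl': 1.0, 'hydroxyl': 2.0, 'amino': 3.0}
```

Because the counts are ratios, they do not depend on how much substance was measured, which is the reason the text gives for using relative numbers.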
As described in step S4, the third functional group data is compared with the first functional group data to obtain the fourth functional group data (functional groups reduced in the third functional group data) and the fifth functional group data (functional groups increased in the third functional group data). A target drug tends to have a better therapeutic effect when more of its binding to the target protein is chemical binding, and chemical binding brings about the generation of new functional groups and the reduction of old ones. The fourth functional group data and the fifth functional group data therefore reflect the main functional group information of the reaction, and this information is extracted as a factor to allow better analysis of the drug components.
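The comparison in step S4 can be sketched as a per-group difference. This assumes the relative counts of the two proteins are expressed on a comparable scale, which the source does not discuss; `compare_functional_groups` and the example groups are hypothetical.

```python
def compare_functional_groups(before, after):
    """Split per-group changes into (decreased, increased) tables.

    `before` plays the role of the first functional group data and `after`
    the third; the returned tables correspond to the fourth and fifth
    functional group data. ASSUMPTION: both inputs share one count scale.
    """
    decreased, increased = {}, {}
    for group in set(before) | set(after):
        delta = after.get(group, 0.0) - before.get(group, 0.0)
        if delta < 0:
            decreased[group] = -delta
        elif delta > 0:
            increased[group] = delta
    return decreased, increased


# Hypothetical data: one thiol is consumed and one thioether is formed.
first = {"amino": 3.0, "thiol": 2.0, "hydroxyl": 1.0}
third = {"amino": 3.0, "thiol": 1.0, "hydroxyl": 1.0, "thioether": 1.0}
fourth, fifth = compare_functional_groups(first, third)
print(fourth, fifth)  # {'thiol': 1.0} {'thioether': 1.0}
```

Groups whose count does not change, such as the amino and hydroxyl groups here, appear in neither table, so only the groups involved in the reaction are passed on.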
As described in steps S5-S6, the second functional group data is input into the generation model, and the fifth functional group data, the first functional group data, the third functional group data and the fourth functional group data are input into the discrimination model. The generation model is responsible for generating a result, but that result is not necessarily accurate, so it is corrected with the discrimination network. In other words, the generation model produces the final result mainly from the second functional group data of the target drug, while the discrimination network takes the fifth, first, third and fourth functional group data as input and corrects the result of the generation model. The correction works as follows: the discrimination network verifies the output of the generation model; if verification fails, this is fed back to the generation model, which changes its parameters and regenerates the output, until the output passes verification by the discrimination model. The specific training of the models is described in detail later and is not repeated here. The target drug is thus not only subjected to chromatographic analysis, but is also analyzed comprehensively using data obtained after the target drug reacts with the target protein, which improves the accuracy of the analysis and makes the result more credible.
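The verify-and-regenerate loop described for steps S5-S6 can be sketched in the abstract. The three callables stand in for the generation model, the discrimination model, and the parameter feedback; their names, the `max_rounds` cap, and the toy usage are assumptions, not part of the source.

```python
def generate_until_accepted(generate, verify, adjust, max_rounds=10):
    """Regenerate a candidate until the verifier accepts it.

    Returns (candidate, rounds_used), or (None, max_rounds) if no candidate
    passed. ASSUMPTION: the source does not state a stopping cap; one is
    added here so the loop always terminates.
    """
    for round_index in range(max_rounds):
        candidate = generate()
        if verify(candidate):
            return candidate, round_index + 1
        # Verification failed: feed the rejection back so the generator
        # changes its parameters before the next attempt.
        adjust(candidate)
    return None, max_rounds


# Toy stand-ins: the "model" is a single number nudged upward on rejection.
state = {"offset": 0}
result = generate_until_accepted(
    generate=lambda: state["offset"],
    verify=lambda value: value >= 3,
    adjust=lambda value: state.update(offset=state["offset"] + 1),
)
print(result)  # (3, 4)
```

In a real system `verify` would evaluate the discrimination model on the spliced protein-side functional group data together with the candidate, and `adjust` would be a gradient step on the generation model.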
In one embodiment, before the step S5 of inputting the second functional group data into a generative model and inputting the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, the method further comprises:
S401: acquiring a functional group training sample set; wherein a set of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data, and fifth functional group training data;
S402: inputting the first functional group training data into an initial generation model to obtain a best predicted value, inputting the drug component into the initial generation model, and performing initial training on the initial generation model according to a first formula to obtain a trained temporary predicted value and an intermediate generation model;
and performing vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, inputting the comprehensive training data into an initial discrimination model, and performing initial training on the initial discrimination model according to a second formula to obtain an intermediate discrimination model; wherein the formulas, whose expressions are not reproduced here, involve a set of parameters representing the generation model and a set of parameters representing the discrimination model;
S403: performing secondary training on the intermediate generation model and the intermediate discrimination model according to a third formula, and obtaining the generation model and the discrimination model after the training is finished; wherein the third formula expresses taking, on the premise that the formula is satisfied, the minimum value over the parameter set of the generation model and the maximum value over the parameter set of the discrimination model.
As described in step S401, a functional group training sample set is obtained. The training sample set likewise consists of data related to target drugs and target proteins, and additionally contains the drug components, so that the result can be corrected. Note that the drug component, the first functional group training data, the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data are all vectors formed from the corresponding data. Taking the first functional group training data as an example, it comprises the type and number of each functional group. A numerical representation for each functional group can be established in advance, the number of that functional group is appended to its numerical representation, and the results are spliced to obtain the corresponding vector. The drug component may likewise be data corresponding to the component. Owing to the relationship between the drug component and each functional group, the vectors formed for the two, and the kinds of original parameters they encode, may differ.
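The vector construction and splicing described above can be sketched as follows. The code table `GROUP_CODES`, the ordering by code, and the function names are hypothetical; the source only says that each functional group gets a predefined numerical representation with its number attached.

```python
# Hypothetical numerical representation for each functional group type.
GROUP_CODES = {"hydroxyl": 1, "carbonyl": 2, "amino": 3, "thiol": 4}


def encode_functional_groups(counts, group_codes=GROUP_CODES):
    """Encode {group: count} as a flat [code, count, code, count, ...] vector."""
    vector = []
    # Sort by code so the same data always yields the same vector.
    for group in sorted(counts, key=lambda g: group_codes[g]):
        vector.extend([float(group_codes[group]), float(counts[group])])
    return vector


def splice(*vectors):
    """Concatenate several vectors into one comprehensive data vector."""
    spliced = []
    for vector in vectors:
        spliced.extend(vector)
    return spliced


first_vector = encode_functional_groups({"amino": 3.0, "hydroxyl": 2.0})
print(first_vector)  # [1.0, 2.0, 3.0, 3.0]
print(splice(first_vector, encode_functional_groups({"thiol": 1.0})))
# [1.0, 2.0, 3.0, 3.0, 4.0, 1.0]
```

A practical model would usually need fixed-length inputs, for example one slot per known functional group type; the variable-length form here simply follows the wording of the text.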
As described in the above steps S402-S403, for each sample, the first functional group training data it contains is input into an initial generation model. The initial generation model has a random, pre-constructed parameter set, so that it can output a result normally for training. The model is trained according to the corresponding formula, with updates performed by stochastic gradient descent: after training on the current sample is finished, training on the next sample is carried out, and the parameter set is updated after each training, which completes the training of the initial generation model. The initial discrimination model is trained in the same way according to its formula, with the parameter set updated after each training; the update may likewise use stochastic gradient descent. The two formulas may then be combined into a single formula, with which secondary training is performed on the intermediate generation model and the intermediate discrimination model. Note that each sample is trained with all three formulas, that is, during the training of one group of samples the parameter sets are updated twice. The parameter set of the generation model and the parameter set of the discrimination model are finally obtained. To make the discrimination effect of the model better and the obtained drug components more accurate, the combined formula should take its minimum value, as far as possible, with respect to the generation model parameter set and its maximum value with respect to the discrimination model parameter set.
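The per-sample update scheme can be illustrated with a toy example. The patent's actual formulas are not reproduced in this text, so the objective below, f(g, d) = 0.5(g - x)^2 + (g - x)d - 0.5d^2, is purely an assumed stand-in chosen because it has a simple saddle point; `g` and `d` are hypothetical scalar stand-ins for the generation and discrimination parameter sets.

```python
def train_alternating(samples, lr=0.1):
    """Alternate per-sample gradient steps: descent on g, ascent on d.

    Toy objective (an assumption, not the patent's formula):
        f(g, d) = 0.5 * (g - x)**2 + (g - x) * d - 0.5 * d**2
    """
    g, d = 0.0, 0.0  # generation-side and discrimination-side parameters
    for x in samples:
        # Gradient descent step on g: df/dg = (g - x) + d
        g -= lr * ((g - x) + d)
        # Gradient ascent step on d: df/dd = (g - x) - d
        d += lr * ((g - x) - d)
    return g, d

samples = [1.9, 2.0, 2.1] * 200
g, d = train_alternating(samples)
print(round(g, 2), round(d, 2))
```

With these samples the parameters settle near the saddle point of the averaged toy objective (g close to the sample mean, d close to zero), which is the min-max behaviour the paragraph describes.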
In an embodiment, after the step S403 of obtaining the generated model and the discriminant model after the training is completed, the method further includes:
s4031: acquiring a functional group detection sample set and real medicine components; wherein a set of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data, and fifth functional group detection data;
s4032: inputting the first functional group detection data into the generation model, performing vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, inputting the comprehensive detection data into the discrimination model, and correcting the output of the generation model to obtain a predicted drug component;
s4033: obtaining the comprehensive loss value of the generation model and the intermediate discrimination model according to the predicted medicine component and the real medicine component;
s4034: judging whether the comprehensive loss value is smaller than a preset loss value or not;
s4035: if yes, judging that the generated model and the discrimination model obtained after training meet the training requirements.
As described in the foregoing steps S4031-S4035, the trained generation model and discrimination model are verified. A functional group detection sample set and the real drug components are obtained; these may be taken from the functional group training sample set or may be additional data, which is not limited in this application. To avoid biasing the result, additional data is preferably used as the detection sample set. The detection data is input into the generation model and the discrimination model to obtain a predicted drug component, and the comprehensive loss value of the generation model and the intermediate discrimination model is obtained from the predicted drug component and the real drug component. The comprehensive loss value may be calculated by a formula defined in terms of the true value in the i-th detection datum, the predicted value obtained from the i-th detection datum, the number n of detection data, a preset parameter value, and a preset weight corresponding to the i-th detection datum. If the comprehensive loss value is smaller than a preset loss value, the generation model and the discrimination model meet the training requirement; otherwise, training must continue until the requirement is met.
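The acceptance check can be sketched as below. The loss formula itself is not reproduced in this text, so the weighted mean squared error used here is an assumed form that merely uses the same ingredients (true values, predicted values, n, a preset parameter, per-datum weights); the names `combined_loss`, `meets_requirement` and `alpha` are hypothetical.

```python
def combined_loss(y_true, y_pred, weights, alpha=1.0):
    """Assumed weighted squared-error loss; not the patent's exact formula.

    alpha plays the role of the preset parameter value, and weights[i]
    is the preset weight for the i-th detection datum.
    """
    n = len(y_true)
    if n == 0 or len(y_pred) != n or len(weights) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(w * (t - p) ** 2 for t, p, w in zip(y_true, y_pred, weights))
    return alpha * total / n

def meets_requirement(loss, preset_loss):
    """Training requirement is met when the loss is below the preset value."""
    return loss < preset_loss

loss = combined_loss([1.0, 0.0, 1.0], [0.5, 0.0, 1.0], [1.0, 1.0, 1.0])
print(loss, meets_requirement(loss, preset_loss=0.1))
```

If the check fails, the description calls for training to continue until the loss drops below the preset value.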
In one embodiment, before the step S5 of inputting the second functional group data into a generative model and inputting the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, the method further comprises:
s411: inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;
s412: inputting each target drug component into the generative model to be used as an output channel of the generative model respectively.
As described in the foregoing steps S411 to S412, the output channels of the generative model are set. Drug components that may be present can be obtained from the second functional group data, although they still require further confirmation, while drug components that cannot possibly be present are excluded. The generative model then only needs to calculate the probabilities of the set output channels rather than the output probability of every output channel, which reduces its calculation amount and improves its efficiency.
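One way to picture the restricted output channels is a softmax computed only over the candidate components. This is a sketch under assumptions: the patent does not state that the generative model uses a softmax, and `restricted_probabilities`, `logits` and `allowed` are hypothetical names.

```python
import numpy as np

def restricted_probabilities(logits, allowed):
    """Softmax over the allowed output channels only; others get probability 0."""
    logits = np.asarray(logits, dtype=float)
    if len(allowed) == 0:
        raise ValueError("at least one output channel must be allowed")
    idx = np.array(sorted(allowed), dtype=int)
    sub = logits[idx]
    sub = sub - sub.max()          # subtract the max for numerical stability
    exp = np.exp(sub)
    probs = np.zeros_like(logits)
    probs[idx] = exp / exp.sum()   # only the candidate channels are evaluated
    return probs

# Channels 0 and 3 are the candidate components suggested by the analysis model.
probs = restricted_probabilities([2.0, 1.0, 0.5, 3.0], allowed=[0, 3])
print(probs)
```

Only the candidate channels enter the exponentiation and normalisation, which is where the reduction in calculation amount comes from.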
In one embodiment, after the step S4 of comparing the third functional group data with the first functional group data to obtain the fourth functional group data, representing the functional groups that are reduced in the third functional group data, and the fifth functional group data, representing the functional groups that are increased in the third functional group data, the method further comprises:
s501: carrying out weighted average on the fourth functional group data and the fifth functional group data to obtain sixth functional group data;
s502: acquiring the number of first functional groups according to the data of the sixth functional group;
s503: dividing the number of first functional groups by the corresponding entry in the second functional group data to obtain a binding score of the targeted drug;
s504: and judging the curative effect of the targeted drug according to the binding score.
As described in the above steps S501 to S504, the curative effect of the targeted drug is predicted. The sixth functional group data may represent the binding sites between the targeted drug and the targeted protein; if the site of action is the site on the targeted protein that mainly causes the disease, the drug can be considered to have a certain curative effect. The binding score of the targeted drug is therefore calculated from the number of binding sites and indicates whether the targeted drug binds firmly to the targeted protein. It should be understood that the more binding sites there are, or the more chemical bonds are formed, the more chemical reaction has occurred and the firmer the binding; conversely, the fewer there are, the weaker the binding. The curative effect of the targeted drug can therefore be judged from the binding score, either directly from the score itself or by converting the score with a preset conversion method.
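Steps S501 to S504 can be sketched as follows. The description leaves several details open, so this sketch makes explicit assumptions: functional group data is held as a mapping from group name to count, the weights `w4` and `w5` are hypothetical, and "the number of first functional groups" is taken to be the total count in the sixth functional group data.

```python
def binding_score(fourth, fifth, second, w4=0.5, w5=0.5):
    """Assumed reading of S501-S503; returns 0.0 when no matching drug groups exist."""
    groups = set(fourth) | set(fifth)
    # S501: weighted average of the reduced (fourth) and increased (fifth) data.
    sixth = {
        g: (w4 * fourth.get(g, 0) + w5 * fifth.get(g, 0)) / (w4 + w5)
        for g in groups
    }
    # S502: number of functional groups involved in binding (assumed: total count).
    involved = sum(sixth.values())
    # S503: divide by the corresponding counts in the drug's functional group data.
    available = sum(second.get(g, 0) for g in groups)
    if available == 0:
        return 0.0
    return involved / available

score = binding_score(
    fourth={"hydroxyl": 2},
    fifth={"ester": 2},
    second={"hydroxyl": 1, "ester": 3},
)
print(score)
```

For S504, the score could be used directly or mapped through a preset conversion, as the description allows.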
Referring to fig. 2, a drug component detection system based on data analysis includes:
the first acquisition module 10 is used for acquiring a first target protein, a target drug to be detected and a second target protein after the target drug acts on the first target protein;
a second obtaining module 20, configured to obtain a first chromatogram of the first target protein, a second chromatogram of the target drug, and a third chromatogram of the second target protein by terahertz time-domain spectroscopy;
a third obtaining module 30, configured to obtain corresponding first functional group data from the first chromatogram, obtain corresponding second functional group data from the second chromatogram, and obtain corresponding third functional group data from the third chromatogram;
a comparing module 40, configured to compare the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;
an input module 50, configured to input the second functional group data into a generative model, and input the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model; wherein, the generation model and the discrimination model are formed by synchronously training different functional group data and corresponding medicine components;
and the correcting module 60 is configured to correct the result output by the generated model according to the output result of the discriminant model, so as to obtain the drug component output by the generated model.
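The comparison performed by the comparing module can be sketched with multiset subtraction. It is assumed here, for illustration only, that functional group data is a mapping from group name to count; `compare_groups` is a hypothetical helper name.

```python
from collections import Counter

def compare_groups(first, third):
    """Return (reduced, increased) functional group counts of third vs. first."""
    first, third = Counter(first), Counter(third)
    reduced = first - third     # Counter subtraction keeps only positive counts
    increased = third - first
    return dict(reduced), dict(increased)

fourth, fifth = compare_groups(
    first={"hydroxyl": 5, "amino": 2},
    third={"hydroxyl": 3, "amino": 2, "ester": 2},
)
print(fourth, fifth)
```

Here `fourth` corresponds to the fourth functional group data (groups reduced after the drug acts on the protein) and `fifth` to the fifth functional group data (groups increased).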
In one embodiment, the data analysis-based drug component detection system further comprises:
a training sample set acquisition module, configured to acquire a functional group training sample set; wherein a set of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data and fifth functional group training data;
a training data input module, configured to input the first functional group training data into an initial generation model to obtain a predicted value, input the drug component into the initial generation model, perform initial training on the initial generation model according to the corresponding formula, and obtain a trained temporary predicted value and an intermediate generation model,
and to perform vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, input the comprehensive training data into an initial discrimination model, and perform initial training on the initial discrimination model according to the corresponding formula to obtain an intermediate discrimination model; wherein the formulas involve a set of parameters of the generation model and a set of parameters of the discrimination model;
a secondary training module, configured to perform secondary training on the intermediate generation model and the intermediate discrimination model according to the corresponding formula, and to obtain the generation model and the discrimination model after the training is finished; wherein the formula takes, on the premise that it is satisfied, the minimum value with respect to the generation model parameter set and the maximum value with respect to the discrimination model parameter set.
In one embodiment, the data analysis-based drug component detection system further comprises:
the detection sample set acquisition module is used for acquiring a functional group detection sample set and real medicine components; wherein a set of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data, and fifth functional group detection data;
a detection data input module, configured to input the first functional group detection data into the generation model, perform vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, input the comprehensive detection data into the discrimination model, and correct the output of the generation model to obtain a predicted drug component;
a comprehensive loss value calculation module for obtaining a comprehensive loss value of the generation model and the intermediate discrimination model according to the predicted drug component and the real drug component;
the comprehensive loss value judging module is used for judging whether the comprehensive loss value is smaller than a preset loss value or not;
and a determination module, configured to determine, if so, that the generation model and the discrimination model obtained after the training is finished meet the training requirement.
In one embodiment, the data analysis-based drug component detection system further comprises:
the functional group data input module is used for inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;
and the medicine component input module is used for inputting each target medicine component into the generative model and respectively used as an output channel of the generative model.
In one embodiment, the data analysis-based drug component detection system further comprises:
the weighted average module is used for carrying out weighted average on the fourth functional group data and the fifth functional group data to obtain sixth functional group data;
a number obtaining module, configured to obtain the number of first functional groups according to the sixth functional group data;
a binding score calculation module, configured to divide the number of first functional groups by the corresponding entry in the second functional group data to obtain a binding score of the targeted drug;
and the curative effect judging module is used for judging the curative effect of the targeted drug according to the binding score.
The invention has the following beneficial effects: by acquiring data on the targeted protein and the targeted drug, generating a result with a generation model and correcting it with a discrimination network, chromatographic analysis of the targeted drug is realized. The analysis combines the chromatogram of the targeted drug with the data obtained after the targeted drug reacts with the targeted protein, which improves the accuracy of the analysis and makes the result more reliable.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.
Claims (10)
1. A method for detecting a drug component based on data analysis, comprising:
obtaining a first target protein, a target drug to be detected and a second target protein after the target drug acts on the first target protein;
obtaining a first chromatogram of the first targeting protein, a second chromatogram of the targeting drug and a third chromatogram of the second targeting protein by a terahertz time-domain spectroscopy technology;
acquiring corresponding first functional group data from the first chromatogram, acquiring corresponding second functional group data from the second chromatogram, and acquiring corresponding third functional group data from the third chromatogram;
comparing the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;
inputting the second functional group data into a generative model, and inputting the fifth functional group data, the first functional group data, the third functional group data and the fourth functional group data into a discriminant model; wherein, the generation model and the discrimination model are formed by synchronously training different functional group data and corresponding medicine components;
and correcting the result output by the generated model according to the output result of the discrimination model to obtain the medicine component output by the generated model.
2. The data analysis-based method for drug component detection according to claim 1, wherein, before the step of inputting the second functional group data into a generative model and inputting the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, the method further comprises:
acquiring a functional group training sample set; wherein a set of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data and fifth functional group training data;
inputting the first functional group training data into an initial generation model to obtain a predicted value, inputting the drug component into the initial generation model, performing initial training on the initial generation model according to the corresponding formula, and obtaining a trained temporary predicted value and an intermediate generation model,
and performing vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, inputting the comprehensive training data into an initial discrimination model, and performing initial training on the initial discrimination model according to the corresponding formula to obtain an intermediate discrimination model; wherein the formulas involve a set of parameters of the generation model and a set of parameters of the discrimination model;
performing secondary training on the intermediate generation model and the intermediate discrimination model according to the corresponding formula, and obtaining the generation model and the discrimination model after the training is finished; wherein the formula takes, on the premise that it is satisfied, the minimum value with respect to the generation model parameter set and the maximum value with respect to the discrimination model parameter set.
3. The method for drug component detection based on data analysis of claim 2, wherein after the step of obtaining the generative model and the discriminant model after the training, further comprising:
acquiring a functional group detection sample set and real medicine components; wherein a set of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data, and fifth functional group detection data;
inputting the first functional group detection data into the generation model, performing vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, inputting the comprehensive detection data into the discrimination model, and correcting the output of the generation model to obtain a predicted drug component;
obtaining the comprehensive loss value of the generation model and the intermediate discrimination model according to the predicted medicine component and the real medicine component;
judging whether the comprehensive loss value is smaller than a preset loss value or not;
if yes, judging that the generated model and the discrimination model obtained after training meet the training requirements.
4. The data analysis-based method for drug component detection according to claim 1, wherein, before the step of inputting the second functional group data into a generative model and inputting the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, the method further comprises:
inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;
inputting each target drug component into the generative model to be used as an output channel of the generative model respectively.
5. The data analysis-based method for drug component detection according to claim 1, wherein, after the step of comparing the third functional group data with the first functional group data to obtain the fourth functional group data, representing the functional groups that are reduced in the third functional group data, and the fifth functional group data, representing the functional groups that are increased in the third functional group data, the method further comprises:
carrying out weighted average on the fourth functional group data and the fifth functional group data to obtain sixth functional group data;
acquiring the number of first functional groups according to the data of the sixth functional group;
dividing the number of first functional groups by the corresponding entry in the second functional group data to obtain a binding score of the targeted drug;
and judging the curative effect of the targeted drug according to the binding score.
6. A drug component detection system based on data analysis, comprising:
the first acquisition module is used for acquiring a first target protein, a target drug to be detected and a second target protein after the target drug acts on the first target protein;
the second acquisition module is used for acquiring a first chromatogram of the first targeting protein, a second chromatogram of the targeting drug and a third chromatogram of the second targeting protein by a terahertz time-domain spectroscopy technology;
a third obtaining module, configured to obtain corresponding first functional group data from the first chromatogram, obtain corresponding second functional group data from the second chromatogram, and obtain corresponding third functional group data from the third chromatogram;
a comparison module, configured to compare the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;
an input module, configured to input the second functional group data into a generative model, and input the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model; wherein, the generation model and the discrimination model are formed by synchronously training different functional group data and corresponding medicine components;
and the correcting module is used for correcting the result output by the generating model according to the output result of the judging model to obtain the medicine component output by the generating model.
7. The data analysis-based drug component detection system of claim 6, further comprising:
a training sample set acquisition module, configured to acquire a functional group training sample set; wherein a set of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data and fifth functional group training data;
a training data input module, configured to input the first functional group training data into an initial generation model to obtain a predicted value, input the drug component into the initial generation model, perform initial training on the initial generation model according to the corresponding formula, and obtain a trained temporary predicted value and an intermediate generation model,
and to perform vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, input the comprehensive training data into an initial discrimination model, and perform initial training on the initial discrimination model according to the corresponding formula to obtain an intermediate discrimination model; wherein the formulas involve a set of parameters of the generation model and a set of parameters of the discrimination model;
a secondary training module, configured to perform secondary training on the intermediate generation model and the intermediate discrimination model according to the corresponding formula, and to obtain the generation model and the discrimination model after the training is finished; wherein the formula takes, on the premise that it is satisfied, the minimum value with respect to the generation model parameter set and the maximum value with respect to the discrimination model parameter set.
8. The data analysis-based drug component detection system of claim 7, further comprising:
the detection sample set acquisition module is used for acquiring a functional group detection sample set and real medicine components; wherein a set of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data, and fifth functional group detection data;
a detection data input module, configured to input the first functional group detection data into the generative model, perform vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, input the comprehensive detection data into the discrimination model, and correct the output of the generation model to obtain a predicted drug component;
a comprehensive loss value calculation module for obtaining a comprehensive loss value of the generation model and the intermediate discrimination model according to the predicted drug component and the real drug component;
the comprehensive loss value judging module is used for judging whether the comprehensive loss value is smaller than a preset loss value or not;
and a determination module, configured to determine, if so, that the generation model and the discrimination model obtained after the training is finished meet the training requirement.
9. The data analysis-based drug component detection system of claim 6, further comprising:
the functional group data input module is used for inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;
and the medicine component input module is used for inputting each target medicine component into the generative model and respectively used as an output channel of the generative model.
10. The data analysis-based drug component detection system of claim 6, further comprising:
the weighted average module is used for carrying out weighted average on the fourth functional group data and the fifth functional group data to obtain sixth functional group data;
a number obtaining module, configured to obtain the number of first functional groups according to the sixth functional group data;
a binding score calculation module, configured to divide the number of first functional groups by the corresponding entry in the second functional group data to obtain a binding score of the targeted drug;
and the curative effect judging module is used for judging the curative effect of the targeted drug according to the binding score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210077693.9A CN114112980B (en) | 2022-01-24 | 2022-01-24 | Medicine component detection method and system based on data analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114112980A (en) | 2022-03-01 |
CN114112980B (en) | 2022-05-10 |
Family
ID=80361076
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210077693.9A Active CN114112980B (en) | 2022-01-24 | 2022-01-24 | Medicine component detection method and system based on data analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114112980B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115825316A (en) * | 2023-02-15 | 2023-03-21 | 武汉宏韧生物医药股份有限公司 | Method and device for analyzing active ingredients of medicine based on supercritical chromatography |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5885786A (en) * | 1996-04-19 | 1999-03-23 | John Wayne Cancer Institute | Methods for screening of substances for inhibition of multidrug resistance |
US20020019019A1 (en) * | 1999-06-18 | 2002-02-14 | Markku Hamalainen | Method and apparatus for assaying a drug candidate to estimate a pharmacokinetic parameter associated therewith |
US20030044843A1 (en) * | 2001-01-09 | 2003-03-06 | Mitsubishi Pharma Corporation | Novel proteome analysis method and devices therefor |
US20050042771A1 (en) * | 2003-01-16 | 2005-02-24 | Hubert Koster | Capture compounds, collections thereof and methods for analyzing the proteome and complex compositions |
US20050148100A1 (en) * | 2003-12-30 | 2005-07-07 | Intel Corporation | Methods and devices for using Raman-active probe constructs to assay biological samples |
CN101253408A (en) * | 2005-07-01 | 2008-08-27 | 惠氏公司 | Methods of determining pharmacokinetics of targeted therapies |
US20090221436A1 (en) * | 2000-11-17 | 2009-09-03 | Slanetz Alfred E | Process for determining target function and identifying drug leads |
US20190285650A1 (en) * | 2016-11-21 | 2019-09-19 | RUHR-UNIVERSITäT BOCHUM | Method for the preselection of drugs for protein misfolding diseases |
US20200080980A1 (en) * | 2017-05-22 | 2020-03-12 | Valisure Llc | Methods for validating medication |
CN113450870A (en) * | 2021-06-11 | 2021-09-28 | 北京大学 | Method and system for matching drug with target protein |
Non-Patent Citations (2)
Title |
---|
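Zhang Huo (张活): "Research on detection methods for traditional Chinese medicine based on terahertz time-domain spectroscopy", China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology * |
Zeng Yinzhu (曾银珠) et al.: "Application of high-performance liquid chromatography-mass spectrometry in drug analysis", Biological Chemical Engineering (生物化工) * |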
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115825316A (en) * | 2023-02-15 | 2023-03-21 | 武汉宏韧生物医药股份有限公司 | Method and device for analyzing active ingredients of medicine based on supercritical chromatography |
Also Published As
Publication number | Publication date |
---|---|
CN114112980B (en) | 2022-05-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Hood et al. | Revolutionizing medicine in the 21st century through systems approaches | |
Zheng et al. | Prediction and diagnosis of renal cell carcinoma using nuclear magnetic resonance-based serum metabolomics and self-organizing maps | |
Nassar et al. | Precision medicine: steps along the road to combat human cancer | |
CN114112980B (en) | Medicine component detection method and system based on data analysis | |
CN111710372B (en) | Exhaled air detection device and method for establishing exhaled air marker thereof | |
KR101461615B1 (en) | Apparatus for diagnosis cancer | |
CN109791158A (en) | The method that more attributes for complex sample monitor | |
CN106778072B (en) | For the process bearing calibration of second generation Oncogenome high-flux sequence data | |
Chen et al. | Next-generation sequencing technologies for personalized medicine: promising but challenging | |
Baumgartner et al. | A novel network-based approach for discovering dynamic metabolic biomarkers in cardiovascular disease | |
CN108931590B (en) | Correction method for multi-batch targeted metabonomics data | |
WO2011123837A2 (en) | Method and system using computer simulation for the quantitative analysis of glycan biosynthesis | |
Garrido-Martín et al. | A fast non-parametric test of association for multiple traits | |
CN108872423A (en) | Glucolactone and pyroglutamic acid are as macrosomia's auxiliary diagnosis marker and its application | |
CN114220480A (en) | Method and system for analyzing medicine components | |
CN108872424A (en) | Dodecanoic acid and prostaglandin E2 combination are used as macrosomia's auxiliary diagnosis marker and its application | |
CN111972353B (en) | Method for constructing group pharmacokinetic model of compound salvia miltiorrhiza dropping pill multi-component in rat body | |
US10937525B2 (en) | System that generates pharmacokinetic analyses of oligonucleotide total effects from full-scan mass spectra | |
CN114705766A (en) | Large-scale omics data correction method and system based on IS combined SVR | |
CN114740135A (en) | Biomarker suitable for early discovery, early prediction or early diagnosis of severe chronic obstructive pulmonary disease, and application and screening method thereof | |
CN114295766A (en) | Metabonomics data processing method and device based on stable isotope labeling | |
CN114005529A (en) | Recognition method of ncRNA with protein coding potential | |
Olson et al. | Calculation of the isotope cluster for polypeptides by probability grouping | |
Lutz et al. | Multiparametric statistical quantification of pH heterogeneity by 1H MRS and MRSI of extracellular pH markers: Proof of principle | |
EP2133808A1 (en) | Model system for diagnosing lipid metabolism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |