
CN114112980A - Medicine component detection method and system based on data analysis

Info

Publication number: CN114112980A
Application number: CN202210077693.9A
Authority: CN
Inventors: 张杨; 陈桂英; 庄炜平; 姜宏梁
Original assignee: Wuhan Hongren Biomedical Co ltd
Current assignee: Wuhan Hongren Biomedical Co ltd
Priority date: 2022-01-24
Filing date: 2022-01-24
Publication date: 2022-03-01
Anticipated expiration: 2042-01-24
Also published as: CN114112980B

Abstract

The invention provides a method and a system for detecting drug components based on data analysis. The method comprises: obtaining a first target protein, a targeted drug to be detected, and a second target protein formed after the targeted drug acts on the first target protein; extracting functional group data of the target proteins and the targeted drug by terahertz time-domain spectroscopy; inputting the data into a generative model to generate a result; and correcting the result by a discriminant network. The beneficial effects of the invention are that the targeted drug is not only subjected to chromatographic analysis but is also analyzed comprehensively using data obtained after the targeted drug reacts with the target protein, which improves the accuracy of the analysis and makes the result more reliable.

Description

Medicine component detection method and system based on data analysis

Technical Field

The invention relates to the field of digital medical treatment, in particular to a method and a system for detecting medicine components based on data analysis.

Background

Pharmaceutical analysis is an important branch of analytical chemistry. With the development of pharmaceutical chemistry it has gradually become a relatively independent discipline, and it is widely applied in drug quality control, new drug research, drug metabolism, chiral drug analysis and the like.

At present, drugs are analyzed only by chemical or physical methods. Traditional chemical analysis is complex and costly, while physical analysis relies mainly on chromatography and nuclear magnetic resonance; however, these methods cannot analyze drug components well and have large errors.

Disclosure of Invention

The invention mainly aims to provide a method, a device, equipment and a storage medium for detecting medicine components based on data analysis, and aims to solve the problem that the existing physical analysis method has larger errors.

The invention provides a medicine component detection method based on data analysis, which comprises the following steps:

obtaining a first target protein, a targeted drug to be detected, and a second target protein formed after the targeted drug acts on the first target protein;

obtaining a first chromatogram of the first target protein, a second chromatogram of the targeted drug and a third chromatogram of the second target protein by terahertz time-domain spectroscopy;

acquiring corresponding first functional group data from the first chromatogram, corresponding second functional group data from the second chromatogram, and corresponding third functional group data from the third chromatogram;

comparing the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;

inputting the second functional group data into a generative model, and inputting the fifth functional group data, the first functional group data, the third functional group data and the fourth functional group data into a discriminant model; wherein the generative model and the discriminant model are obtained by synchronous training on different functional group data and the corresponding drug components;

and correcting the result output by the generative model according to the output of the discriminant model to obtain the drug component output by the generative model.

Further, before the step of inputting the second functional group data into a generative model and inputting the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, the method further includes:

acquiring a functional group training sample set; wherein each group of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data and fifth functional group training data;

inputting the first functional group training data into an initial generative model to obtain a predicted value, inputting the drug component into the initial generative model, and performing initial training on the initial generative model by a first formula [formula not reproduced in the source text] to obtain a trained temporary predicted value and an intermediate generative model;

performing vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, inputting the comprehensive training data into an initial discriminant model, and performing initial training on the initial discriminant model by a second formula [formula not reproduced in the source text] to obtain an intermediate discriminant model; wherein the symbols of the formulas include a parameter set of the generative model and a parameter set of the discriminant model;

performing secondary training on the intermediate generative model and the intermediate discriminant model according to a third formula [formula not reproduced in the source text], and obtaining the generative model and the discriminant model after the training is finished; wherein the third formula takes the minimum value over the parameter set of the generative model and the maximum value over the parameter set of the discriminant model.

Further, after the step of obtaining the generated model and the discriminant model after the training is completed, the method further includes:

acquiring a functional group detection sample set and real medicine components; wherein a set of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data, and fifth functional group detection data;

inputting the first functional group detection data into the generative model, performing vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, inputting the comprehensive detection data into the discriminant model, and correcting the generative model to obtain a predicted drug component;

obtaining a comprehensive loss value of the generative model and the intermediate discriminant model according to the predicted drug component and the real drug component;

judging whether the comprehensive loss value is smaller than a preset loss value;

if so, determining that the generative model and the discriminant model obtained after training meet the training requirement.

inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;

inputting each target drug component into the generative model to be used as an output channel of the generative model respectively.

Further, after the step of comparing the third functional group data with the first functional group data to obtain the fourth functional group data and the fifth functional group data, the method further comprises:

carrying out weighted average on the fourth functional group data and the fifth functional group data to obtain sixth functional group data;

acquiring the number of first functional groups according to the data of the sixth functional group;

dividing the number of the first functional groups by corresponding second functional group data in the second functional group data to obtain a binding score of the targeted drug;

and judging the curative effect of the targeted drug according to the binding score.

The invention also provides a drug component detection system based on data analysis, comprising:

a first acquisition module, used for obtaining a first target protein, a targeted drug to be detected, and a second target protein formed after the targeted drug acts on the first target protein;

a second acquisition module, used for obtaining a first chromatogram of the first target protein, a second chromatogram of the targeted drug and a third chromatogram of the second target protein by terahertz time-domain spectroscopy;

a third acquisition module, used for acquiring corresponding first functional group data from the first chromatogram, corresponding second functional group data from the second chromatogram, and corresponding third functional group data from the third chromatogram;

a comparison module, used for comparing the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;

an input module, used for inputting the second functional group data into a generative model, and inputting the fifth functional group data, the first functional group data, the third functional group data and the fourth functional group data into a discriminant model; wherein the generative model and the discriminant model are obtained by synchronous training on different functional group data and the corresponding drug components;

and a correcting module, used for correcting the result output by the generative model according to the output of the discriminant model to obtain the drug component output by the generative model.

Further, the drug component detection system based on data analysis further comprises:

a training sample set acquisition module, used for acquiring a functional group training sample set; wherein each group of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data and fifth functional group training data;

a training data input module, used for inputting the first functional group training data and the drug component into the initial generative model and performing initial training by a first formula [formula not reproduced in the source text] to obtain an intermediate generative model; and for performing vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, inputting the comprehensive training data into an initial discriminant model, and performing initial training by a second formula [formula not reproduced in the source text] to obtain an intermediate discriminant model; wherein the symbols of the formulas include a parameter set of the generative model and a parameter set of the discriminant model;

a secondary training module, used for performing secondary training on the intermediate generative model and the intermediate discriminant model according to a third formula [formula not reproduced in the source text], and obtaining the generative model and the discriminant model after the training is finished; wherein the third formula takes the minimum value over the parameter set of the generative model and the maximum value over the parameter set of the discriminant model.

the detection sample set acquisition module is used for acquiring a functional group detection sample set and real medicine components; wherein a set of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data, and fifth functional group detection data;

a detection data input module, used for inputting the first functional group detection data into the generative model, performing vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, inputting the comprehensive detection data into the discriminant model, and correcting the generative model to obtain a predicted drug component;

a comprehensive loss value calculation module, used for obtaining a comprehensive loss value of the generative model and the intermediate discriminant model according to the predicted drug component and the real drug component;

a comprehensive loss value judging module, used for judging whether the comprehensive loss value is smaller than a preset loss value;

and a judging module, used for determining, if so, that the generative model and the discriminant model obtained after training meet the training requirement.

the functional group data input module is used for inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;

and the medicine component input module is used for inputting each target medicine component into the generative model and respectively used as an output channel of the generative model.

the weighted average module is used for carrying out weighted average on the fourth functional group data and the fifth functional group data to obtain sixth functional group data;

a functional group number obtaining module, configured to obtain the number of first functional groups according to the sixth functional group data;

a module configured to divide the number of first functional groups by the corresponding second functional group data in the second functional group data to obtain the binding score of the targeted drug;

and the curative effect judging module is used for judging the curative effect of the targeted drug according to the binding score.

The beneficial effects of the invention are as follows: related data of the target protein and the targeted drug is acquired, a result is generated by a generative model, and the result is corrected by a discriminant network. The targeted drug is thus not only subjected to chromatographic analysis but is also analyzed comprehensively using data obtained after it reacts with the target protein, which improves the accuracy of the analysis and makes the result more reliable.

Drawings

FIG. 1 is a schematic flow chart of a drug component detection method based on data analysis according to an embodiment of the present invention;

FIG. 2 is a schematic block diagram of the structure of a drug component detection system based on data analysis according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that all directional indicators (such as up, down, left, right, front, back, etc.) in the embodiments of the present invention are only used to explain the relative position relationship between the components, the motion situation, etc. in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly, and the connection may be a direct connection or an indirect connection.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone.

In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, technical solutions between various embodiments may be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, such a combination should not be considered to exist, and is not within the protection scope of the present invention.

Referring to FIG. 1, the present invention provides a drug component detection method based on data analysis, comprising:

S1: obtaining a first target protein, a targeted drug to be detected, and a second target protein formed after the targeted drug acts on the first target protein;

S2: obtaining a first chromatogram of the first target protein, a second chromatogram of the targeted drug and a third chromatogram of the second target protein by terahertz time-domain spectroscopy;

S3: acquiring corresponding first functional group data from the first chromatogram, corresponding second functional group data from the second chromatogram, and corresponding third functional group data from the third chromatogram;

S4: comparing the third functional group data with the first functional group data to obtain fourth functional group data, representing the functional groups that are reduced in the third functional group data, and fifth functional group data, representing the functional groups that are increased in the third functional group data;

S5: inputting the second functional group data into a generative model, and inputting the fifth functional group data, the first functional group data, the third functional group data and the fourth functional group data into a discriminant model; wherein the generative model and the discriminant model are obtained by synchronous training on different functional group data and the corresponding drug components;

S6: correcting the result output by the generative model according to the output of the discriminant model to obtain the drug component output by the generative model.

As described in step S1, a first target protein, a targeted drug to be detected, and a second target protein formed after the targeted drug acts on the first target protein are obtained. The first target protein, the targeted drug and the second target protein are specific physical substances and can be obtained directly from a laboratory.

As described in step S2, the first chromatogram of the first target protein, the second chromatogram of the targeted drug and the third chromatogram of the second target protein are obtained by terahertz time-domain spectroscopy. Terahertz time-domain spectroscopy can detect physical and chemical information of a material in the terahertz band, so the resulting chromatogram carries more accurate information than an ordinary chromatogram. The chromatogram contains the signals of various chemical bonds, and the content of each chemical bond can be judged from the strength of its signal. The measurement can be performed with a terahertz time-domain spectrometer (THz-TDS).

As described in step S3, the first, second and third functional group data are obtained from the first, second and third chromatograms respectively. Each of the first, second and third functional group data includes the types and numbers of functional groups. Because the amount of the detected substance is not known, the numbers are relative: for example, the functional group with the smallest number can be recorded as 1, and the numbers of the remaining functional groups can be obtained from their ratios in the chromatogram.
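The relative counting described for step S3 can be illustrated with a minimal Python sketch. The function name `relative_counts`, the dictionary layout and the example signal strengths are assumptions made for illustration only; the source does not specify how peaks are read from the chromatogram.

```python
def relative_counts(signal_strengths):
    """Convert per-functional-group signal strengths into relative counts.

    The weakest non-zero signal is recorded as 1 and every other group is
    scaled by its ratio to that signal.
    """
    positive = {g: s for g, s in signal_strengths.items() if s > 0}
    if not positive:
        return {}
    smallest = min(positive.values())
    return {g: s / smallest for g, s in positive.items()}


# Hypothetical signal strengths read from one chromatogram (arbitrary units).
strengths = {"hydroxyl": 8.0, "carboxyl": 2.0, "amino": 4.0}
counts = relative_counts(strengths)
print(counts)
```

Here the carboxyl group has the weakest signal and is recorded as 1, so the hydroxyl and amino groups receive relative counts of 4 and 2.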

As described in step S4, the third functional group data is compared with the first functional group data to obtain the fourth functional group data (functional groups reduced in the third functional group data) and the fifth functional group data (functional groups increased in the third functional group data). A targeted drug works well largely because it binds to the target protein chemically, and chemical binding generates new functional groups and reduces old ones. The fourth and fifth functional group data therefore reflect the main functional group changes of the reaction, and they are extracted as factors so that the drug components can be analyzed better.
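The comparison in step S4 can be sketched as a per-group difference. This is only an illustration under two assumptions that the source does not state: functional group data is held as a mapping from group type to relative count, and the first and third data are expressed on the same scale so that they can be subtracted. The name `compare_functional_groups` is hypothetical.

```python
def compare_functional_groups(first, third):
    """Split the change from `first` to `third` into decreased and increased parts."""
    decreased = {}  # plays the role of the fourth functional group data
    increased = {}  # plays the role of the fifth functional group data
    for group in set(first) | set(third):
        delta = third.get(group, 0.0) - first.get(group, 0.0)
        if delta < 0:
            decreased[group] = -delta
        elif delta > 0:
            increased[group] = delta
    return decreased, increased


# Hypothetical data for the protein before and after the drug acts on it.
first = {"hydroxyl": 4.0, "amino": 2.0, "carboxyl": 1.0}
third = {"hydroxyl": 2.0, "amino": 2.0, "carboxyl": 1.0, "amide": 2.0}
fourth, fifth = compare_functional_groups(first, third)
print(fourth, fifth)
```

In this example the hydroxyl count falls by 2 and a new amide group appears, so only those two groups are carried forward as the reaction-related factors.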

As described in steps S5-S6, the second functional group data is input into the generative model, and the fifth, first, third and fourth functional group data are input into the discriminant model. The generative model is responsible for generating a result, but that result is not necessarily accurate, so it is corrected by the discriminant network. In other words, the generative model produces the result mainly from the second functional group data of the targeted drug, while the discriminant network takes the fifth, first, third and fourth functional group data as input and corrects the result of the generative model. The correction works as follows: the discriminant network verifies the output of the generative model; if verification fails, the failure is fed back to the generative model, which changes its parameters and regenerates the output, until the output passes verification by the discriminant model. The specific training of the models is described in detail below and is not repeated here. In this way, the targeted drug is not only subjected to chromatographic analysis but is also analyzed comprehensively using data obtained after it reacts with the target protein, which improves the accuracy of the analysis and makes the result more reliable.
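The generate, verify and regenerate cycle of steps S5-S6 can be sketched as a simple control loop. Everything below is a toy stand-in rather than the models of the invention: `ToyGenerator`, its `index` attribute (which plays the role of the trainable parameters), the `verify` callback and the `max_rounds` limit are all assumptions introduced so that the loop is runnable and terminates.

```python
class ToyGenerator:
    """Toy stand-in for the generative model; `index` plays the role of its parameters."""

    def __init__(self, ranked_candidates):
        self.ranked_candidates = list(ranked_candidates)
        self.index = 0

    def generate(self):
        return self.ranked_candidates[self.index]

    def update(self):
        # Stand-in for the parameter change triggered by a failed verification.
        self.index = (self.index + 1) % len(self.ranked_candidates)


def generate_until_verified(generator, verify, max_rounds=10):
    """Return the first generated result that the verifier accepts, or None."""
    for _ in range(max_rounds):
        candidate = generator.generate()
        if verify(candidate):
            return candidate
        generator.update()  # feed the failure back, then regenerate
    return None


generator = ToyGenerator(["component A", "component B", "component C"])
accepted = generate_until_verified(generator, lambda c: c == "component C")
print(accepted)
```

In a real system the verifier would be the trained discriminant model evaluated on the spliced protein-side data, and the update would be a gradient step rather than a move to the next candidate.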

In one embodiment, before the step S5 of inputting the second functional group data into a generative model and inputting the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, the method further comprises:

S401: acquiring a functional group training sample set; wherein each group of data in the functional group training sample set comprises a drug component, first functional group training data, second functional group training data, third functional group training data, fourth functional group training data and fifth functional group training data;

S402: inputting the first functional group training data and the drug component into the initial generative model, and performing initial training by a first formula [formula not reproduced in the source text] to obtain an intermediate generative model;

performing vector splicing on the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data to obtain comprehensive training data, inputting the comprehensive training data into the initial discriminant model, and performing initial training by a second formula [formula not reproduced in the source text] to obtain an intermediate discriminant model; wherein the symbols of the formulas include a parameter set of the generative model and a parameter set of the discriminant model;

S403: performing secondary training on the intermediate generative model and the intermediate discriminant model according to a third formula [formula not reproduced in the source text], and obtaining the generative model and the discriminant model after the training is finished; wherein the third formula takes the minimum value over the parameter set of the generative model and the maximum value over the parameter set of the discriminant model.

As described in step S401, a functional group training sample set is obtained. The training sample set is likewise derived from data on targeted drugs and target proteins; compared with the data to be analyzed, it additionally contains the drug components, so that the result can be corrected. It should be noted that the drug component, the first functional group training data, the second functional group training data, the third functional group training data, the fourth functional group training data and the fifth functional group training data are all vectors formed from the corresponding data. Taking the first functional group training data as an example, it comprises the type and the number of each functional group. A numerical code can be established in advance for each type of functional group, the number of that functional group is appended to its code, and the results are spliced to obtain the corresponding vector [vector expression not reproduced in the source text]. The drug component may likewise be represented by data corresponding to the component. Because the drug component and the functional groups are of different kinds, the vectors formed from them, and the kinds of the original parameters they contain, may differ.
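The encoding and vector splicing described above can be sketched as follows. The code table `GROUP_CODES`, the function names and the flat `[code, count, code, count, ...]` layout are assumptions made for illustration; the source only states that each functional group type has a pre-established numerical code to which its number is attached.

```python
# Hypothetical numerical codes for functional group types.
GROUP_CODES = {"hydroxyl": 1, "carboxyl": 2, "amino": 3, "amide": 4}


def encode(groups):
    """Encode {group type: count} as a flat [code, count, code, count, ...] vector."""
    vector = []
    for name in sorted(groups, key=GROUP_CODES.__getitem__):
        vector.extend([float(GROUP_CODES[name]), float(groups[name])])
    return vector


def splice(*vectors):
    """Concatenate several vectors into one comprehensive vector."""
    spliced = []
    for vector in vectors:
        spliced.extend(vector)
    return spliced


second = encode({"amino": 2.0, "hydroxyl": 4.0})
fourth = encode({"hydroxyl": 2.0})
print(second)                  # [1.0, 4.0, 3.0, 2.0]
print(splice(second, fourth))  # [1.0, 4.0, 3.0, 2.0, 1.0, 2.0]
```

This layout produces vectors of varying length. A network with a fixed input size would need a fixed-length layout instead, for example one slot per known functional group type.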

As described in steps S402-S403, for each sample, the first functional group training data it contains is input into the initial generative model. The initial generative model has a randomly initialized, pre-constructed parameter set, so that it can output a result normally. The model is trained by the first formula [formula not reproduced in the source text], with the parameter set updated by stochastic gradient descent: after training on the current sample is finished, training proceeds to the next sample, and the parameter set is updated after each sample, which completes the training of the initial generative model. In the same way, the initial discriminant model is trained by the second formula [formula not reproduced in the source text], with its parameter set updated after each sample, for example by stochastic gradient descent, which completes the training of the initial discriminant model. The two models are then combined through the third formula [formula not reproduced in the source text] and trained a second time. It should be noted that each sample is trained with all three formulas, that is, during the training of one group of samples the parameter sets are updated twice. The parameter set of the intermediate generative model and the parameter set of the intermediate discriminant model are finally obtained. To make the discrimination more effective and the obtained drug components more accurate, the third formula should, as far as possible, be minimized with respect to the parameter set of the intermediate generative model and maximized with respect to the parameter set of the intermediate discriminant model.

In an embodiment, after the step S403 of obtaining the generated model and the discriminant model after the training is completed, the method further includes:

S4031: acquiring a functional group detection sample set and real drug components; wherein each group of data in the functional group detection sample set comprises a drug component, first functional group detection data, second functional group detection data, third functional group detection data, fourth functional group detection data and fifth functional group detection data;

S4032: inputting the first functional group detection data into the generative model, performing vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data and the fifth functional group detection data to obtain comprehensive detection data, inputting the comprehensive detection data into the discriminant model, and correcting the generative model to obtain a predicted drug component;

S4033: obtaining a comprehensive loss value of the generative model and the intermediate discriminant model according to the predicted drug component and the real drug component;

S4034: judging whether the comprehensive loss value is smaller than a preset loss value;

S4035: if so, determining that the generative model and the discriminant model obtained after training meet the training requirement.

As described in steps S4031-S4035, the trained generative model and discriminant model are validated. A functional group detection sample set and real drug components are obtained; they may be taken from the functional group training sample set or may be additional data, which is not limited in this application. To avoid biasing the result, additional data is preferably used as the detection sample set. The detection data is input into the generative model and the discriminant model to obtain predicted drug components, and a comprehensive loss value of the generative model and the intermediate discriminant model is obtained from the predicted and the real drug components. The loss value may be calculated by a formula [formula not reproduced in the source text] whose terms are: the true value in the i-th detection data; the predicted value obtained from the i-th detection data; n, the number of detection data; a preset parameter value; a preset weight corresponding to the i-th detection data; and the comprehensive loss value. If the comprehensive loss value is smaller than the preset loss value, the generative model and the discriminant model meet the training requirement; otherwise, training continues until the training requirement is met.
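The acceptance check of steps S4033-S4035 can be sketched as below. Because the loss formula of the source is not reproduced in the text, the sketch substitutes a weighted mean squared error as a stand-in; that choice, the function names, and the omission of the preset parameter value (whose role in the original formula is unknown) are all assumptions.

```python
def combined_loss(true_values, predicted_values, weights):
    """Weighted mean squared error, used here as a stand-in loss (assumption)."""
    n = len(true_values)
    if n == 0 or len(predicted_values) != n or len(weights) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        w * (t - p) ** 2
        for t, p, w in zip(true_values, predicted_values, weights)
    )
    return total / n


def meets_training_requirement(loss, preset_loss):
    """The models are accepted only if the loss is strictly below the preset value."""
    return loss < preset_loss


loss = combined_loss([1.0, 0.0], [0.5, 0.0], [1.0, 1.0])
print(loss, meets_training_requirement(loss, preset_loss=0.2))
```

If the check fails, training would continue and the check would be repeated, as the description states.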

S411: inputting the second functional group data into a preset drug component analysis model to obtain a plurality of target drug components; the drug component analysis model is trained according to various drug components and corresponding functional groups;

S412: inputting each target drug component into the generative model to be used as an output channel of the generative model respectively.

As described in steps S411-S412, the output channels of the generative model are set. Drug components that may be present can be obtained from the second functional group data, although they still need to be confirmed, while drug components that cannot possibly be present can be omitted. The generative model then only needs to calculate the probabilities of the configured output channels rather than the output probability of every output channel, which reduces its computation and improves its efficiency.
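Restricting the output channels can be illustrated with a small sketch that computes probabilities only over the candidate components. The use of a softmax, the function name `channel_probabilities` and the example scores are assumptions; the source does not say how the generative model turns its outputs into probabilities.

```python
import math


def channel_probabilities(scores, candidate_components):
    """Softmax over the candidate output channels only (softmax is an assumption)."""
    selected = {c: scores[c] for c in candidate_components if c in scores}
    if not selected:
        return {}
    peak = max(selected.values())  # subtract the peak for numerical stability
    exponentials = {c: math.exp(s - peak) for c, s in selected.items()}
    total = sum(exponentials.values())
    return {c: e / total for c, e in exponentials.items()}


# Hypothetical raw scores for three components; only A and B are candidates.
scores = {"component A": 1.0, "component B": 1.0, "component C": 5.0}
print(channel_probabilities(scores, ["component A", "component B"]))
```

Component C is never evaluated even though its raw score is highest, which mirrors the idea that excluded channels cost no computation and cannot be output.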

In one embodiment, after the step S4 of comparing the third functional group data with the first functional group data to obtain the fourth functional group data with reduced third functional group data and the fifth functional group data with increased third functional group data, the method further comprises:

S501: performing a weighted average of the fourth functional group data and the fifth functional group data to obtain sixth functional group data;

S502: obtaining the number of first functional groups from the sixth functional group data;

S503: dividing the number of first functional groups by the corresponding data in the second functional group data to obtain a binding score of the target drug;

S504: judging the curative effect of the target drug according to the binding score.

As described in steps S501 to S504, the curative effect of the target drug is predicted. The sixth functional group data may represent the binding sites between the target drug and the target protein; if the site acted on is the site of the target protein that mainly causes disease, the drug may be considered to have a certain curative effect, so an efficacy score can be calculated from the number of binding sites. The binding score indicates how firmly the target drug binds to the target protein: the more binding sites or chemical bonds formed, the more chemical reactions occur and the firmer the binding, and conversely the weaker the binding. The curative effect of the target drug can therefore be determined from the binding score, either by using the binding score directly or by converting it according to a preset conversion method.
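Steps S501 to S503 can be sketched as below. The sketch assumes that each set of functional group data is a mapping from a functional group name to a count, that equal weights are used in the weighted average, and that the score is the total of the sixth functional group data divided by the total of the matching entries in the second functional group data. All of these are assumptions for illustration; the source does not fix the data format, the weights, or how the counts are aggregated.

```python
from typing import Dict


def binding_score(fourth: Dict[str, float],
                  fifth: Dict[str, float],
                  second: Dict[str, float],
                  w_fourth: float = 0.5,
                  w_fifth: float = 0.5) -> float:
    """Sketch of S501 to S503 under the assumptions stated in the lead-in."""
    groups = set(fourth) | set(fifth)
    # S501: weighted average of the fourth and fifth functional group data.
    sixth = {g: w_fourth * fourth.get(g, 0.0) + w_fifth * fifth.get(g, 0.0)
             for g in groups}
    # S502: number of functional groups taken from the sixth data.
    bound = sum(sixth.values())
    # S503: divide by the corresponding entries of the second data.
    available = sum(second.get(g, 0.0) for g in groups)
    if available == 0:
        raise ValueError("second functional group data has no matching entries")
    return bound / available
```

Step S504 would then compare the returned score against a preset criterion or pass it through a preset conversion method, which is not specified here.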

Referring to FIG. 2, a data analysis-based drug component detection system includes:

a first obtaining module 10, configured to obtain a first target protein, a target drug to be detected, and a second target protein obtained after the target drug acts on the first target protein;

a second obtaining module 20, configured to obtain a first color spectrum of the first target protein, a second color spectrum of the target drug, and a third color spectrum of the second target protein through terahertz time-domain spectroscopy;

a third obtaining module 30, configured to obtain corresponding first functional group data from the first color spectrum, obtain corresponding second functional group data from the second color spectrum, and obtain corresponding third functional group data from the third color spectrum;

a comparing module 40, configured to compare the third functional group data with the first functional group data to obtain fourth functional group data with reduced third functional group data and fifth functional group data with increased third functional group data;

an input module 50, configured to input the second functional group data into a generative model, and input the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model, wherein the generative model and the discriminant model are trained synchronously on different functional group data and the corresponding drug components;

a correcting module 60, configured to correct the result output by the generative model according to the output of the discriminant model, so as to obtain the drug component output by the generative model.
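The work of the comparing module 40 can be illustrated with a short sketch. It assumes that functional group data is represented as a mapping from a functional group name to a count, which the source does not specify, and the name `compare_functional_groups` is hypothetical.

```python
from typing import Dict, Tuple


def compare_functional_groups(first: Dict[str, int],
                              third: Dict[str, int]
                              ) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (fourth, fifth): groups reduced and groups increased in the
    third functional group data relative to the first functional group data."""
    decreased: Dict[str, int] = {}
    increased: Dict[str, int] = {}
    for group in set(first) | set(third):
        delta = third.get(group, 0) - first.get(group, 0)
        if delta < 0:
            decreased[group] = -delta   # fourth functional group data
        elif delta > 0:
            increased[group] = delta    # fifth functional group data
    return decreased, increased
```

For example, if the first target protein has three hydroxyl groups and the second target protein has one hydroxyl group plus one new carbonyl group, the hydroxyl loss is reported in the fourth functional group data and the carbonyl gain in the fifth.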

In one embodiment, the data analysis-based drug component detection system further comprises:

first functional group training data […], second functional group training data […], third functional group training data […], fourth functional group training data […], and fifth functional group training data […];

inputting the drug component […] into the initial generative model by the formula […] to obtain an intermediate generative model, performing vector splicing on the second functional group training data […], the third functional group training data […], the fourth functional group training data […], and the fifth functional group training data […] to obtain comprehensive training data […], and inputting the comprehensive training data […] into an initial discriminant model by the formula […], where […], […], […] represents the parameter set of the generative model, and […] represents the parameter set of the discriminant model;

a secondary training module, which operates according to the formula […], where the expression is taken on the premise that the formula […] attains its minimum value, and […] is measured.

a detection data input module for detecting the first functional group

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. A method for detecting a drug component based on data analysis, comprising:

2. The data analysis-based method for drug component detection according to claim 1, wherein the step of inputting the second functional group data into a generative model and the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model further comprises:

first functional group training data […], second functional group training data […], third functional group training data […], fourth functional group training data […], and fifth functional group training data […];

inputting the first functional group training data […] and the drug component […] into the initial generative model by the formula […] to obtain an intermediate generative model, performing vector splicing on the second functional group training data […], the third functional group training data […], the fourth functional group training data […], and the fifth functional group training data […] to obtain comprehensive training data […], and inputting the comprehensive training data […] into an initial discriminant model by the formula […], where […], […], […] represents the parameter set of the generative model, and […] represents the parameter set of the discriminant model;

according to the formula […], where the expression is taken on the premise that the formula […] attains its minimum value, and […] is measured.

3. The method for drug component detection based on data analysis of claim 2, wherein after the step of obtaining the generative model and the discriminant model after the training, further comprising:

detecting the first functional group

4. The data analysis-based method for drug component detection according to claim 1, wherein the step of inputting the second functional group data into a generative model and the fifth functional group data, the first functional group data, the third functional group data, and the fourth functional group data into a discriminant model further comprises:

5. The data analysis-based method for drug component detection according to claim 1, wherein the step of comparing the third functional group data with the first functional group data to obtain a fourth functional group data with a reduced third functional group data and a fifth functional group data with an increased third functional group data further comprises:

6. A data analysis-based drug component detection system, comprising:

7. The data analysis-based drug component detection system of claim 6, further comprising:

a training sample set acquisition module, configured to acquire a functional group training sample set; wherein a set of data in the functional group training sample set comprises a drug component […], first functional group training data […], second functional group training data […], third functional group training data […], fourth functional group training data […], and fifth functional group training data […];

inputting the drug component […] into the initial generative model by the formula […] to obtain an intermediate generative model, performing vector splicing on the second functional group training data […], the third functional group training data […], the fourth functional group training data […], and the fifth functional group training data […] to obtain comprehensive training data […], and inputting the comprehensive training data […] into an initial discriminant model by the formula […], where […], […], […] represents the parameter set of the generative model, and […] represents the parameter set of the discriminant model;

a secondary training module, which operates according to the formula […], where the expression is taken on the premise that the formula […] attains its minimum value, and […] is measured.

8. The data analysis-based drug component detection system of claim 7, further comprising:

a detection data input module, configured to input the first functional group detection data […] into the generative model, perform vector splicing on the second functional group detection data, the third functional group detection data, the fourth functional group detection data, and the fifth functional group detection data to obtain comprehensive detection data, input the comprehensive detection data into the discriminant model, and correct the generative model to obtain a predicted drug component;

9. The data analysis-based drug component detection system of claim 6, further comprising:

10. The data analysis-based drug component detection system of claim 6, further comprising: