CN110322972B - Intelligent drug toxicity judgment method and device and computer readable storage medium - Google Patents

Intelligent drug toxicity judgment method and device and computer readable storage medium

Publication number
CN110322972B
Authority
CN
China
Prior art keywords
drug
molecular structure
value
coding
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910467872.1A
Other languages
Chinese (zh)
Other versions
CN110322972A (en)
Inventor
王健宗
彭俊清
瞿晓阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910467872.1A
Publication of CN110322972A
Application granted
Publication of CN110322972B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/50 Molecular design, e.g. of drugs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Medicinal Chemistry (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Toxicology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent drug toxicity judgment method comprising: receiving a drug data set comprising molecular structure sequences and a label set, and encoding the molecular structure sequences of the drug data set based on a Huffman coding technique to obtain a drug coding set; inputting the drug coding set into an LSTM model and the label set into a loss function, training the LSTM model to obtain a training value and inputting the training value into the loss function, calculating a loss value with the loss function, comparing the loss value with a preset threshold, and exiting training once the loss value is smaller than the preset threshold; and receiving and encoding a drug molecular structure sequence input by a user, inputting the encoded sequence into the drug injury judgment model, and outputting a toxicity judgment for the sequence input by the user. The invention also provides an intelligent drug toxicity judgment device and a computer readable storage medium. The invention enables efficient drug toxicity judgment.

Description

Intelligent drug toxicity judgment method and device and computer readable storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an intelligent method and device for judging the toxicity of pig liver drugs, and a computer readable storage medium.
Background
In recent years, the incidence of disease in animals has been high and is rising, so pig farms often need to vaccinate and medicate herds on a large scale as part of animal health management. Drugs are substances used to prevent, treat, and diagnose diseases, and each is specific in its action. However, pig liver diseases are of many types, and whether a given drug is beneficial is often unknown, so choosing a drug for treatment is a major problem and drug injury detection is necessary for pig herds. At present, pig farms mainly detect drug injury by manually trialling drugs on pigs and observing the herd's reaction. This manual approach is time-consuming and labor-intensive, and it reveals only the herd's observable reaction, not whether the drug affects the pigs' physiology.
Disclosure of Invention
The invention provides an intelligent drug toxicity judgment method, an intelligent drug toxicity judgment device and a computer readable storage medium, and mainly aims to present the user with an accurate drug toxicity judgment result when the user inputs a given drug.
In order to achieve the above object, the present invention provides an intelligent method for determining drug toxicity, comprising:
a data processing layer receives a drug data set comprising molecular structure sequences and a label set, encodes the molecular structure sequences of the drug data set based on a Huffman coding technique to obtain a drug coding set, and inputs the drug coding set and the label set into a drug injury judgment model;
the drug injury judgment model inputs the drug coding set into an LSTM model and the label set into a loss function; the LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function; the loss function calculates a loss value from the label set and the training value; the loss value is compared with a preset threshold, and training exits once the loss value is smaller than the preset threshold;
and a drug molecular structure sequence input by a user is received and encoded, the encoded sequence is input into the drug injury judgment model, and a toxicity judgment for the sequence input by the user is output.
Optionally, encoding the drug data set of the molecular structure sequences based on the Huffman coding technique to obtain the drug coding set includes:
sequentially reading the molecular structure sequences in the drug data set, randomly selecting a central sequence ω of the molecular structure sequences, and selecting the 2c molecular structure sequences before and after the central sequence ω to calculate a cumulative sum value $X_\omega$;
performing node classification judgment based on the cumulative sum value $X_\omega$ to obtain a Huffman binary tree;
and performing Huffman coding based on the Huffman binary tree to obtain the drug coding set.
Optionally, the cumulative sum value $X_\omega$ is calculated as:
Figure BDA0002077102490000021
where $V(\omega_i)$ is the vector representation of the molecular structure sequence;
the node classification judgment σ is:
Figure BDA0002077102490000022
where
Figure BDA0002077102490000023
represents the cumulative sum value $X_\omega$, and e is an infinite non-repeating decimal (Euler's number).
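The formula images referenced above are not reproduced in this text. Assuming the passage follows the conventional CBOW formulation with hierarchical softmax (an assumption, since only the symbol definitions survive), the two formulas would take roughly the following form. The node parameter vector $\theta$ is hypothetical and does not appear in the surviving text.

```latex
X_\omega = \sum_{i=1}^{2c} V(\omega_i),
\qquad
\sigma\!\left(X_\omega^{\top}\theta\right) = \frac{1}{1 + e^{-X_\omega^{\top}\theta}}
```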
Optionally, the LSTM model includes an input gate, a forgetting gate, and an output gate, where the drug coding set is input to the input gate, and after a memory unit in the input gate is activated, the memory unit sequentially reads codes of the drug coding set, and activates the codes based on an activation function and inputs the codes to the forgetting gate;
the forgetting gate receives codes sequentially input by the memory unit, calculates the codes based on a forgetting method, and inputs the codes to the output gate to obtain a training value, wherein the forgetting method comprises the following steps:
$f_t = \delta(w_t[h_{t-1}, x_t] + b_t)$
where $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, t is the current time at which the forgetting gate receives the input code, t-1 is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, [ ] denotes a matrix multiplication operation, and δ denotes the sigmoid function.
Optionally, the loss value ξ is:
Figure BDA0002077102490000031
where n is the number of drug coding sets,
Figure BDA0002077102490000032
is the training value, and $y_i$ is the label set.
In addition, to achieve the above object, the present invention further provides an intelligent drug toxicity judging apparatus, which includes a memory and a processor, wherein the memory stores an intelligent drug toxicity judging program operable on the processor, and the intelligent drug toxicity judging program, when executed by the processor, implements the following steps:
a data processing layer receives a drug data set comprising molecular structure sequences and a label set, encodes the molecular structure sequences of the drug data set based on a Huffman coding technique to obtain a drug coding set, and inputs the drug coding set and the label set into a drug injury judgment model;
the drug injury judgment model inputs the drug coding set into an LSTM model and the label set into a loss function; the LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function; the loss function calculates a loss value from the label set and the training value; the loss value is compared with a preset threshold, and training exits once the loss value is smaller than the preset threshold;
and a drug molecular structure sequence input by a user is received and encoded, the encoded sequence is input into the drug injury judgment model, and a toxicity judgment for the sequence input by the user is output.
Optionally, encoding the drug data set of the molecular structure sequences based on the Huffman coding technique to obtain the drug coding set includes:
sequentially reading the molecular structure sequences in the drug data set, randomly selecting a central sequence ω of the molecular structure sequences, and selecting the 2c molecular structure sequences before and after the central sequence ω to calculate a cumulative sum value $X_\omega$;
performing node classification judgment based on the cumulative sum value $X_\omega$ to obtain a Huffman binary tree;
and performing Huffman coding based on the Huffman binary tree to obtain the drug coding set.
Optionally, the cumulative sum value $X_\omega$ is calculated as:
Figure BDA0002077102490000033
where $V(\omega_i)$ is the vector representation of the molecular structure sequence;
the node classification judgment σ is:
Figure BDA0002077102490000041
where
Figure BDA0002077102490000042
represents the cumulative sum value $X_\omega$, and e is an infinite non-repeating decimal (Euler's number).
Optionally, training the LSTM model on the drug coding set to obtain the training value includes:
the LSTM model comprises an input gate, a forgetting gate and an output gate, the medicine coding set is input into the input gate, after a memory unit in the input gate is activated, the memory unit sequentially reads the codes of the medicine coding set, activates the codes based on an activation function and inputs the codes into the forgetting gate;
the forgetting gate receives codes sequentially input by the memory unit, calculates the codes based on a forgetting method, and inputs the codes to the output gate to obtain a training value, wherein the forgetting method comprises the following steps:
$f_t = \delta(w_t[h_{t-1}, x_t] + b_t)$
where $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, t is the current time at which the forgetting gate receives the input code, t-1 is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, [ ] denotes a matrix multiplication operation, and δ denotes the sigmoid function.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having an intelligent drug toxicity judging program stored thereon, the intelligent drug toxicity judging program being executable by one or more processors to implement the steps of the intelligent drug toxicity judging method as described above.
The LSTM model can evaluate the molecular structure sequence, retaining information that conforms to the rules and forgetting information that does not, which improves the ability to analyze molecular structure sequences.
Drawings
Fig. 1 is a schematic flow chart of an intelligent method for determining drug toxicity according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an internal structure of an intelligent device for determining drug toxicity according to an embodiment of the present invention;
Fig. 3 is a schematic block diagram of an intelligent drug toxicity determination program in the intelligent drug toxicity determination apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an intelligent drug toxicity judgment method. Fig. 1 is a schematic flow chart of an intelligent method for determining drug toxicity according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the intelligent method for determining drug toxicity comprises:
S1: the data processing layer receives a drug data set comprising molecular structure sequences and a label set, encodes the molecular structure sequences of the drug data set based on the CBOW model to obtain a drug coding set, and inputs the drug coding set and the label set into a drug injury judgment model.
In a preferred embodiment of the invention, each drug in the drug data set is defined in the form {component, dose} according to the National Library of Medicine's standardized nomenclature system for clinical drugs, the component being the molecular structure sequence of the drug. Each drug in the drug data set is fed in turn to several healthy pigs, the activity of glutamic-pyruvic transaminase in the pigs' livers is measured with a detector to judge whether the livers are damaged, and each drug is then labeled (harmful or harmless) according to the result to form the label set.
In the preferred embodiment of the present invention, the molecular structure sequences in the drug data set are read sequentially, a central sequence ω of the molecular structure sequences is randomly selected, and the 2c molecular structure sequences before and after the central sequence ω are selected to calculate the cumulative sum value $X_\omega$. Node classification judgment is then performed based on the cumulative sum value $X_\omega$ to obtain a Huffman binary tree, and Huffman coding is performed based on the Huffman binary tree to obtain the drug coding set. Further, following data communication practice, the Huffman coding can represent the molecular structure sequences with different arrangements of 0 and 1 codes.
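As an illustration of how Huffman coding assigns 0/1 codes, the following is a minimal sketch of the classic frequency-based construction. It is not the patent's procedure, which derives the tree from node classification on the cumulative sum value; the function names and the idea of treating a sequence as a list of tokens are assumptions made for the example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(symbols):
    """Return a {symbol: bit string} table built from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(n, next(tie), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


def encode(symbols, table):
    """Concatenate the bit strings of the given symbols."""
    return "".join(table[s] for s in symbols)
```

Frequent tokens receive shorter codes, and no code is a prefix of another, so the concatenated bit string can be decoded unambiguously.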
In the preferred embodiment of the present invention, the cumulative sum value $X_\omega$ is calculated as:
Figure BDA0002077102490000051
where $V(\omega_i)$ is the vector representation of the molecular structure sequence, and the node classification judgment σ is:
Figure BDA0002077102490000061
where
Figure BDA0002077102490000062
represents the cumulative sum value $X_\omega$, and e is an infinite non-repeating decimal (Euler's number).
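The two quantities described above can be sketched in plain Python. This is a hedged illustration only: it assumes that "2c sequences before and after" means c neighbours on each side of the centre, that the window is truncated at the ends of the list, and that the node judgment is a sigmoid of the inner product between $X_\omega$ and a node parameter vector `theta`. The parameter `theta` and both function names are hypothetical.

```python
import math


def cumulative_sum(vectors, center, c):
    """Sum the vectors of the c items before and c items after `center`."""
    lo, hi = max(0, center - c), min(len(vectors), center + c + 1)
    context = [v for i, v in enumerate(vectors[lo:hi], start=lo) if i != center]
    dim = len(vectors[center])
    return [sum(v[d] for v in context) for d in range(dim)]


def node_probability(x_omega, theta):
    """Sigmoid of the inner product between X_omega and a node parameter."""
    z = sum(x * t for x, t in zip(x_omega, theta))
    return 1.0 / (1.0 + math.exp(-z))
```

A value above 0.5 would send the sample to one branch of a tree node and a value below 0.5 to the other, which is how a path through a binary tree turns into a 0/1 code in hierarchical softmax.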
S2: the drug injury judgment model inputs the drug coding set into an LSTM model and the label set into a loss function. The LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function. The loss function calculates a loss value from the label set and the training value, the loss value is compared with a preset threshold, and training exits once the loss value is smaller than the preset threshold.
The LSTM model in the preferred embodiment of the invention comprises an input gate, a forgetting gate and an output gate, wherein a drug coding set is input into the input gate, after a memory unit in the input gate is activated, the memory unit sequentially reads codes of the drug coding set, activates the codes based on an activation function and inputs the codes to the forgetting gate, and the activation function is a sigmoid function;
the forgetting gate receives codes sequentially input by the memory unit, calculates the codes based on a forgetting method, and inputs the codes to the output gate to obtain a training value, wherein the forgetting method comprises the following steps:
$f_t = \delta(w_t[h_{t-1}, x_t] + b_t)$
where $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, t is the current time at which the forgetting gate receives the input code, t-1 is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, [ ] denotes a matrix multiplication operation, and δ denotes the sigmoid function.
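The forgetting method can be sketched as follows. The source describes the bracket as a matrix multiplication operation; this sketch assumes the conventional LSTM reading, in which $h_{t-1}$ and $x_t$ are joined into one vector that is then multiplied by the weight matrix. The function names and the list-of-rows weight layout are illustrative choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(w, h_prev, x_t, b):
    """Compute f_t = sigmoid(w . [h_prev, x_t] + b), one output per weight row."""
    joined = list(h_prev) + list(x_t)  # assumed reading of [h_{t-1}, x_t]
    return [
        sigmoid(sum(wi * v for wi, v in zip(row, joined)) + bi)
        for row, bi in zip(w, b)
    ]
```

Each component of the result lies strictly between 0 and 1, so it can act as a per-component factor for how much of the previous memory is kept.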
In the preferred embodiment of the present invention, the loss function calculates a loss value ξ according to the label set and the training value:
Figure BDA0002077102490000063
where n is the number of drug coding sets,
Figure BDA0002077102490000064
is the training value, and $y_i$ is the label set.
In the preferred embodiment of the present invention, when the loss value is greater than the preset threshold, the LSTM model continues to be trained based on the drug encoding set, and continues to update the memory unit of the LSTM model based on the gradient descent algorithm and output a training value.
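The threshold-based stopping rule can be sketched with a deliberately small stand-in model. The code below trains a single logistic unit by gradient descent, not an LSTM, and it assumes a mean squared error loss because the loss formula image is not reproduced in this text. The names `train_until`, `lr`, and `max_steps` are hypothetical.

```python
import math


def mse(preds, labels):
    """Mean squared error between predictions and labels (assumed loss form)."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)


def train_until(features, labels, threshold, lr=0.5, max_steps=10_000):
    """Gradient descent on one logistic unit; stop once loss < threshold."""
    w = [0.0] * len(features[0])
    b = 0.0
    loss = float("inf")
    for _ in range(max_steps):
        preds = [
            1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for x in features
        ]
        loss = mse(preds, labels)
        if loss < threshold:
            break  # exit training: loss is below the preset threshold
        # Gradient of the MSE with respect to each pre-activation value.
        grads = [
            2.0 * (p - y) * p * (1.0 - p) / len(labels)
            for p, y in zip(preds, labels)
        ]
        for j in range(len(w)):
            w[j] -= lr * sum(g * x[j] for g, x in zip(grads, features))
        b -= lr * sum(grads)
    return w, b, loss
```

The `max_steps` cap is an addition for safety: the source only specifies exiting when the loss falls below the threshold, which by itself would loop forever on data the model cannot fit.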
S3: a drug molecular structure sequence input by the user is received and encoded, the encoded sequence is input into the drug injury judgment model, and a toxicity judgment for the sequence input by the user is output.
The invention also provides an intelligent drug toxicity judgment device. Fig. 2 is a schematic diagram of an internal structure of an intelligent device for determining drug toxicity according to an embodiment of the present invention.
In the present embodiment, the intelligent drug toxicity determination apparatus 1 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, or a mobile Computer, or may be a server. The intelligent drug toxicity judgment device 1 at least comprises a memory 11, a processor 12, a communication bus 13 and a network interface 14.
The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may be an internal storage unit of the intelligent drug toxicity determination apparatus 1 in some embodiments, for example, a hard disk of the intelligent drug toxicity determination apparatus 1. The memory 11 may also be an external storage device of the intelligent drug toxicity judging apparatus 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the intelligent drug toxicity judging apparatus 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the intelligent drug toxicity judging apparatus 1. The memory 11 may be used not only to store application software installed in the intelligent drug toxicity judgment apparatus 1 and various types of data, such as a code of the intelligent drug toxicity judgment program 01, but also to temporarily store data that has been output or will be output.
The processor 12 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or other data Processing chip in some embodiments, and is used for executing program codes stored in the memory 11 or Processing data, such as executing the intelligent drug toxicity determining program 01.
The communication bus 13 is used to realize connection communication between these components.
The network interface 14 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication link between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or a display unit, where appropriate, for displaying information processed in the intelligent drug toxicity assessment apparatus 1 and for displaying a visual user interface.
Fig. 2 shows only the intelligent drug toxicity judgment device 1 having the components 11-14 and the intelligent drug toxicity judgment program 01. Those skilled in the art will appreciate that the structure shown in Fig. 2 does not limit the intelligent drug toxicity judgment device 1, which may include fewer or more components than shown, a combination of certain components, or a different arrangement of components.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores an intelligent drug toxicity judgment program 01; the processor 12 executes the intelligent drug toxicity judgment program 01 stored in the memory 11 to implement the following steps:
Step one: the data processing layer receives a drug data set comprising molecular structure sequences and a label set, encodes the molecular structure sequences of the drug data set based on the CBOW model to obtain a drug coding set, and inputs the drug coding set and the label set into a drug injury judgment model.
In a preferred embodiment of the invention, each drug in the drug data set is defined in the form {component, dose} according to the National Library of Medicine's standardized nomenclature system for clinical drugs, the component being the molecular structure sequence of the drug. Each drug in the drug data set is fed in turn to several healthy pigs, the activity of glutamic-pyruvic transaminase in the pigs' livers is measured with a detector to judge whether the livers are damaged, and each drug is then labeled (harmful or harmless) according to the result to form the label set.
In a preferred embodiment of the invention, the molecular structure sequences in the drug data set are read sequentially, a central sequence ω of the molecular structure sequences is randomly selected, and the 2c molecular structure sequences before and after the central sequence ω are selected to calculate the cumulative sum value $X_\omega$. Node classification judgment is then performed based on the cumulative sum value $X_\omega$ to obtain the Huffman binary tree, and Huffman coding is performed based on the Huffman binary tree to obtain the drug coding set. Following data communication practice, the Huffman coding can represent the molecular structure sequences with different arrangements of 0 and 1 codes.
In the preferred embodiment of the present invention, the cumulative sum value $X_\omega$ is calculated as:
Figure BDA0002077102490000081
where $V(\omega_i)$ is the vector representation of the molecular structure sequence, and the node classification judgment σ is:
Figure BDA0002077102490000082
where
Figure BDA0002077102490000083
represents the cumulative sum value $X_\omega$, and e is an infinite non-repeating decimal (Euler's number).
Step two: the drug injury judgment model inputs the drug coding set into an LSTM model and the label set into a loss function. The LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function. The loss function calculates a loss value from the label set and the training value, the loss value is compared with a preset threshold, and training exits once the loss value is smaller than the preset threshold.
The LSTM model in the preferred embodiment of the invention comprises an input gate, a forgetting gate and an output gate, wherein a drug coding set is input into the input gate, after a memory unit in the input gate is activated, the memory unit sequentially reads codes of the drug coding set, activates the codes based on an activation function and inputs the codes to the forgetting gate, and the activation function is a sigmoid function;
the forgetting gate receives codes sequentially input by the memory unit, calculates the codes based on a forgetting method, and inputs the codes to the output gate to obtain a training value, wherein the forgetting method comprises the following steps:
$f_t = \delta(w_t[h_{t-1}, x_t] + b_t)$
where $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, t is the current time at which the forgetting gate receives the input code, t-1 is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, [ ] denotes a matrix multiplication operation, and δ denotes the sigmoid function.
In the preferred embodiment of the present invention, the loss function calculates a loss value ξ according to the label set and the training value:
Figure BDA0002077102490000091
where n is the number of drug coding sets,
Figure BDA0002077102490000092
is the training value, and $y_i$ is the label set.
In the preferred embodiment of the present invention, when the loss value is greater than the preset threshold, the LSTM model continues to be trained based on the drug encoding set, and continues to update the memory unit of the LSTM model based on the gradient descent algorithm and output a training value.
Step three: a drug molecular structure sequence input by a user is received and encoded, the encoded sequence is input into the drug injury judgment model, and a toxicity judgment for the sequence input by the user is output.
Alternatively, in other embodiments, the intelligent drug toxicity judging program may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention, where the module referred to in the present invention refers to a series of computer program instruction segments capable of performing a specific function for describing the execution process of the intelligent drug toxicity judging program in the intelligent drug toxicity judging apparatus.
For example, Fig. 3 is a schematic diagram of the program modules of the intelligent drug toxicity judgment program in an embodiment of the intelligent drug toxicity judgment apparatus of the present invention. In this embodiment, the intelligent drug toxicity judgment program may be divided into a data processing module 10, a model training module 20, and a drug toxicity output module 30, for example:
The data processing module 10 is configured to: receive a drug data set comprising molecular structure sequences and a label set, encode the molecular structure sequences of the drug data set based on a Huffman coding technique to obtain a drug coding set, and input the drug coding set and the label set into the model training module 20.
The model training module 20 is configured to: input the drug coding set into an LSTM model and the label set into a loss function, train the LSTM model on the drug coding set to obtain a training value and input the training value into the loss function, calculate a loss value with the loss function from the label set and the training value, compare the loss value with a preset threshold, and exit training once the loss value is smaller than the preset threshold.
The drug toxicity output module 30 is configured to: receive and encode a drug molecular structure sequence input by a user, input the encoded sequence into the model training module 20, and output a toxicity judgment for the sequence input by the user.
The functions or operation steps of the data processing module 10, the model training module 20, the drug toxicity output module 30 and other program modules implemented when executed are substantially the same as those of the above embodiments, and are not repeated herein.
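The division into modules 10, 20, and 30 can be pictured with the following sketch of how the three modules hand data to one another. The class names mirror the module names in the text, but the interfaces, `toy_encoder`, and `toy_fit` are hypothetical stand-ins: the encoder is not Huffman coding and the fitted model is a simple lookup, not an LSTM.

```python
class DataProcessingModule:
    """Module 10: encodes raw sequences and passes codes plus labels on."""

    def __init__(self, encoder):
        self.encoder = encoder

    def run(self, sequences, labels):
        return [self.encoder(s) for s in sequences], list(labels)


class ModelTrainingModule:
    """Module 20: fits a model on the codes and keeps it for later queries."""

    def __init__(self, fit):
        self.fit = fit
        self.model = None

    def run(self, codes, labels):
        self.model = self.fit(codes, labels)
        return self.model


class DrugToxicityOutputModule:
    """Module 30: encodes a user sequence and queries the model in module 20."""

    def __init__(self, encoder, trainer):
        self.encoder = encoder
        self.trainer = trainer

    def run(self, sequence):
        if self.trainer.model is None:
            raise RuntimeError("model has not been trained")
        return self.trainer.model(self.encoder(sequence))


def toy_encoder(sequence):
    """Hypothetical encoder: character codes instead of Huffman codes."""
    return tuple(ord(ch) for ch in sequence)


def toy_fit(codes, labels):
    """Hypothetical model: remembers training labels, defaults to 0."""
    table = dict(zip(codes, labels))
    return lambda code: table.get(code, 0)
```

Passing the encoder and the fitting routine in as arguments keeps the wiring between the modules visible without committing the sketch to any particular encoding or model.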
Furthermore, an embodiment of the present invention provides a computer-readable storage medium on which an intelligent drug toxicity judgment program is stored, the program being executable by one or more processors to implement the following operations:
a data processing layer receives a drug data set comprising molecular structure sequences and a label set, encodes the drug data set based on Huffman coding to obtain a drug coding set, and inputs the drug coding set and the label set into a drug damage judgment model;
the drug damage judgment model inputs the drug coding set into an LSTM model and the label set into a loss function; the LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function; the loss function calculates a loss value from the label set and the training value; the loss value is compared with a preset threshold, and training exits once the loss value is smaller than the preset threshold;
a drug molecular structure sequence input by a user is received and encoded, the encoded sequence is input into the drug damage judgment model, and the toxicity judgment for the sequence input by the user is output.
The embodiment of the computer readable storage medium of the present invention is substantially the same as the embodiments of the intelligent drug toxicity determination apparatus and method, and will not be described herein again.
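For readers unfamiliar with the encoding step named in the operations above, the following sketch shows classic frequency-based Huffman coding applied to the symbols of molecular structure sequences. It is not the node classification procedure of the embodiment, which builds the Huffman binary tree from the accumulated sum value; it only illustrates how a Huffman binary tree yields prefix-free codes. Treating each character of a string as one symbol, and the names `build_huffman_codes` and `encode`, are assumptions made for the example.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(sequences):
    """Build a prefix-free code table from symbol frequencies."""
    freq = Counter(ch for seq in sequences for ch in seq)
    if not freq:
        return {}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least frequent nodes into one internal node.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are symbols.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


def encode(sequence, codes):
    """Encode one sequence as a bit string using the code table."""
    return "".join(codes[ch] for ch in sequence)
```

Frequent symbols receive shorter codes, so the encoded sequences passed to the model are compact while remaining uniquely decodable.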
It should be noted that the numbering of the above embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disk) as described above and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the methods described in the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. An intelligent drug toxicity judgment method, characterized by comprising the following steps:
a data processing layer receives a drug data set comprising molecular structure sequences and a label set, encodes the drug data set based on Huffman coding to obtain a drug coding set, and inputs the drug coding set and the label set into a drug damage judgment model, wherein encoding the drug data set based on Huffman coding to obtain the drug coding set comprises: sequentially reading the molecular structure sequences in the drug data set, randomly selecting a central sequence $\omega$ of the molecular structure sequences, and selecting 2c molecular structure sequences before and after the central sequence $\omega$ to calculate an accumulated sum value $X_\omega$; performing node classification judgment based on the accumulated sum value $X_\omega$ to obtain a Huffman binary tree; and performing Huffman coding based on the Huffman binary tree to obtain the drug coding set;
the accumulated sum value $X_\omega$ is:
Figure FDA0003598941790000011
wherein $V(\omega_i)$ is a vector representation of the molecular structure sequence;
the node classification judgment $\sigma$ is:
Figure FDA0003598941790000012
wherein
Figure FDA0003598941790000013
represents the accumulated sum value $X_\omega$, and $e$ is an infinite non-repeating decimal;
the drug damage judgment model inputs the drug coding set into an LSTM model and the label set into a loss function; the LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function; the loss function calculates a loss value from the label set and the training value; the loss value is compared with a preset threshold, and training exits once the loss value is smaller than the preset threshold;
a drug molecular structure sequence input by a user is received and encoded, the encoded sequence is input into the drug damage judgment model, and the toxicity judgment for the sequence input by the user is output.
2. The intelligent drug toxicity judgment method according to claim 1, wherein the LSTM model comprises an input gate, a forgetting gate, and an output gate; the drug coding set is input to the input gate, and after a memory unit in the input gate is activated, the memory unit sequentially reads the codes of the drug coding set, activates the codes based on an activation function, and inputs the activated codes to the forgetting gate;
the forgetting gate receives the codes sequentially input by the memory unit, processes the codes based on a forgetting method, and inputs the result to the output gate to obtain a training value, wherein the forgetting method is:
$$f_t = \delta\left(w_t\,[h_{t-1}, x_t] + b_t\right)$$
where $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, $t$ is the current time at which the forgetting gate receives the input code, $t-1$ is the time immediately preceding the current time, $h_{t-1}$ is the output data of the output gate at that preceding time, $w_t$ is the weight at the current time, $b_t$ is the bias at the current time, $[\,\cdot\,]$ denotes the matrix multiplication operation, and $\delta$ denotes the sigmoid function.
3. The intelligent drug toxicity judgment method according to claim 1, wherein the loss value $\xi$ is:
Figure FDA0003598941790000021
where $n$ is the number of drug coding sets,
Figure FDA0003598941790000022
is the training value, and $y_i$ is the label set.
4. An intelligent drug toxicity judgment apparatus, comprising a memory and a processor, wherein the memory stores an intelligent drug toxicity judgment program operable on the processor, and wherein the processor executes the intelligent drug toxicity judgment program to perform the following steps:
a data processing layer receives a drug data set comprising molecular structure sequences and a label set, encodes the drug data set based on Huffman coding to obtain a drug coding set, and inputs the drug coding set and the label set into a drug damage judgment model, wherein encoding the drug data set based on Huffman coding to obtain the drug coding set comprises: sequentially reading the molecular structure sequences in the drug data set, randomly selecting a central sequence $\omega$ of the molecular structure sequences, and selecting 2c molecular structure sequences before and after the central sequence $\omega$ to calculate an accumulated sum value $X_\omega$; performing node classification judgment based on the accumulated sum value $X_\omega$ to obtain a Huffman binary tree; and performing Huffman coding based on the Huffman binary tree to obtain the drug coding set;
the accumulated sum value $X_\omega$ is:
Figure FDA0003598941790000023
wherein $V(\omega_i)$ is a vector representation of the molecular structure sequence;
the node classification judgment $\sigma$ is:
Figure FDA0003598941790000024
wherein
Figure FDA0003598941790000025
represents the accumulated sum value $X_\omega$, and $e$ is an infinite non-repeating decimal;
the drug damage judgment model inputs the drug coding set into an LSTM model and the label set into a loss function; the LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function; the loss function calculates a loss value from the label set and the training value; the loss value is compared with a preset threshold, and training exits once the loss value is smaller than the preset threshold;
a drug molecular structure sequence input by a user is received and encoded, the encoded sequence is input into the drug damage judgment model, and the toxicity judgment for the sequence input by the user is output.
5. The intelligent drug toxicity judgment apparatus according to claim 4, wherein the LSTM model comprises an input gate, a forgetting gate, and an output gate; the drug coding set is input to the input gate, and after a memory unit in the input gate is activated, the memory unit sequentially reads the codes of the drug coding set, activates the codes based on an activation function, and inputs the activated codes to the forgetting gate;
the forgetting gate receives the codes sequentially input by the memory unit, processes the codes based on a forgetting method, and inputs the result to the output gate to obtain a training value, wherein the forgetting method is:
$$f_t = \delta\left(w_t\,[h_{t-1}, x_t] + b_t\right)$$
where $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, $t$ is the current time at which the forgetting gate receives the input code, $t-1$ is the time immediately preceding the current time, $h_{t-1}$ is the output data of the output gate at that preceding time, $w_t$ is the weight at the current time, $b_t$ is the bias at the current time, $[\,\cdot\,]$ denotes the matrix multiplication operation, and $\delta$ denotes the sigmoid function.
6. A computer-readable storage medium on which an intelligent drug toxicity judgment program is stored, the intelligent drug toxicity judgment program being executable by one or more processors to implement the steps of the intelligent drug toxicity judgment method according to any one of claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910467872.1A CN110322972B (en) 2019-05-29 2019-05-29 Intelligent drug toxicity judgment method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110322972A (en) 2019-10-11
CN110322972B (en) 2022-05-20

Family

ID=68119250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910467872.1A Active CN110322972B (en) 2019-05-29 2019-05-29 Intelligent drug toxicity judgment method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110322972B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868540A (en) * 2016-03-25 2016-08-17 哈尔滨理工大学 A polycyclic aromatic hydrocarbon property/toxicity prediction method using an intelligent support vector machine
CN109033738A (en) * 2018-07-09 2018-12-18 湖南大学 A kind of pharmaceutical activity prediction technique based on deep learning
CN109658989A (en) * 2018-11-14 2019-04-19 国网新疆电力有限公司信息通信公司 Class drug compound toxicity prediction method based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180172667A1 (en) * 2015-06-17 2018-06-21 Uti Limited Partnership Systems and methods for predicting cardiotoxicity of molecular parameters of a compound based on machine learning algorithms

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
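Research on compound toxicity prediction based on denoising autoencoder neural networks; Li Hong et al.; Application Research of Computers (《计算机应用研究》); 20170321; Vol. 35, No. 03; pp. 745-749 *
Research on the application of deep neural networks in chemistry; Qin Qifeng et al.; Jiangxi Chemical Industry (《江西化工》); 20180615; No. 03; pp. 1-5 *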

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant