CN110322972B

CN110322972B - Intelligent drug toxicity judgment method and device and computer readable storage medium

Info

Publication number: CN110322972B
Application number: CN201910467872.1A
Authority: CN
Inventors: 王健宗; 彭俊清; 瞿晓阳
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2022-05-20
Anticipated expiration: 2039-05-29
Also published as: CN110322972A

Abstract

The invention relates to artificial intelligence technology and discloses an intelligent drug toxicity judgment method comprising the following steps: receiving a drug data set comprising molecular structure sequences and a label set, and coding the drug data set based on a Huffman coding technique to obtain a drug coding set; inputting the drug coding set into an LSTM model and the label set into a loss function, training the LSTM model to obtain a training value, inputting the training value into the loss function, calculating a loss value with the loss function, comparing the loss value with a preset threshold, and stopping training once the loss value is smaller than the preset threshold; and receiving and coding a drug molecular structure sequence input by a user, inputting the coded sequence into the drug damage judgment model, and outputting a toxicity judgment for the sequence input by the user. The invention also provides an intelligent drug toxicity judgment device and a computer readable storage medium. The invention enables efficient drug toxicity judgment.

Description

Intelligent drug toxicity judgment method and device and computer readable storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an intelligent drug toxicity judgment method and device based on pig liver drugs and a computer readable storage medium.

Background

In recent years, the incidence of disease in animals has been high and tends to increase, so pig farms often carry out large-scale vaccination and medication of swine herds as part of animal health management. Drugs are substances used for the prevention, treatment and diagnosis of diseases, and each drug has its own specific properties. However, there are many types of pig liver disease, and whether a given drug is beneficial is often unknown, so selecting a drug for treatment is a major problem and drug damage detection is necessary for swine herds. At present, pig farms mainly detect drug damage by manually testing drugs on pigs and observing the reaction of the herd. This manual approach is time-consuming and labor-intensive, and only the visible reaction of the herd can be observed; whether the drug affects the physiology of the herd cannot be determined.

Disclosure of Invention

The invention provides an intelligent drug toxicity judgment method, an intelligent drug toxicity judgment device and a computer readable storage medium, whose main purpose is to present the user with an accurate drug toxicity judgment result when the user inputs a drug.

In order to achieve the above object, the present invention provides an intelligent method for determining drug toxicity, comprising:

a data processing layer receives a drug data set comprising molecular structure sequences and a label set, codes the drug data set based on a Huffman coding technique to obtain a drug coding set, and inputs the drug coding set and the label set into a drug damage judgment model;

the drug damage judgment model inputs the drug coding set into an LSTM model and inputs the label set into a loss function; the LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function; the loss function calculates a loss value from the label set and the training value; the loss value is compared with a preset threshold, and training stops once the loss value is smaller than the preset threshold;

and a drug molecular structure sequence input by a user is received and coded, the coded drug molecular structure sequence is input into the drug damage judgment model, and a toxicity judgment for the drug molecular structure sequence input by the user is output.

Optionally, coding the drug data set of the molecular structure sequences based on the Huffman coding technique to obtain the drug coding set includes:

sequentially reading the molecular structure sequences in the drug data set, randomly selecting a central sequence ω of the molecular structure sequences, and selecting the 2c molecular structure sequences before and after the central sequence ω to calculate an accumulated sum value $X_\omega$;

performing node classification judgment based on the accumulated sum value $X_\omega$ to obtain a Huffman binary tree;

and performing Huffman coding based on the Huffman binary tree to obtain the drug coding set.

Optionally, the accumulated sum value $X_\omega$ is calculated as:

$$X_\omega = \sum_{i=1}^{2c} V(\omega_i)$$

wherein $V(\omega_i)$ is the vector representation of the i-th of the 2c selected molecular structure sequences;

the node classification judgment σ is defined in terms of the accumulated sum value $X_\omega$, wherein e is an infinite non-repeating decimal.
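As an illustration of a node classification judgment that depends on $X_\omega$ and on e, one common choice in CBOW-style models with hierarchical softmax over a Huffman tree is the logistic function. This is an assumption rather than a formula confirmed by the text, and θ is a hypothetical per-node parameter vector that does not appear in the source:

```latex
\sigma\left(X_\omega^{\top}\theta\right) = \frac{1}{1 + e^{-X_\omega^{\top}\theta}}
```

Under that reading, a value of σ above 0.5 would route a sequence to one child of a tree node and a value below 0.5 to the other.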

Optionally, the LSTM model includes an input gate, a forgetting gate, and an output gate, where the drug coding set is input to the input gate, and after a memory unit in the input gate is activated, the memory unit sequentially reads codes of the drug coding set, and activates the codes based on an activation function and inputs the codes to the forgetting gate;

the forgetting gate receives codes sequentially input by the memory unit, calculates the codes based on a forgetting method, and inputs the codes to the output gate to obtain a training value, wherein the forgetting method comprises the following steps:

$$f_t = \delta\left(w_t\,[h_{t-1}, x_t] + b_t\right)$$

wherein $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, $t$ is the current time at which the forgetting gate receives the input code, $t-1$ is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, $[\,]$ denotes a matrix multiplication operation, and δ represents the sigmoid function.
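The forgetting-gate formula can be sketched in a few lines of plain Python. The sketch follows the standard LSTM reading, in which $[h_{t-1}, x_t]$ is the concatenation of the previous output and the current input and the weight matrix multiplies that concatenated vector; the function names and the example numbers are illustrative assumptions, not part of the source.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written as delta in the formula above."""
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(w, b, h_prev, x_t):
    """Return f_t = sigmoid(w . [h_prev, x_t] + b), one value per row of w.

    w: list of weight rows, each as long as len(h_prev) + len(x_t)
    b: list of offsets, one per row of w
    """
    concat = list(h_prev) + list(x_t)  # [h_{t-1}, x_t] read as concatenation
    f_t = []
    for row, offset in zip(w, b):
        if len(row) != len(concat):
            raise ValueError("weight row length must match len(h_prev) + len(x_t)")
        z = sum(wi * ci for wi, ci in zip(row, concat)) + offset
        f_t.append(sigmoid(z))
    return f_t


# Illustrative values only: zero weights and offsets give 0.5 for every unit.
print(forget_gate([[0.0, 0.0, 0.0]], [0.0], h_prev=[1.0, 2.0], x_t=[3.0]))
```

Each output lies between 0 and 1, which is what lets the gate scale down (forget) or pass through the information it receives.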

Optionally, the loss value ξ is calculated from the training value and the label set, wherein n is the number of codes in the drug coding set and $y_i$ is the i-th label in the label set.
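A loss of this kind can be sketched as follows. The exact form of ξ is not recoverable from the text, so the sketch assumes a mean squared error over n training values and labels; `mse_loss` and `should_stop` are hypothetical names, and the stopping rule mirrors the "smaller than the preset threshold" condition described earlier.

```python
def mse_loss(training_values, labels):
    """Mean squared error between model outputs and labels (assumed form of the loss)."""
    if not labels or len(training_values) != len(labels):
        raise ValueError("need equally sized, non-empty inputs")
    n = len(labels)
    return sum((p - y) ** 2 for p, y in zip(training_values, labels)) / n


def should_stop(loss_value, threshold):
    """Training stops once the loss is strictly below the preset threshold."""
    return loss_value < threshold


# Illustrative values only: one correct and one wrong prediction out of two.
loss = mse_loss([1.0, 0.0], [1.0, 1.0])
print(loss, should_stop(loss, threshold=0.1))
```

A cross-entropy loss would fit the same stopping rule; mean squared error is used here only because it is the simplest choice that uses n, the training values and the labels.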

In addition, to achieve the above object, the present invention further provides an intelligent drug toxicity judging apparatus, which includes a memory and a processor, wherein the memory stores an intelligent drug toxicity judging program operable on the processor, and the intelligent drug toxicity judging program, when executed by the processor, implements the following steps:

sequentially reading the molecular structure sequences in the drug data set, randomly selecting a central sequence omega of the molecular structure sequences, and selecting 2c molecular structure sequences before and after the central sequence omega to calculate an accumulated sum value X_ω；

the node classification judgment σ is defined in terms of the accumulated sum value $X_\omega$, wherein e is an infinite non-repeating decimal.

Optionally, the LSTM model is trained to derive training values based on the drug encoding set, including:

the LSTM model comprises an input gate, a forgetting gate and an output gate, the medicine coding set is input into the input gate, after a memory unit in the input gate is activated, the memory unit sequentially reads the codes of the medicine coding set, activates the codes based on an activation function and inputs the codes into the forgetting gate;

$$f_t = \delta\left(w_t\,[h_{t-1}, x_t] + b_t\right)$$

wherein $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, $t$ is the current time at which the forgetting gate receives the input code, $t-1$ is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, $[\,]$ denotes a matrix multiplication operation, and δ represents the sigmoid function.

In addition, to achieve the above object, the present invention also provides a computer readable storage medium having an intelligent drug toxicity judging program stored thereon, the intelligent drug toxicity judging program being executable by one or more processors to implement the steps of the intelligent drug toxicity judging method as described above.

Because the LSTM model evaluates the molecular structure sequence, retaining information that conforms to the learned rules and forgetting information that does not, the ability to analyze molecular structure sequences is improved.

Drawings

Fig. 1 is a schematic flow chart of an intelligent method for determining drug toxicity according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an internal structure of an intelligent device for determining drug toxicity according to an embodiment of the present invention;

Fig. 3 is a schematic block diagram of an intelligent drug toxicity determination program in the intelligent drug toxicity determination apparatus according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides an intelligent drug toxicity judgment method. Fig. 1 is a schematic flow chart of an intelligent method for determining drug toxicity according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.

In this embodiment, the intelligent method for determining drug toxicity comprises:

S1, the data processing layer receives a drug data set comprising molecular structure sequences and a label set, codes the drug data set based on the CBOW model to obtain a drug coding set, and inputs the drug coding set and the label set into a drug damage judgment model.

In a preferred embodiment of the invention, each drug in the drug data set is defined in the form {component, dose} according to the National Library of Medicine clinical drug standardization nomenclature system, the component being the molecular structure sequence of the drug. For each drug in the drug data set, a number of healthy pigs are selected and fed the drug in turn, the activity of glutamic-pyruvic transaminase in the pigs' livers is measured with a detector to judge whether the livers are damaged, and each drug is labeled in turn (harmful or harmless) according to the judgment result to form the label set.
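The labeling step can be sketched as a small function. The source does not give the cutoff or the rule for combining readings from several pigs, so both are assumptions here: `damage_cutoff` is a caller-supplied hypothetical parameter, and a drug is labeled harmful if any pig's reading exceeds it.

```python
def label_drug(alt_readings, damage_cutoff):
    """Return 'harmful' if any transaminase reading exceeds the cutoff, else 'harmless'.

    alt_readings: measured glutamic-pyruvic transaminase activity, one value per pig
    damage_cutoff: hypothetical threshold chosen by the caller (not given in the source)
    """
    if not alt_readings:
        raise ValueError("at least one reading is required")
    return "harmful" if any(r > damage_cutoff for r in alt_readings) else "harmless"


# Made-up numbers for illustration; they are not clinical reference values.
label_set = [
    label_drug([30.0, 42.0], damage_cutoff=50.0),
    label_drug([30.0, 80.0], damage_cutoff=50.0),
]
print(label_set)
```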

In the preferred embodiment of the present invention, the molecular structure sequences in the drug data set are read sequentially, a central sequence ω of the molecular structure sequences is randomly selected, and the 2c molecular structure sequences before and after the central sequence ω are selected to calculate the accumulated sum value $X_\omega$. Node classification judgment is then performed based on the accumulated sum value $X_\omega$ to obtain a Huffman binary tree, and Huffman coding is performed based on the Huffman binary tree to obtain the drug coding set. Further, drawing on data communication techniques, the Huffman coding can represent the molecular structure sequences with different arrangements of 0 and 1 codes.
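For readers unfamiliar with Huffman coding, the following is a minimal sketch of the textbook, frequency-based construction using only the standard library. It is not the tree-building procedure of this embodiment, which relies on the node classification judgment; it only shows how a Huffman binary tree turns symbols into arrangements of 0 and 1. Treating each character of a sequence string as a symbol is an assumption made for the example.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(symbols):
    """Build a prefix-free 0/1 code for string symbols from their frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs a one-bit code
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so heap entries never compare nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                        # leaf: a string symbol
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


# Made-up sequence for illustration; each character is one symbol.
sequence = "CCCCOON"
codes = huffman_codes(sequence)
encoded = "".join(codes[ch] for ch in sequence)
print(codes, encoded)
```

The most frequent symbol receives the shortest code, so the encoded sequence is shorter than a fixed-width encoding of the same symbols.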

In the preferred embodiment of the present invention, the accumulated sum value $X_\omega$ is calculated as:

$$X_\omega = \sum_{i=1}^{2c} V(\omega_i)$$

wherein $V(\omega_i)$ is the vector representation of the i-th of the 2c selected molecular structure sequences; the node classification judgment σ is defined in terms of the accumulated sum value $X_\omega$, wherein e is an infinite non-repeating decimal.
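The accumulated sum over the sequences surrounding a central sequence can be sketched as below. The sketch assumes the CBOW-style reading in which the 2c sequences are the c sequences before and the c sequences after the center, with the window clipped at the ends of the data set; `context_sum` and the example vectors are illustrative.

```python
def context_sum(vectors, center, c):
    """Sum the vectors of up to c neighbours on each side of `center`.

    vectors: list of equal-length vectors, one per molecular structure sequence
    center:  index of the central sequence (excluded from the sum)
    c:       number of neighbours taken on each side
    """
    if not 0 <= center < len(vectors):
        raise IndexError("center is out of range")
    lo = max(0, center - c)
    hi = min(len(vectors), center + c + 1)
    total = [0.0] * len(vectors[center])
    for i in range(lo, hi):
        if i == center:
            continue
        for d, value in enumerate(vectors[i]):
            total[d] += value
    return total


# Made-up two-dimensional vectors for five sequences.
vectors = [[1, 0], [0, 1], [2, 2], [1, 1], [3, 0]]
print(context_sum(vectors, center=2, c=1))
print(context_sum(vectors, center=2, c=2))
```

In the embodiment the center is chosen at random; it is passed in explicitly here so that the result is reproducible.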

S2, the drug damage judgment model inputs the drug coding set into an LSTM model and inputs the label set into a loss function; the LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function; the loss function calculates a loss value from the label set and the training value; the loss value is compared with a preset threshold, and training stops once the loss value is smaller than the preset threshold.

The LSTM model in the preferred embodiment of the invention comprises an input gate, a forgetting gate and an output gate, wherein a drug coding set is input into the input gate, after a memory unit in the input gate is activated, the memory unit sequentially reads codes of the drug coding set, activates the codes based on an activation function and inputs the codes to the forgetting gate, and the activation function is a sigmoid function;

$$f_t = \delta\left(w_t\,[h_{t-1}, x_t] + b_t\right)$$

wherein $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, $t$ is the current time at which the forgetting gate receives the input code, $t-1$ is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, $[\,]$ denotes a matrix multiplication operation, and δ represents the sigmoid function.

In the preferred embodiment of the present invention, the loss function calculates a loss value ξ from the label set and the training value, wherein n is the number of codes in the drug coding set and $y_i$ is the i-th label in the label set.

In the preferred embodiment of the present invention, when the loss value is greater than the preset threshold, the LSTM model continues to be trained based on the drug encoding set, and continues to update the memory unit of the LSTM model based on the gradient descent algorithm and output a training value.
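The train-until-threshold loop can be sketched as follows. To keep the example short and dependency-free, a single logistic unit stands in for the LSTM model and the loss is assumed to be a mean squared error; only the control flow (compute training values, compute loss, compare with the threshold, otherwise update by gradient descent) is meant to mirror the text. The learning rate, epoch cap and data are illustrative assumptions.

```python
import math


def train_until_threshold(features, labels, threshold, lr=0.5, max_epochs=10000):
    """Gradient descent on a single logistic unit (a stand-in for the LSTM).

    Returns (weights, offset, last_evaluated_loss, epochs_run).
    """
    n = len(labels)
    w = [0.0] * len(features[0])
    b = 0.0
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        # Training values: sigmoid of the weighted input for each sample.
        preds = [
            1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for x in features
        ]
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / n
        if loss < threshold:
            return w, b, loss, epoch  # loss below threshold: quit training
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, p, y in zip(features, preds, labels):
            g = 2.0 * (p - y) * p * (1.0 - p) / n  # d(loss)/d(pre-activation)
            for j, xj in enumerate(x):
                grad_w[j] += g * xj
            grad_b += g
        w = [wi - lr * gj for wi, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, loss, max_epochs


# Made-up, linearly separable data: one feature, labels 0 (harmless) and 1 (harmful).
w, b, final_loss, epochs = train_until_threshold([[0.0], [1.0]], [0.0, 1.0], threshold=0.05)
print(final_loss < 0.05, epochs)
```

The epoch cap is a safeguard that the text does not mention; without it, a threshold that the model cannot reach would make the loop run forever.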

S3, a drug molecular structure sequence input by a user is received and coded, the coded drug molecular structure sequence is input into the drug damage judgment model, and a toxicity judgment for the drug molecular structure sequence input by the user is output.
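Step S3 amounts to chaining the encoder and the trained model and mapping the model output to a label. A minimal sketch, assuming the model returns a score between 0 and 1 and that 0.5 is used as the cutoff (both assumptions; `judge_toxicity`, `encode` and `model` are hypothetical names):

```python
def judge_toxicity(sequence, encode, model, cutoff=0.5):
    """Encode a user-supplied sequence, score it, and return a toxicity label.

    encode: callable turning the sequence into the model's input code
    model:  callable returning a score in [0, 1] for an input code
    cutoff: assumed decision boundary between 'harmless' and 'harmful'
    """
    code = encode(sequence)
    score = model(code)
    return "harmful" if score >= cutoff else "harmless"


# Toy stand-ins for the encoder and the trained model, for illustration only.
result = judge_toxicity("CCO", encode=lambda s: [float(len(s))], model=lambda code: 0.9)
print(result)
```

Passing the encoder and model in as callables keeps the sketch independent of how either one is actually implemented.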

The invention also provides an intelligent drug toxicity judgment device. Fig. 2 is a schematic diagram of an internal structure of an intelligent device for determining drug toxicity according to an embodiment of the present invention.

In the present embodiment, the intelligent drug toxicity determination apparatus 1 may be a PC (Personal Computer), a terminal device such as a smart phone, a tablet Computer, or a mobile Computer, or may be a server. The intelligent drug toxicity judgment device 1 at least comprises a memory 11, a processor 12, a communication bus 13 and a network interface 14.

The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may be an internal storage unit of the intelligent drug toxicity determination apparatus 1 in some embodiments, for example, a hard disk of the intelligent drug toxicity determination apparatus 1. The memory 11 may also be an external storage device of the intelligent drug toxicity judging apparatus 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the intelligent drug toxicity judging apparatus 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the intelligent drug toxicity judging apparatus 1. The memory 11 may be used not only to store application software installed in the intelligent drug toxicity judgment apparatus 1 and various types of data, such as a code of the intelligent drug toxicity judgment program 01, but also to temporarily store data that has been output or will be output.

The processor 12 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or other data Processing chip in some embodiments, and is used for executing program codes stored in the memory 11 or Processing data, such as executing the intelligent drug toxicity determining program 01.

The communication bus 13 is used to realize connection communication between these components.

The network interface 14 may optionally include a standard wired interface and/or a wireless interface (e.g., a Wi-Fi interface), and is typically used to establish a communication link between the apparatus 1 and other electronic devices.

Optionally, the apparatus 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display may also be referred to as a display screen or a display unit, where appropriate, for displaying information processed in the intelligent drug toxicity assessment apparatus 1 and for displaying a visual user interface.

Fig. 2 shows only the intelligent drug toxicity judgment device 1 with the components 11-14 and the intelligent drug toxicity judgment program 01. Those skilled in the art will appreciate that the structure shown in Fig. 2 does not limit the intelligent drug toxicity judgment device 1, which may include fewer or more components than those shown, a combination of certain components, or a different arrangement of components.

In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores an intelligent drug toxicity judgment program 01; the processor 12 executes the intelligent drug toxicity judgment program 01 stored in the memory 11 to implement the following steps:

Step one, the data processing layer receives a drug data set comprising molecular structure sequences and a label set, codes the drug data set based on the CBOW model to obtain a drug coding set, and inputs the drug coding set and the label set into a drug damage judgment model.

In a preferred embodiment of the invention, the molecular structure sequences in the drug data set are read sequentially, a central sequence ω of the molecular structure sequences is randomly selected, and the 2c molecular structure sequences before and after the central sequence ω are selected to calculate an accumulated sum value $X_\omega$. Node classification judgment is then performed based on the accumulated sum value $X_\omega$ to obtain a Huffman binary tree, and Huffman coding is performed based on the Huffman binary tree to obtain the drug coding set. Drawing on data communication techniques, the Huffman coding can represent the molecular structure sequences with different arrangements of 0 and 1 codes.

In the preferred embodiment of the present invention, the accumulated sum value $X_\omega$ is calculated as:

$$X_\omega = \sum_{i=1}^{2c} V(\omega_i)$$

wherein $V(\omega_i)$ is the vector representation of the i-th of the 2c selected molecular structure sequences; the node classification judgment σ is defined in terms of the accumulated sum value $X_\omega$, wherein e is an infinite non-repeating decimal.

Step two, the drug damage judgment model inputs the drug coding set into an LSTM model and inputs the label set into a loss function; the LSTM model is trained on the drug coding set to obtain a training value, which is input into the loss function; the loss function calculates a loss value from the label set and the training value; the loss value is compared with a preset threshold, and training stops once the loss value is smaller than the preset threshold.

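The forgetting method of the forgetting gate is:

$$f_t = \delta\left(w_t\,[h_{t-1}, x_t] + b_t\right)$$

In the loss value ξ, n is the number of codes in the drug coding set and $y_i$ is the i-th label in the label set.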

Step three, a drug molecular structure sequence input by a user is received and coded, the coded drug molecular structure sequence is input into the drug damage judgment model, and a toxicity judgment for the drug molecular structure sequence input by the user is output.

Alternatively, in other embodiments, the intelligent drug toxicity judging program may be further divided into one or more modules, and the one or more modules are stored in the memory 11 and executed by one or more processors (in this embodiment, the processor 12) to implement the present invention, where the module referred to in the present invention refers to a series of computer program instruction segments capable of performing a specific function for describing the execution process of the intelligent drug toxicity judging program in the intelligent drug toxicity judging apparatus.

For example, Fig. 3 is a schematic diagram of the program modules of the intelligent drug toxicity determination program in an embodiment of the intelligent drug toxicity determination apparatus of the present invention. In this embodiment, the intelligent drug toxicity determination program may be divided into a data processing module 10, a model training module 20, and a drug toxicity output module 30. For example:

The data processing module 10 is configured to: receive a drug data set comprising molecular structure sequences and a label set, code the drug data set based on a Huffman coding technique to obtain a drug coding set, and input the drug coding set and the label set into the model training module 20.

The model training module 20 is configured to: input the drug coding set into an LSTM model and the label set into a loss function, train the LSTM model on the drug coding set to obtain a training value and input the training value into the loss function, calculate a loss value with the loss function from the label set and the training value, compare the loss value with a preset threshold, and stop training once the loss value is smaller than the preset threshold.

The drug toxicity output module 30 is configured to: receive and code a drug molecular structure sequence input by a user, input the coded drug molecular structure sequence into the model training module 20, and output a toxicity judgment for the drug molecular structure sequence input by the user.

The functions or operation steps of the data processing module 10, the model training module 20, the drug toxicity output module 30 and other program modules implemented when executed are substantially the same as those of the above embodiments, and are not repeated herein.

Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where an intelligent drug toxicity judgment program is stored, where the intelligent drug toxicity judgment program is executable by one or more processors to implement the following operations:

The embodiment of the computer readable storage medium of the present invention is substantially the same as the embodiments of the intelligent drug toxicity determination apparatus and method, and will not be described herein again.

It should be noted that the above numbering of the embodiments of the present invention is merely for description and does not represent the merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises the element.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. An intelligent drug toxicity judgment method is characterized by comprising the following steps:

a data processing layer receives a drug data set comprising molecular structure sequences and a label set, codes the drug data set based on a Huffman coding technique to obtain a drug coding set, and inputs the drug coding set and the label set into a drug damage judgment model, wherein coding the drug data set based on the Huffman coding technique to obtain the drug coding set comprises: sequentially reading the molecular structure sequences in the drug data set, randomly selecting a central sequence ω of the molecular structure sequences, and selecting the 2c molecular structure sequences before and after the central sequence ω to calculate an accumulated sum value $X_\omega$; performing node classification judgment based on the accumulated sum value $X_\omega$ to obtain a Huffman binary tree; and performing Huffman coding based on the Huffman binary tree to obtain the drug coding set;

the accumulated sum value $X_\omega$ is:

$$X_\omega = \sum_{i=1}^{2c} V(\omega_i)$$

the node classification judgment σ is defined in terms of the accumulated sum value $X_\omega$, wherein e is an infinite non-repeating decimal;

2. The intelligent drug toxicity assessment method according to claim 1, wherein the LSTM model comprises an input gate, a forgetting gate and an output gate, the drug code set is input to the input gate, and after activating a memory unit in the input gate, the memory unit sequentially reads the codes of the drug code set, activates the codes based on an activation function, and inputs the activated codes to the forgetting gate;

$$f_t = \delta\left(w_t\,[h_{t-1}, x_t] + b_t\right)$$

wherein $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, $t$ is the current time at which the forgetting gate receives the input code, $t-1$ is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, $[\,]$ denotes a matrix multiplication operation, and δ represents a sigmoid function.

3. The intelligent drug toxicity assessment method of claim 1, wherein the loss value ξ is calculated from the training value and the label set, n being the number of codes in the drug coding set and $y_i$ being the i-th label in the label set.

4. An intelligent drug toxicity assessment apparatus, comprising a memory and a processor, wherein the memory stores an intelligent drug toxicity assessment program operable on the processor, and wherein the processor executes the intelligent drug toxicity assessment program to perform the following steps:

the data processing layer receives a drug data set comprising molecular structure sequences and a label set, codes the drug data set based on the Huffman coding technique to obtain a drug coding set, and inputs the drug coding set and the label set into a drug damage judgment model, wherein coding the drug data set based on the Huffman coding technique to obtain the drug coding set comprises: sequentially reading the molecular structure sequences in the drug data set, randomly selecting a central sequence ω of the molecular structure sequences, and selecting the 2c molecular structure sequences before and after the central sequence ω to calculate an accumulated sum value $X_\omega$; performing node classification judgment based on the accumulated sum value $X_\omega$ to obtain a Huffman binary tree; and performing Huffman coding based on the Huffman binary tree to obtain the drug coding set;

the accumulated sum value $X_\omega$ is:

$$X_\omega = \sum_{i=1}^{2c} V(\omega_i)$$

the node classification judgment σ is defined in terms of the accumulated sum value $X_\omega$, wherein e is an infinite non-repeating decimal;

5. The intelligent drug toxicity judgment device according to claim 4, wherein the LSTM model comprises an input gate, a forgetting gate and an output gate, the drug code set is input to the input gate, and after a memory unit in the input gate is activated, the memory unit sequentially reads the codes of the drug code set and activates the codes based on an activation function and inputs the codes to the forgetting gate;

$$f_t = \delta\left(w_t\,[h_{t-1}, x_t] + b_t\right)$$

wherein $f_t$ is the output data of the forgetting gate, $x_t$ is the input code, $t$ is the current time at which the forgetting gate receives the input code, $t-1$ is the time preceding the current time, $h_{t-1}$ is the output data of the output gate at the time preceding the current time, $w_t$ is the weight at the current time, $b_t$ is the offset at the current time, $[\,]$ denotes a matrix multiplication operation, and δ represents a sigmoid function.

6. A computer readable storage medium having an intelligent drug toxicity determination program stored thereon, the intelligent drug toxicity determination program being executable by one or more processors to implement the steps of the intelligent drug toxicity determination method according to any one of claims 1 to 3.