WO2023134060A1 - Information push method and device based on drug molecular image classification - Google Patents


Info

Publication number
WO2023134060A1
Authority
WO
WIPO (PCT)
Prior art keywords
drug
information
data
feature
sample data
Application number
PCT/CN2022/089688
Inventor
王俊 (Wang Jun)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2023134060A1


Classifications

    • G06N3/04 (G06N3/02 Neural networks): Architecture, e.g. interconnection topology
    • G06N3/08 (G06N3/02 Neural networks): Learning methods
    • G06V10/764 (G06V10/70 Image or video recognition or understanding using pattern recognition or machine learning): Using classification, e.g. of video objects
    • G06V10/774 (G06V10/77 Processing image or video features in feature spaces): Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G16C20/50 (G16C20/00 Chemoinformatics): Molecular design, e.g. of drugs

Definitions

  • The present application relates to the field of intelligent medical technology, and in particular to an information push method and device based on drug molecular image classification.
  • Identifying the molecular structure of a drug is a slow process, which makes it poorly suited to pushing drug-related information in intelligent medical treatment. An information push method based on drug molecular image classification is therefore urgently needed to solve this problem.
  • The present application provides an information push method and device based on drug molecular image classification, whose main purpose is to solve the problem of low efficiency in pushing information based on drug molecular structure.
  • A method for pushing information based on drug molecular image classification includes:
  • classifying the drug molecular structure image data based on a trained image classification model to obtain a drug molecule classification result;
  • wherein the image classification model is obtained through model training on positive sample data and negative sample data constructed from training samples, and the negative sample data is obtained by scrambling the feature matrix of the graph nodes while the network connection structure stays unchanged during model training;
  • An information push device based on drug molecular image classification includes:
  • an acquisition module configured to acquire the drug molecular structure image data of a target drug;
  • a processing module configured to classify the drug molecular structure image data based on a trained image classification model to obtain a drug molecule classification result;
  • wherein the image classification model is obtained through model training on positive sample data and negative sample data constructed from training samples, and the negative sample data is obtained by scrambling the feature matrix of the graph nodes while the network connection structure stays unchanged during model training;
  • an analysis module configured to analyze the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and to match the drug feature information against the disease feature information of a target disease;
  • an output module configured to output drug feature combination information and drug feature risk information matching the drug feature information if the drug feature information matches the disease feature information of the target disease.
  • A computer-readable storage medium stores computer-readable instructions which, when executed by a processor, implement an information push method based on drug molecular image classification, including:
  • classifying the drug molecular structure image data based on a trained image classification model to obtain a drug molecule classification result;
  • wherein the image classification model is obtained through model training on positive sample data and negative sample data constructed from training samples, and the negative sample data is obtained by scrambling the feature matrix of the graph nodes while the network connection structure stays unchanged during model training;
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, it implements the information push method based on drug molecular image classification, including:
  • classifying the drug molecular structure image data based on a trained image classification model to obtain a drug molecule classification result;
  • wherein the image classification model is obtained through model training on positive sample data and negative sample data constructed from training samples, and the negative sample data is obtained by scrambling the feature matrix of the graph nodes while the network connection structure stays unchanged during model training;
  • The technical solution provided by the embodiments of the present application has at least the following advantages:
  • Compared with the prior art, the information push method and device based on drug molecular image classification provided by this application identify drug characteristics using artificial intelligence algorithms. They then push information by matching those drug characteristics against diseases. This greatly improves the efficiency of matching diseases based on drug characteristics in intelligent medical care, and thereby improves the efficiency and accuracy of information push in intelligent medical systems.
  • Fig. 1 shows a flow chart of an information push method based on drug molecular image classification provided by an embodiment of the present application.
  • Fig. 2 shows a flow chart of another information push method based on drug molecular image classification provided by an embodiment of the present application.
  • Fig. 3 shows a schematic diagram of a multi-scale graph convolutional neural network structure provided by an embodiment of the present application.
  • Fig. 4 shows a block diagram of an information push device based on drug molecular image classification provided by an embodiment of the present application.
  • Fig. 5 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • AI: artificial intelligence.
  • The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology.
  • Artificial intelligence is the theory, method, technology and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence. Such systems perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
  • An information push method based on drug molecular image classification is provided. Its application to a computer device such as a server is used as an example for illustration. The server may be an independent server. Alternatively, it may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms. Examples include an intelligent medical system or a digital medical platform.
  • The above method comprises the following steps:
  • The executing entity may be an intelligent management system with an information push function, for example an intelligent medical system or a digital medical platform.
  • In this embodiment, the executing entity is an intelligent medical system.
  • The target drug is a drug for which related information matched to its drug characteristics is to be pushed.
  • The drug molecular structure image data of the target drug represents the target drug molecule as a graph structure. The image content is the atom-chemical bond structure of the target drug molecule. From it, molecular structure features such as spatial features, atomic number and charge number can be abstracted in node-edge form. Classifying this image data therefore yields a classification method for drug molecular structures. A graph neural network passes information along nodes and edges, capturing the local relationships of the graph and automatically learning graph properties, so graph classification tasks can be performed efficiently.
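The node-edge abstraction described above can be illustrated with a minimal sketch. It is not the application's implementation. It assumes the molecule is already available as lists of atoms and bonds, using the heavy atoms of ethanol as a hypothetical example, and it uses only atomic number and formal charge as node features. The helper name `molecule_to_graph` is hypothetical.

```python
from typing import List, Tuple

# Hypothetical atom table: (element symbol, atomic number, formal charge).
# Atomic number and charge are the example node features named in the text;
# the full feature set used by the application is not specified.
ATOMS: List[Tuple[str, int, int]] = [
    ("C", 6, 0),
    ("C", 6, 0),
    ("O", 8, 0),
]
# Bonds as (atom index, atom index, bond order): ethanol's heavy atoms C-C-O.
BONDS: List[Tuple[int, int, int]] = [(0, 1, 1), (1, 2, 1)]


def molecule_to_graph(atoms, bonds):
    """Return node feature matrix X (one row per atom) and adjacency matrix A."""
    n = len(atoms)
    # Each node row holds [atomic number, formal charge].
    x = [[float(z), float(q)] for _, z, q in atoms]
    # Undirected graph: a chemical bond connects both atoms symmetrically.
    a = [[0] * n for _ in range(n)]
    for i, j, _order in bonds:
        a[i][j] = 1
        a[j][i] = 1
    return x, a


X, A = molecule_to_graph(ATOMS, BONDS)
print(X)  # [[6.0, 0.0], [6.0, 0.0], [8.0, 0.0]]
print(A)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

The bond order is read but not stored. A richer sketch could place it in a separate edge-feature matrix.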
  • In the embodiment of the present application, the drug molecular structure image data of the target drug is generated with computer software for drawing molecular structure diagrams. The intelligent medical system, as the current executing entity, then loads it.
  • The operator can obtain the drug molecular structure image data matching the target drug from the drug database already stored in the current intelligent medical system. Alternatively, the operator can create it with a molecular structure drawing application and import it into the intelligent medical system in a specified file format. This is not specifically limited in the embodiments of this application.
  • The graph network structure that the graph neural network derives from the image data to be classified consists of graph nodes and edges. Graph nodes carry entity information, such as the atoms of a compound. Edges carry relationship information between entities, such as the chemical bonds between atoms in the compound image data. The image classification model is obtained in advance through model training. It is then used to classify the drug molecular structure diagram data, yielding the drug molecule classification result from which drug feature information is matched.
  • The drug molecule classification result represents the classification of different atom-chemical bond structures, from which the characteristics of the drug molecule are determined.
  • Positive and negative sample data are constructed to provide a supervision signal, so that the model can effectively learn the latent features and information in the training samples.
  • Self-supervised contrastive learning rests on the idea that the more similar two data points are (that is, if they belong to the same class), the closer their graph representations should be. The embodiment of the present application therefore constructs different samples as model inputs. The features in the input data are learned by guiding the pre-training model to discriminate between positive and negative sample data in the latent representation space.
  • When positive and negative sample data are constructed from the input data, the pre-training model is made to discriminate between them in the latent representation space. The pre-training task, that is, the supervision signal, is thus constructed from unlabeled input data.
  • The image classification model is obtained through model training on positive sample data and negative sample data constructed from training samples.
  • During model training, the negative sample data is obtained by scrambling the feature matrix of the graph nodes while the network connection structure stays unchanged. That is, the negative sample keeps the graph connection structure of the image data intact but shuffles the node feature matrix by rows. The pre-training model thus learns to distinguish positive from negative sample data in the latent representation space, which improves the accuracy of feature learning.
  • For example, the original graph has 5 nodes, each with a 32-dimensional feature, and the original row order 1, 2, 3, 4, 5 is shuffled to 3, 5, 4, 2, 1.
  • The features at every node position have now changed, which yields a negative sample whose node attributes have been corrupted.
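The row-shuffling corruption can be sketched as follows, assuming node features are held as a list of rows (one per node). The function name `corrupt_features` is hypothetical. The permutation from the example above is passed explicitly; otherwise a seeded random permutation is used. Only the feature matrix is permuted; the adjacency matrix is not touched.

```python
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def corrupt_features(x: Matrix, perm: Optional[Sequence[int]] = None, seed: int = 0) -> Matrix:
    """Build a negative sample by permuting the rows of the node feature matrix.

    The adjacency matrix is deliberately not an input: the graph keeps its
    connection structure, but each position now carries another node's features.
    """
    n = len(x)
    if perm is None:
        perm = list(range(n))
        random.Random(seed).shuffle(perm)
    # Row i of the corrupted matrix is row perm[i] of the original.
    return [list(x[p]) for p in perm]


# Five nodes with 32-dimensional features; row k is filled with the value k + 1.
x = [[float(node)] * 32 for node in range(1, 6)]
# Rows 1, 2, 3, 4, 5 reordered to 3, 5, 4, 2, 1 (0-based: 2, 4, 3, 1, 0).
x_neg = corrupt_features(x, perm=[2, 4, 3, 1, 0])
print([row[0] for row in x_neg])  # [3.0, 5.0, 4.0, 2.0, 1.0]
```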
  • The drug feature information corresponding to the drug molecule classification result is analyzed based on the drug molecule feature distribution sequence, so that it can be matched against the disease feature information of the target disease.
  • The drug molecule feature distribution sequence is built from molecular composition data that covers different molecular chemical bonds, molecular chemical property information and disease resistance attributes. It stores the correspondence between this data and different atom-chemical bond combinations. For the drug molecule classification result obtained in step 102, the corresponding molecular chemical bonds, molecular chemical property information and disease resistance attributes can therefore be matched from the sequence and used as drug feature information.
  • For example, suppose the drug molecule classification result includes a atom-chemical bond 1, d atom-chemical bond 3 and h atom-chemical bond 3. Analysis based on the drug molecule feature distribution sequence then yields the molecular chemical bonds, molecular chemical property information and disease resistance attributes that correspond to these three atom-chemical bond entries.
  • The disease resistance attributes are the known resistance properties between drug molecules and diseases. An example is whether a drug molecule acts against lung fibroblasts and can therefore serve as a lung cancer drug.
  • Once the drug feature information is determined, it is matched against the disease feature information of the target disease to realize intelligent information push. The drug feature information includes molecular chemical bonds, molecular chemical property information and disease resistance attributes.
  • The target disease is a disease that is paired with the target drug to determine whether the drug is resistant to it.
  • In the embodiments of this application, resistance means whether the drug has a therapeutic effect on the disease.
  • The obtained drug feature information can be matched against the disease feature information of at least one target disease.
  • The current intelligent medical system can directly obtain disease feature information entered by an operator, such as a doctor or medical researcher.
  • The disease feature information includes, but is not limited to, the different biological or chemical characteristics that the disease exhibits in the human body. For example, if the disease feature information is that the white blood cell count is higher than a, it is matched against the molecular chemical bonds, molecular chemical property information and disease resistance attributes.
  • If the drug feature information matches the disease feature information, the target drug is resistant to the target disease and can be used to treat it. Drug feature combination information and drug feature risk information matching the drug feature information are therefore output.
  • The drug feature combination information is the feature information of other drugs with which the target drug can be used in combination.
  • The drug feature risk information describes the risk to the human body arising from the molecular chemical bonds, molecular chemical property information and disease resistance attributes of the target drug. Outputting both kinds of information realizes intelligent information push in the intelligent medical system.
  • Before step 101 classifies the drug molecular structure image data based on the trained image classification model to obtain the drug molecule classification result, the method further includes:
  • A graph convolutional network is constructed for the specific image data. Training samples for this network, namely drug molecular structure image training sample data, are obtained.
  • Both the positive sample data and the negative sample data are constructed from the drug molecular structure image training sample data. The negative sample data scrambles the feature matrix of the graph nodes while the network connection structure stays unchanged. In other words, feature perturbation is applied to the training sample data, and the resulting perturbed pseudo-features serve as negative sample data. Concretely, the graph's network connection structure is kept unchanged while the feature matrix of its graph nodes is shuffled by rows. With 5 original nodes, each having a 32-dimensional feature, the original row order 1, 2, 3, 4, 5 is shuffled to 3, 5, 4, 2, 1.
  • The positive sample data is the drug molecular structure image training sample data used directly, without feature perturbation. The positive and negative sample data together make up the training sample data:
  • the number of positive samples is M;
  • the number of negative samples is N;
  • M + N equals the total number of samples in the drug molecular structure image training sample data;
  • the values of N and M are not limited in the embodiments of this application.
  • A pooling method discards nodes according to a certain ratio to construct a multi-scale graph convolutional network. For example, starting from an original graph of 1000 nodes, image data of different scales is formed successively at ratios of 0.9, 0.8 and 0.7. This makes it possible to construct node-full graph contrastive learning targets at different scales. The image data at multiple scales represents the information of the whole graph at different granularities, giving a richer contrastive learning effect. Each pooling step reduces the amount of data, so the image data is extracted layer by layer as the number of nodes decreases.
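The successive node-dropping step can be sketched as follows. The application does not name its pooling operator. Random node dropping that keeps the induced subgraph is used here as a stand-in; learned pooling would instead keep the highest-scoring nodes. The sketch also assumes each ratio is applied to the graph produced by the previous step, giving 900, 720 and 504 nodes. If the ratios were meant relative to the original graph, the sizes would be 900, 800 and 700. `drop_nodes` and `multi_scale` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def drop_nodes(x: Matrix, a: List[List[int]], ratio: float, rng: random.Random) -> Tuple[Matrix, List[List[int]]]:
    """Keep round(ratio * n) randomly chosen nodes and the edges among them."""
    n = len(x)
    k = max(1, round(ratio * n))
    kept = sorted(rng.sample(range(n), k))
    x_small = [list(x[i]) for i in kept]
    # Induced subgraph: keep only edges whose endpoints both survive.
    a_small = [[a[i][j] for j in kept] for i in kept]
    return x_small, a_small


def multi_scale(x: Matrix, a: List[List[int]], ratios: Sequence[float] = (0.9, 0.8, 0.7), seed: int = 0):
    """Apply the ratios one after another; each step shrinks the previous graph."""
    rng = random.Random(seed)
    scales = [(x, a)]
    for r in ratios:
        x, a = drop_nodes(x, a, r, rng)
        scales.append((x, a))
    return scales


# 1000-node toy graph: a simple path with one scalar feature per node.
n = 1000
x0 = [[float(i)] for i in range(n)]
a0 = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
print([len(xs) for xs, _ in multi_scale(x0, a0)])  # [1000, 900, 720, 504]
```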
  • After the number of nodes is reduced, data pairs are constructed from the positive sample data, the negative sample data and the graph nodes. The graph convolutional network is then trained on these data pairs to obtain the image classification model.
  • In step 203, constructing data pairs from the positive sample data, the negative sample data and the graph nodes includes three parts. First, the graph nodes of the drug molecular structure image data are screened according to a preset ratio to obtain multi-scale drug molecular structure image data. Second, the positive sample data is combined with the graph nodes of the multi-scale drug molecular structure image data to construct first data pairs. Third, the negative sample data is combined with the graph nodes of the multi-scale graph convolutional neural network to construct second data pairs. The label of a first data pair is 1 and the label of a second data pair is 0.
  • A data pair takes the form "node - full graph". The label of a first data pair is therefore 1, indicating a pair composed of a node in the original image data and that image data.
  • In the negative samples, the features of the nodes are no longer their original features.
  • For example, the feature X1 of node 1 may be the feature X5 of original node 5,
  • and the feature of node 5 may be the feature X9 of node 9.
  • In image data whose node order has been disturbed in this way, any disturbed node paired with the original full image data forms a negative sample. That is, the label of a second data pair is 0.
  • Screening at a preset ratio means filtering the graph nodes of the drug molecular structure image data by discarding nodes at that ratio.
  • The remaining nodes can then be contrasted with full image data of different granularities, forming rich multi-scale data pairs.
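A minimal sketch of the node-full graph data pairs follows. It assumes node embeddings are already computed (held here as plain lists) and that the full-graph summary is the mean of the node embeddings, matching the readout this document describes for its loss function. In practice `h_neg` would be the encoder output for the shuffled features on the unchanged adjacency. The toy values below only illustrate the labeling. `build_pairs` and `mean_readout` are hypothetical names. For the multi-scale case, the same function would be called once per scale with that scale's embeddings.

```python
from typing import List, Tuple

Vector = List[float]


def mean_readout(h: List[Vector]) -> Vector:
    """Summary of a graph: element-wise mean of its node embeddings."""
    n = len(h)
    return [sum(col) / n for col in zip(*h)]


def build_pairs(
    h_pos: List[Vector], h_neg: List[Vector], h_full: List[Vector]
) -> List[Tuple[Vector, Vector, int]]:
    """Pair each node with the summary of the original full graph.

    First data pairs: nodes from the original graph, label 1.
    Second data pairs: nodes from the feature-shuffled graph, label 0.
    """
    s = mean_readout(h_full)
    first = [(h, s, 1) for h in h_pos]
    second = [(h, s, 0) for h in h_neg]
    return first + second


# Toy embeddings for a two-node graph and its corrupted counterpart.
h_pos = [[1.0, 0.0], [0.0, 1.0]]
h_neg = [[0.2, 0.9], [0.8, 0.1]]
pairs = build_pairs(h_pos, h_neg, h_full=h_pos)
print([label for _, _, label in pairs])  # [1, 1, 0, 0]
print(pairs[0][1])  # [0.5, 0.5]
```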
  • Before step 204 trains the graph convolutional network on the data pairs to obtain the image classification model, the method further includes: constructing a loss function from the discriminator combined with the numbers of positive and negative samples.
  • In the graph convolutional network, each graph node aggregates the features of its neighbor nodes and its own features from the previous layer to update its own information. The aggregated information is usually transformed non-linearly. By stacking multiple layers, each graph node obtains the information of neighbor nodes within the corresponding number of hops.
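The aggregate-then-transform step can be sketched as one message-passing layer. This is an illustration under stated assumptions, not the application's network. It uses a plain mean over a node and its neighbors, followed by a linear map and ReLU. A standard graph convolutional network would instead use symmetric degree normalization. `gnn_layer` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def matmul(p: Matrix, q: Matrix) -> Matrix:
    """Plain-Python matrix product p @ q."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def gnn_layer(h: Matrix, a: List[List[int]], w: Matrix) -> Matrix:
    """Mean over each node and its neighbours, then a linear map W and ReLU."""
    n, dim = len(h), len(h[0])
    aggregated = []
    for i in range(n):
        group = [i] + [j for j in range(n) if a[i][j]]  # self plus neighbours
        aggregated.append([sum(h[j][d] for j in group) / len(group) for d in range(dim)])
    # Non-linear transform of the aggregated features.
    return [[max(0.0, v) for v in row] for row in matmul(aggregated, w)]


# Path graph 1-2-3; only node 1 starts with a non-zero feature.
a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h0 = [[1.0], [0.0], [0.0]]
w = [[1.0]]
h1 = gnn_layer(h0, a, w)
h2 = gnn_layer(h1, a, w)
print([round(r[0], 3) for r in h1])  # [0.5, 0.333, 0.0]
print([round(r[0], 3) for r in h2])  # [0.417, 0.278, 0.167]
```

After one layer, node 3 still holds 0. After the second layer, it has received information from node 1, two hops away. This illustrates how stacking layers widens each node's receptive field.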
  • The loss function is constructed from the discriminator combined with the numbers of positive and negative samples, and is used to evaluate the model during training. Specifically, the loss function is defined in terms of the following quantities:
  • s is the summary representation of the whole graph, obtained by applying a readout function to the latent feature representation of the original image data.
  • The readout uses the mean: the feature representations of all nodes are averaged to give the full-graph summary feature.
  • h represents the hidden-layer embedding of the current graph node, that is, the feature vector of the graph node, such as a 768-dimensional vector.
  • D is a discriminator that uses the global representation to score positive and negative sample data during training. The representation vectors of the image data are learned by giving positive sample data as high a score as possible and negative sample data as low a score as possible.
  • s is the embedding of the full image data composed of all graph nodes, that is, the average feature vector of the full image data. If there are 100 graph nodes, s is the average of their 768-dimensional vectors.
  • In the feature information of the image data, X represents the feature vectors of the graph nodes. In a molecular graph, atoms are the nodes and chemical bonds are the edges, so X represents the features of the atomic nodes. A represents the adjacency matrix of the image data. It indicates which graph nodes are connected and which are not, and so encodes the topology of the image data.
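The loss formula itself did not survive extraction. Assume it follows the standard Deep Graph Infomax (DGI) objective, which matches the description above: a discriminator D scores node embeddings against the mean summary s, with M positive and N negative samples. It would then read as follows, written as a quantity to minimize. Here h̃_j is the embedding of a node from the scrambled graph and σ is the logistic sigmoid. The bilinear form of D with a learnable matrix W is taken from DGI, not from this application.

```latex
\mathcal{L} = -\frac{1}{M+N}\left(\sum_{i=1}^{M}\log D\left(\mathbf{h}_i,\mathbf{s}\right)+\sum_{j=1}^{N}\log\left(1-D\left(\tilde{\mathbf{h}}_j,\mathbf{s}\right)\right)\right),
\quad
\mathbf{s}=\frac{1}{n}\sum_{k=1}^{n}\mathbf{h}_k,
\quad
D(\mathbf{h},\mathbf{s})=\sigma\left(\mathbf{h}^{\top}\mathbf{W}\mathbf{s}\right)
```

Minimizing this loss pushes D toward 1 on the M first data pairs (label 1) and toward 0 on the N second data pairs (label 0). The DGI paper states the same objective as a maximization without the leading minus sign. It also swaps the letters, using N for positives and M for negatives; the letters here follow this document's convention.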
  • As shown in Table 1 below, X represents the feature vectors of the graph nodes.
  • x represents the features of an atomic node, such as those listed in Table 1.
  • In step 204, training the graph convolutional network on the data pairs to obtain the image classification model proceeds as follows. The graph convolutional network is trained on the first data pairs and the second data pairs, where the second data pairs scramble the feature matrix of the graph nodes. The learning of the scrambled graph convolutional network is evaluated with the loss function. If the evaluation meets a preset model training accuracy, training of the graph convolutional network is complete and the image classification model is obtained.
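The loss evaluation and the accuracy check in step 204 can be sketched in plain Python. Everything here is an assumption for illustration. The bilinear discriminator sigmoid(h^T W s) and the binary cross-entropy form follow Deep Graph Infomax, and the summary s is the mean readout described above. The names `contrastive_loss`, `discrimination_accuracy` and `TARGET_ACCURACY` are hypothetical. Gradient updates of the encoder and of W are not shown; a real implementation would use an automatic differentiation framework for them.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

TARGET_ACCURACY = 0.95  # hypothetical preset model training accuracy


def mean_readout(h: List[Vector]) -> Vector:
    """Summary s of the whole graph: element-wise mean of all node embeddings."""
    n = len(h)
    return [sum(col) / n for col in zip(*h)]


def discriminator(h: Vector, w: Matrix, s: Vector) -> float:
    """Bilinear score sigmoid(h^T W s): probability that (h, s) is a positive pair."""
    ws = [sum(w_row[j] * s[j] for j in range(len(s))) for w_row in w]
    logit = sum(h_i * ws_i for h_i, ws_i in zip(h, ws))
    return 1.0 / (1.0 + math.exp(-logit))


def contrastive_loss(h_pos: List[Vector], h_neg: List[Vector], w: Matrix) -> float:
    """Binary cross-entropy over M positive (label 1) and N negative (label 0) pairs."""
    s = mean_readout(h_pos)  # summary of the original, unscrambled graph
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(discriminator(h, w, s) + eps) for h in h_pos)
    total += sum(math.log(1.0 - discriminator(h, w, s) + eps) for h in h_neg)
    return -total / (len(h_pos) + len(h_neg))


def discrimination_accuracy(h_pos: List[Vector], h_neg: List[Vector], w: Matrix) -> float:
    """Fraction of pairs on the correct side of the 0.5 decision threshold."""
    s = mean_readout(h_pos)
    correct = sum(discriminator(h, w, s) > 0.5 for h in h_pos)
    correct += sum(discriminator(h, w, s) < 0.5 for h in h_neg)
    return correct / (len(h_pos) + len(h_neg))


# An untrained (all-zero) W scores every pair 0.5, so the loss equals ln 2.
h_pos = [[1.0, 1.0], [1.0, 1.0]]
h_neg = [[-1.0, -1.0], [-1.0, -1.0]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
print(round(contrastive_loss(h_pos, h_neg, zero_w), 4))  # 0.6931

# With W = identity, positives score sigmoid(2) and negatives sigmoid(-2).
eye_w = [[1.0, 0.0], [0.0, 1.0]]
acc = discrimination_accuracy(h_pos, h_neg, eye_w)
print(acc, acc >= TARGET_ACCURACY)  # 1.0 True
```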
  • The representation of image data proceeds in two steps. 1. Node features are first set up, either as given features or as randomly initialized entity features. An aggregation operation then gathers the information of neighboring graph nodes into the current graph node, which combines its own information with the aggregated information to update its features. 2. Each graph node thereby learns both its own features and information from other graph nodes.
  • A molecule is one item of image data. Molecular graph representation learning sums the features of all graph nodes, or applies another aggregation to them, to obtain a vector feature for the whole molecular graph.
  • Before step 103 analyzes the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence, the method further includes two steps. First, the molecular composition data of at least one drug is obtained. Second, the drug molecule feature distribution sequence of the target drug is constructed from molecular chemical bonds, molecular chemical property information and disease resistance attributes.
  • Specifically, the molecular composition data of at least one drug is obtained.
  • The obtained molecular composition data includes only molecular chemical bonds, molecular chemical property information and disease resistance data.
  • The molecular chemical bonds are the chemical bonds between all atoms in the drug molecule.
  • The molecular chemical property information includes the chemical properties of particular molecular structures, for example those of a phenol ring or a benzyl group.
  • The disease resistance attribute indicates whether a drug molecule has a medical effect on a given disease, such as treating it or slowing its progression.
  • For example, drug molecule s relieves blood viscosity in hypertensive patients, that is, it is antagonistic to that condition.
  • The embodiments of the present application push related information about the target drug, which may not yet have been tested or may require long-term experiments for verification. The at least one drug, by contrast, is a drug that has already been verified. Therefore, in the embodiments of this application, the drug molecule feature distribution sequence built from the analyzed molecular composition data is compared with the drug molecule classification result to obtain the drug feature information.
  • In step 103, the drug feature information of the drug molecule classification result is analyzed based on the drug molecule feature distribution sequence as follows. The classification result is compared with the distribution sequence entry by entry, in terms of molecular and atomic chemical bonds. The molecular chemical property information and disease resistance attribute of the entry with the greatest chemical bond similarity are taken as the drug feature information of the target drug.
  • That is, the drug feature information of the target drug is the molecular chemical property information and disease resistance attribute at the maximum similarity in the drug molecule feature distribution sequence.
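The maximum-similarity lookup of step 103 can be sketched as below. The application does not state how chemical bond similarity is measured. Jaccard similarity over sets of atom-chemical bond labels is used here purely as an assumption. The sequence entries are hypothetical data reusing this document's examples: a-bond 1, d-bond 3, h-bond 3, phenol ring, benzyl group, lung fibroblasts and blood viscosity. `drug_feature_info` and `jaccard` are hypothetical names.

```python
from typing import Dict, List, Set

# Hypothetical drug molecule feature distribution sequence: each entry links a set
# of atom-chemical bond labels from a verified drug to its chemical-property and
# disease-resistance information.
SEQUENCE: List[Dict] = [
    {
        "bonds": {"a-bond1", "d-bond3", "h-bond3"},
        "chemical_properties": "phenol ring",
        "resistance": "lung fibroblasts",
    },
    {
        "bonds": {"a-bond1", "b-bond2"},
        "chemical_properties": "benzyl group",
        "resistance": "blood viscosity",
    },
]


def jaccard(p: Set[str], q: Set[str]) -> float:
    """Similarity of two label sets: |intersection| / |union|."""
    union = p | q
    return len(p & q) / len(union) if union else 0.0


def drug_feature_info(classification: Set[str], sequence: List[Dict]) -> Dict:
    """Return the properties of the sequence entry whose bonds are most similar."""
    best = max(sequence, key=lambda entry: jaccard(classification, entry["bonds"]))
    return {
        "chemical_properties": best["chemical_properties"],
        "resistance": best["resistance"],
    }


result = drug_feature_info({"a-bond1", "d-bond3", "h-bond3"}, SEQUENCE)
print(result["resistance"])  # lung fibroblasts
```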
  • The method further includes: acquiring a drug knowledge graph, and searching it for drug feature combination information and drug feature risk information that match the drug feature information.
  • The drug knowledge graph is acquired in order to obtain the drug feature combination information and drug feature risk information that match the drug feature information.
  • The drug knowledge graph stores the associated combinations of different drug feature information, together with the risk information corresponding to each association. The drug feature combination information describes the features of other drugs with which the target drug can be used in combination.
  • For example, the feature information of drug 1 is s,
  • the feature information of drug 2 is e,
  • and the combined feature information of drug 1 and drug 2 can be s+e or f.
  • The drug feature risk information describes the risk to the human body arising from the molecular chemical bonds, molecular chemical property information and disease resistance attributes of the target drug,
  • such as a reduction in red blood cells; this is not specifically limited in the embodiments of this application.
  • The current intelligent medical system pre-stores or generates drug feature combination information and drug feature risk information for different drugs. In the current embodiment, the drug knowledge graph is therefore called directly to match this information.
  • the method further includes: if the drug characteristic information does not match the disease characteristic information of the target disease, outputting the classification result of the drug molecule to indicate that the drug molecule The classification results are manually matched.
  • the operator can still obtain the drug molecular classification results, and directly output the drug molecular classification results when there is a mismatch , so as to perform manual experiments or matching, for example, to directly display the classification results of drug molecules containing chemical molecular bonds, which is not specifically limited in this embodiment of the present application.
  • The embodiments of the present application provide an information push method based on drug molecular image classification.
  • The method acquires the drug molecular structure image data of the target drug;
  • classifies the drug molecular structure image data with the trained image classification model to obtain a drug molecule classification result;
  • the image classification model is obtained by model training with positive and negative sample data constructed from training samples, wherein the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
  • analyzes the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence, and matches the drug feature information against the disease feature information of the target disease.
  • Pushing information by matching drug features with diseases greatly improves the efficiency of matching diseases based on drug features in intelligent medical care, and thereby improves the efficiency and accuracy of information push in intelligent medical systems.
  • The embodiments of the present application provide an information push device based on drug molecular image classification. As shown in Figure 4, the device includes:
  • an acquisition module 31, configured to acquire drug molecular structure image data of the target drug;
  • a processing module 32, configured to classify the drug molecular structure image data with the trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
  • an analysis module 33, configured to analyze the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence, and to match the drug feature information against the disease feature information of the target disease;
  • an output module 34, configured to output drug feature combination information and drug feature risk information that match the drug feature information if the drug feature information matches the disease feature information of the target disease.
  • The device further includes a training module, wherein
  • the acquisition module is configured to acquire drug molecular structure image training sample data and construct a graph convolutional network;
  • the processing module is configured to perform feature perturbation on the drug molecular structure image training sample data, use the perturbed pseudo-feature training sample data as negative sample data, and use the unperturbed drug molecular structure image training sample data as positive sample data;
  • the training module is configured to construct data pairs from the positive sample data and the negative sample data, respectively, with graph nodes, and to train the graph convolutional network on these data pairs to obtain the image classification model, wherein the data pairs of the negative sample data shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged.
  • The construction module includes:
  • a screening unit, configured to screen the graph nodes of the drug molecular structure image data according to a preset ratio to obtain multi-scale drug molecular structure image data;
  • a construction unit, configured to combine the positive sample data with graph nodes of the multi-scale drug molecular structure image data to construct a first data pair, and to combine the negative sample data with graph nodes of the multi-scale graph convolutional neural network to construct a second data pair, wherein the label of the first data pair is 1 and the label of the second data pair is 0.
  • The construction module is further configured to construct a loss function with a discriminator based on the numbers of positive and negative samples;
  • the training module includes:
  • a processing unit, configured to shuffle the feature matrix of the graph nodes by means of the second data pair when training the graph convolutional network on the first and second data pairs, and to perform learning evaluation of the shuffled graph convolutional network based on the loss function;
  • a training unit, configured to complete the model training of the graph convolutional network and obtain the image classification model if the learning evaluation meets the preset model training accuracy.
  • The acquisition module is further configured to acquire molecular composition data of at least one drug and construct the drug molecule feature distribution sequence of the target drug based on molecular chemical bonds, molecular chemical property information, and disease-resistance attributes;
  • the analysis module is specifically configured to compare the drug molecule classification result with the drug molecule feature distribution sequence item by item on molecular and atomic chemical bonds, and to determine the molecular chemical property information and disease-resistance attribute with the largest chemical-bond similarity in the sequence as the drug feature information of the target drug.
  • The device further includes a search module, wherein
  • the acquisition module is further configured to acquire a drug knowledge graph, which stores the associated combinations of different drug feature information and the risk information corresponding to those combinations;
  • the search module is configured to search the drug knowledge graph for drug feature combination information and drug feature risk information that match the drug feature information.
  • The output module is further configured to output the drug molecule classification result if the drug feature information does not match the disease feature information of the target disease, so as to indicate that the classification result should be matched manually.
  • The embodiments of the present application provide an information push device based on drug molecular image classification.
  • The device acquires the drug molecular structure image data of the target drug;
  • classifies the drug molecular structure image data with the trained image classification model to obtain a drug molecule classification result;
  • the image classification model is obtained by model training with positive and negative sample data constructed from training samples, wherein the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
  • analyzes the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence, and matches the drug feature information against the disease feature information of the target disease.
  • Pushing information by matching drug features with diseases greatly improves the efficiency of matching diseases based on drug features in intelligent medical care, and thereby improves the efficiency and accuracy of information push in intelligent medical systems.
  • A computer-readable storage medium stores at least one executable instruction, which can execute the information push method based on drug molecular image classification in any of the above method embodiments.
  • The computer-readable storage medium may be non-volatile or volatile.
  • FIG. 5 shows a schematic structural diagram of a computer device provided according to an embodiment of the present application.
  • The specific embodiments of the present application do not limit the specific implementation of the computer device.
  • the computer device may include: a processor (processor) 402, a communication interface (Communications Interface) 404, a memory (memory) 406, and a communication bus 408.
  • The processor 402, the communication interface 404, and the memory 406 communicate with each other through the communication bus 408.
  • The communication interface 404 is used to communicate with network elements of other devices, such as clients or other servers.
  • The processor 402 is configured to execute the program 410, and specifically may execute the relevant steps in the above embodiments of the information push method based on drug molecular image classification.
  • The program 410 may include program code, which includes computer operation instructions.
  • The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
  • The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
  • The memory 406 is used to store the program 410.
  • The memory 406 may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
  • The program 410 may specifically cause the processor 402 to perform the following operations:
  • acquire the drug molecular structure image data of the target drug, and classify it with the trained image classification model to obtain the drug molecule classification result;
  • the image classification model is obtained by model training with positive and negative sample data constructed from training samples, wherein the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
  • analyze the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence and match it against the disease feature information of the target disease; if the drug feature information matches the disease feature information of the target disease, output the drug feature combination information and drug feature risk information that match the drug feature information.
  • Each module or step of the above application can be implemented by a general-purpose computing device; the modules or steps may be concentrated on a single computing device or distributed over a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases the steps shown or described may be performed in an order different from that given here, or they may each be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module.
  • The application is therefore not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Image Analysis (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

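This application discloses an information push method and device based on drug molecular image classification, relating to the field of intelligent medical technology. Its main purpose is to solve the problem that existing information push based on drug molecular structure is inefficient. The method includes: acquiring drug molecular structure image data of a target drug; classifying the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result; analyzing drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and matching the drug feature information against disease feature information of a target disease; and, if the drug feature information matches the disease feature information of the target disease, outputting drug feature combination information and drug feature risk information that match the drug feature information. It is mainly used for information push based on drug molecular image classification.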

Description

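Information push method and device based on drug molecular image classification
This application claims priority to Chinese patent application No. 202210028280.1, entitled "Information push method and device based on drug molecular image classification" and filed with the Chinese Patent Office on January 11, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of intelligent medical technology, and in particular to an information push method and device based on drug molecular image classification.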
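Background
In recent years, the application of intelligent medical technology has gradually expanded from clinical treatment to drug research and development, and more and more artificial intelligence techniques are applied to analyzing how suitable drugs are for different diseases, so as to accurately find drugs suitable for clinical treatment. Research on the molecular structure of drugs in particular makes it possible to determine, from drug features, a treatment plan suitable for a patient or a treatment for a disease, and to push this information to users.
The inventor realized that current research based on drug molecular structure relies on physical experiments to determine drug features, after which diseases are identified manually and information is pushed. Recognizing drug molecular structures this way is slow, which makes it unsuitable for pushing related information in intelligent medical care and keeps the efficiency of matching diseases based on drug features low. An information push method based on drug molecular image classification is therefore urgently needed to solve these problems.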
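Summary
In view of this, this application provides an information push method and device based on drug molecular image classification, whose main purpose is to solve the problem that existing information push based on drug molecular structure is inefficient.
According to one aspect of this application, an information push method based on drug molecular image classification is provided, including:
acquiring drug molecular structure image data of a target drug;
classifying the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
analyzing the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and matching the drug feature information against the disease feature information of a target disease;
if the drug feature information matches the disease feature information of the target disease, outputting drug feature combination information and drug feature risk information that match the drug feature information.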
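According to another aspect of this application, an information push device based on drug molecular image classification is provided, including:
an acquisition module, configured to acquire drug molecular structure image data of a target drug;
a processing module, configured to classify the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
an analysis module, configured to analyze the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and to match the drug feature information against the disease feature information of a target disease;
an output module, configured to output drug feature combination information and drug feature risk information that match the drug feature information if the drug feature information matches the disease feature information of the target disease.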
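According to yet another aspect of this application, a computer-readable storage medium is provided, storing computer-readable instructions which, when executed by a processor, implement an information push method based on drug molecular image classification, including:
acquiring drug molecular structure image data of a target drug;
classifying the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
analyzing the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and matching the drug feature information against the disease feature information of a target disease;
if the drug feature information matches the disease feature information of the target disease, outputting drug feature combination information and drug feature risk information that match the drug feature information.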
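According to still another aspect of this application, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the computer-readable instructions, when executed by the processor, implement an information push method based on drug molecular image classification, including:
acquiring drug molecular structure image data of a target drug;
classifying the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
analyzing the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and matching the drug feature information against the disease feature information of a target disease;
if the drug feature information matches the disease feature information of the target disease, outputting drug feature combination information and drug feature risk information that match the drug feature information.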
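With the above technical solutions, the technical solutions provided by the embodiments of this application have at least the following advantages:
This application provides an information push method and device based on drug molecular image classification. Compared with the prior art, it identifies drug features with intelligent algorithms and pushes information by matching drug features with diseases, which greatly improves the efficiency of matching diseases based on drug features in intelligent medical care, and thereby improves the efficiency and accuracy of information push in intelligent medical systems.
The above description is only an overview of the technical solutions of this application. To understand the technical means of this application more clearly so that they can be implemented according to the specification, and to make the above and other objects, features, and advantages of this application more apparent, specific embodiments of this application are described below.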
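Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings only illustrate the preferred embodiments and are not to be regarded as limiting this application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Figure 1 shows a flowchart of an information push method based on drug molecular image classification provided by an embodiment of this application;
Figure 2 shows a flowchart of another information push method based on drug molecular image classification provided by an embodiment of this application;
Figure 3 shows a schematic structural diagram of a multi-scale graph convolutional neural network provided by an embodiment of this application;
Figure 4 shows a block diagram of an information push device based on drug molecular image classification provided by an embodiment of this application;
Figure 5 shows a schematic structural diagram of a computer device provided by an embodiment of this application.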
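Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
The embodiments of this application can acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
On this basis, in one embodiment, as shown in Figure 1, an information push method based on drug molecular image classification is provided, described as applied to a computer device such as a server. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and AI platforms, for example an intelligent medical system or a digital medical platform. The method includes the following steps:
101. Acquire drug molecular structure image data of a target drug.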
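In this embodiment, the executing entity may be an intelligent management system with an information push function, such as an intelligent medical system or a digital medical platform. For example, the executing entity is an intelligent medical system, and the target drug is a drug for which related information is to be pushed after matching against its drug features. Correspondingly, the drug molecular structure image data of the target drug represents the molecule of the target drug as a graph: the image content is the atom-chemical bond structure of the target drug molecule, from which structural features such as node-edge spatial features, atomic numbers, and charges can be abstracted. Classifying this image data therefore provides a way to classify drug molecular structures: by passing information along nodes and edges, a graph neural network captures the local relationships of the graph and learns graph properties automatically, and so performs graph classification efficiently.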
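Step 101 does not say how the atom-bond structure becomes graph inputs. A minimal sketch, assuming the atoms and bonds have already been recognized from the structure image (that recognition step is not shown), builds the node feature matrix X and the adjacency matrix A from the atomic number and charge named above. The function name `molecule_to_graph`, the small element table, and the two-column feature layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Illustrative subset of atomic numbers; a real pipeline would cover the whole periodic table.
ATOMIC_NUMBER: Dict[str, int] = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}

Atom = Tuple[str, int]   # (element symbol, formal charge)
Bond = Tuple[int, int]   # (index of atom i, index of atom j)


def molecule_to_graph(atoms: List[Atom],
                      bonds: List[Bond]) -> Tuple[List[List[float]], List[List[int]]]:
    """Turn an atom/bond list into a node feature matrix X and an adjacency matrix A."""
    n = len(atoms)
    # One row per atom node: [atomic number, formal charge].
    X = [[float(ATOMIC_NUMBER[symbol]), float(charge)] for symbol, charge in atoms]
    # Chemical bonds are undirected edges, so A is symmetric.
    A = [[0] * n for _ in range(n)]
    for i, j in bonds:
        A[i][j] = 1
        A[j][i] = 1
    return X, A


# Heavy atoms of ethanol, C-C-O (hydrogens omitted for brevity).
X, A = molecule_to_graph([("C", 0), ("C", 0), ("O", 0)], [(0, 1), (1, 2)])
```

Real node features would be richer; the point is only that the topology lives in A and the per-atom attributes live in X, which is the split the later feature-shuffling step relies on.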
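It should be noted that the drug molecular structure image data in this embodiment is loaded by the intelligent medical system (the executing entity) after computer software for drawing molecular structure diagrams has generated it for the target drug. The operator may obtain drug molecular structure image data matching the target drug from a drug database already stored in the intelligent medical system, or create it with a molecular structure drawing application and import it in a file format specified by the intelligent medical system; this is not specifically limited in the embodiments of this application.
102. Classify the drug molecular structure image data with the trained image classification model to obtain a drug molecule classification result.
In this embodiment, a graph neural network is a branch of deep learning on graph-structured data. The graph network structure of the image data to be classified contains graph nodes and edges: graph nodes carry entity information, such as the atoms of a compound, and edges carry relationships between entities, such as the chemical bonds between atoms in compound image data. To classify the drug molecular structure image data and obtain a drug molecule classification result for matching drug feature information, an image classification model is trained in advance and used to classify the drug molecular structure image data. Because the classification is performed by a graph neural network, the resulting drug molecule classification result represents the classes of the different atom-chemical bond combinations, from which the drug molecule features can be determined.
It should be noted that, to improve the accuracy of the image classification model and to cope with the lack of large amounts of labeled data (massive molecular data, for example, is hard to obtain), this embodiment constructs positive and negative sample data as training samples for self-supervised learning, so as to learn the latent features and information in the sample data effectively. In self-supervised contrastive learning, the more similar two data points are (that is, if they belong to the same class), the closer their graph representations should be. This embodiment therefore constructs different samples from the sample data as model inputs, and learns the features of the input data by guiding the pre-trained model to discriminate between positive and negative sample data in the latent representation space. Constructing a pretext task (the supervisory signal) from unlabeled input data in this way gives the model strong generalization and accurate contrast; hence the image classification model is obtained by model training with positive and negative sample data constructed from the training samples. Specifically, the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during training: the connection structure of the graph is kept and the rows of the node feature matrix are shuffled, so that the pre-trained model learns features more accurately by discriminating positive from negative samples in the latent space. For example, for 5 original nodes, each with a 32-dimensional feature, the row order 1, 2, 3, 4, 5 is shuffled to 3, 5, 4, 2, 1. The network topology is unchanged, but the features of the node at each position have changed, which yields a negative sample with corrupted node attributes.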
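The row shuffle described above can be sketched in plain Python as follows. This is an illustration under the assumption that node features are stored as a list of rows; the helper names `permute_rows` and `corrupt_features` are hypothetical, and the sketch mirrors the corruption function commonly used in Deep Graph Infomax-style contrastive learning rather than reproducing the applicant's code.

```python
import random
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def permute_rows(X: Matrix, order: Sequence[int]) -> Matrix:
    """Return a copy of X whose i-th row is X[order[i]]."""
    return [list(X[k]) for k in order]


def corrupt_features(X: Matrix, seed: Optional[int] = None) -> Matrix:
    """Build a negative sample by shuffling the rows of the node feature matrix.

    Only X is permuted; the adjacency matrix is left untouched, so the graph
    topology is unchanged while each node position receives another node's features.
    """
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    return permute_rows(X, order)


# The example from the text: rows 1,2,3,4,5 reordered to 3,5,4,2,1 (0-based: 2,4,3,1,0).
X = [[float(i)] * 32 for i in range(1, 6)]   # 5 nodes, 32-dimensional features
X_neg = permute_rows(X, [2, 4, 3, 1, 0])
```

Keeping the permutation separate from the random draw makes the worked example reproducible while `corrupt_features` provides the random version used during training.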
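103. Analyze the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence, and match the drug feature information against the disease feature information of the target disease.
In this embodiment, to match features against a uniform standard and to automatically simplify complex, experimental manual feature matching, the drug feature information corresponding to the drug molecule classification result is analyzed based on the drug molecule feature distribution sequence, for matching against the disease features of the target disease. The drug molecule feature distribution sequence stores the correspondence between different atom-chemical bond combinations and molecular composition data that include different molecular chemical bonds, molecular chemical property information, and disease-resistance attributes. The drug molecule classification result obtained in step 102 can therefore be matched in the sequence to the corresponding molecular chemical bonds, molecular chemical property information, and disease-resistance attributes, which serve as the drug feature information. For example, if the drug molecule classification result includes atom a-bond 1, atom d-bond 3, and atom h-bond 3, the molecular chemical bonds, molecular chemical property information, and disease-resistance attributes corresponding to these are resolved from the sequence. A disease-resistance attribute is a known property of the drug molecule of acting against a disease, for example whether it acts against pulmonary fibroblasts and can therefore serve as a treatment for lung cancer.
Once the drug feature information has been determined, since it includes molecular chemical bonds, molecular chemical property information, and disease-resistance attributes, it is matched against the disease feature information of the target disease to enable intelligent information push. The target disease is the disease to be paired with the target drug to determine whether the drug acts against it; in this embodiment, acting against a disease means that the drug has a therapeutic effect on it. To push drug-disease information intelligently, the obtained drug feature information can be matched against the disease feature information of at least one target disease. The intelligent medical system can directly obtain the disease feature information entered by an operator (such as a doctor or pharmaceutical researcher); this information includes, but is not limited to, the biological or chemical characteristics that the disease produces in the human body. For example, if the disease feature information is a white blood cell count above a, it is matched against the molecular chemical bonds, molecular chemical property information, and disease-resistance attributes.
104. If the drug feature information matches the disease feature information of the target disease, output drug feature combination information and drug feature risk information that match the drug feature information.
In this embodiment, if the drug feature information matches the disease feature information, the target drug acts against the target disease and can be used to treat it, so the drug feature combination information and drug feature risk information matching the drug feature information are output. The drug feature combination information is the feature information of other drugs that can be used in combination with the target drug, and the drug feature risk information is the risk content for human use arising from the molecular chemical bonds, molecular chemical property information, and disease-resistance attributes of the target drug. In this way the intelligent medical system pushes information intelligently.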
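In one embodiment of this application, for further definition and explanation, as shown in Figure 2, before step 102 (classifying the drug molecular structure image data with the trained image classification model to obtain a drug molecule classification result), the method further includes:
201. Acquire drug molecular structure image training sample data, and construct a graph convolutional network;
202. Perform feature perturbation on the drug molecular structure image training sample data to obtain perturbed pseudo-feature training sample data as negative sample data, and use the unperturbed drug molecular structure image training sample data as positive sample data;
203. Construct data pairs from the positive sample data and the negative sample data, respectively, with graph nodes, and train the graph convolutional network on the data pairs to obtain the image classification model.
In this embodiment, to classify the image data, a graph convolutional network suited to the characteristics of the image data is constructed, and drug molecular structure image training sample data is acquired as training samples for it. Both the positive and the negative sample data are constructed from this training sample data. Specifically, because the data pairs of the negative sample data shuffle the feature matrix of the graph nodes while the network connection structure stays unchanged, feature perturbation is applied to the training sample data to obtain perturbed pseudo-feature training sample data as negative sample data. For example, the connection structure of the graph is kept and the rows of the graph-node feature matrix are shuffled: for 5 original nodes, each with a 32-dimensional feature, the row order 1, 2, 3, 4, 5 is shuffled to 3, 5, 4, 2, 1. The topology is unchanged, but the features of the node at each position have changed, which yields a negative sample with corrupted node attributes. Correspondingly, the unperturbed training sample data is used directly as positive sample data. The numbers of positive and negative samples together make up the training samples: for example, if there are N positive samples and M negative samples, then N + M equals the total number of drug molecular structure image training samples. N and M are not specifically limited in this embodiment.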
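It should be noted that, to make the graph convolutional network contrast features more accurately, a multi-scale graph convolutional network is built on top of node-to-global-graph contrast by using pooling to drop nodes at given ratios. For example, from an original graph of 1000 nodes, image data at different scales is formed with ratios of 0.9, 0.8, and 0.7 in turn, so that node-to-whole-graph contrastive learning objectives can be constructed at different scales. The image data at multiple scales represents whole-graph information at different granularities and gives a richer contrastive learning effect. Each pooling step reduces the amount of data: the image data is abstracted layer by layer by reducing the number of nodes, and after pooling the nodes the network considers useful are kept, which improves feature contrast. In this embodiment, the node reduction is carried out through the construction of data pairs: data pairs are constructed from the positive sample data and the negative sample data, respectively, with graph nodes, and the graph convolutional network is trained on these data pairs to obtain the image classification model.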
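A minimal sketch of node dropping at the ratios 0.9, 0.8, and 0.7 follows. Two points are assumptions: each ratio is applied to the original node count (the text can also be read as applying the ratios cumulatively), and node degree stands in for the learned pooling score that decides which nodes the network keeps. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Adjacency = List[List[int]]


def drop_nodes(X: Matrix, A: Adjacency, ratio: float,
               scores: Sequence[float]) -> Tuple[Matrix, Adjacency, List[int]]:
    """Keep the top `ratio` share of nodes by score and return the induced subgraph."""
    n = len(X)
    k = max(1, round(ratio * n))
    # Highest-scoring nodes first; Python's stable sort keeps index order on ties.
    ranked = sorted(range(n), key=lambda i: -scores[i])
    keep = sorted(ranked[:k])
    X_sub = [list(X[i]) for i in keep]
    A_sub = [[A[i][j] for j in keep] for i in keep]
    return X_sub, A_sub, keep


def multi_scale_views(X: Matrix, A: Adjacency,
                      ratios: Sequence[float] = (0.9, 0.8, 0.7)):
    """One coarsened view per ratio, each measured against the original node count."""
    degree = [float(sum(row)) for row in A]   # stand-in for a learned pooling score
    return [drop_nodes(X, A, r, degree) for r in ratios]


# A 10-node chain graph as a toy input.
n = 10
A = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
X = [[float(i)] for i in range(n)]
views = multi_scale_views(X, A)
```

Each view keeps both the reduced feature matrix and the induced adjacency, so every scale can be encoded and summarized independently when building node-to-graph pairs.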
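In one embodiment of this application, for further definition and explanation, constructing data pairs from the positive sample data and the negative sample data, respectively, with graph nodes in step 203 includes: screening the graph nodes of the drug molecular structure image data according to a preset ratio to obtain multi-scale drug molecular structure image data; and combining the positive sample data with graph nodes of the multi-scale drug molecular structure image data to construct a first data pair, and combining the negative sample data with graph nodes of the multi-scale graph convolutional neural network to construct a second data pair, where the label of the first data pair is 1 and the label of the second data pair is 0.
Specifically, a data pair takes the form node-whole graph. A first data pair, labeled 1, consists of a node of the original image data and that image data. After the node order is perturbed, the features at each node are no longer the original ones; for example, the feature X1 of node 1 may be the original feature X5 of node 5, and the feature of node 5 may be the feature X9 of node 9. In the perturbed image data, a pair formed by any perturbed node and the original whole image data is a negative sample, i.e., a second data pair labeled 0. Screening the graph nodes of the drug molecular structure image data by a preset ratio to obtain multi-scale data means dropping nodes at the preset ratio; this enables contrastive learning between graph nodes and whole-image data at different granularities and produces rich multi-scale data pairs.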
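Following the description above, in which a negative pair combines a perturbed node with the original whole graph, pair construction can be sketched as below. The sketch assumes that node embeddings have already been computed for the original and the corrupted graph, and that each scale's summary comes from a mean readout (as stated for s in the loss function). The names `mean_readout` and `build_pairs` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector, int]   # (node embedding, whole-graph summary, label)


def mean_readout(H: Sequence[Vector]) -> Vector:
    """Whole-graph summary s: the element-wise mean of all node embeddings."""
    n = len(H)
    return [sum(column) / n for column in zip(*H)]


def build_pairs(H: Sequence[Vector], H_corrupt: Sequence[Vector],
                summaries: Sequence[Vector]) -> List[Pair]:
    """Pair every node with every scale's summary: label 1 for original nodes, 0 for shuffled ones."""
    pairs: List[Pair] = []
    for s in summaries:
        pairs.extend((list(h), list(s), 1) for h in H)            # first data pairs
        pairs.extend((list(h), list(s), 0) for h in H_corrupt)    # second data pairs
    return pairs


H = [[1.0, 2.0], [3.0, 4.0]]           # toy embeddings of the original nodes
H_corrupt = [[3.0, 4.0], [1.0, 2.0]]   # toy embeddings of the corrupted graph
pairs = build_pairs(H, H_corrupt, [mean_readout(H)])
```

Passing one summary per scale (for example from the multi-scale views) multiplies the pairs without changing the labeling rule.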
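In one embodiment of this application, for further definition and explanation, before training the graph convolutional network on the data pairs to obtain the image classification model in step 203, the method further includes: constructing a loss function with a discriminator based on the numbers of positive and negative samples.
The main learning process of a graph convolutional network iteratively aggregates and updates the neighbor information of the graph nodes in the image data. In each iteration, every graph node updates its information by aggregating the features of its neighbor nodes together with its own features from the previous layer, usually followed by a nonlinear transformation of the aggregated information; by stacking multiple layers, each node obtains information from neighbors within the corresponding number of hops. A loss function is then constructed with a discriminator based on the numbers of positive and negative samples, and used to evaluate model training. Specifically, the loss function is: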
[Loss function: formula image PCTCN2022089688-appb-000001 in the original publication]
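where s is the summary representation of the whole graph, obtained with a readout function from the hidden feature representations of the original image data; here the mean is used, i.e., the feature representations of all nodes are averaged to give the whole-graph summary s. N and M are the numbers of positive and negative samples, respectively, preferably N = M. h is the hidden-layer embedding of the current graph node, i.e., its feature vector, for example a 768-dimensional vector. D is a discriminator that uses the global representation to score the positive and negative samples during training; the representation vectors of the image data are learned by giving positive samples scores as high as possible and negative samples low scores.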
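The loss function itself is published only as an image. Its listed symbols (the summary s from a mean readout, the counts N and M, node embeddings h, a discriminator D, and the inputs X and A) match the objective of Deep Graph Infomax. Assuming the image shows that standard form, it would read as follows, where the tilde marks the corrupted graph (shuffled features, unchanged adjacency). This is a reconstruction under that assumption, not the published formula:

```latex
\mathcal{L} = \frac{1}{N+M}\left(\sum_{i=1}^{N}\mathbb{E}_{(\mathbf{X},\mathbf{A})}\left[\log \mathcal{D}\left(\vec{h}_i,\vec{s}\right)\right]
+ \sum_{j=1}^{M}\mathbb{E}_{(\widetilde{\mathbf{X}},\mathbf{A})}\left[\log\left(1-\mathcal{D}\left(\widetilde{\vec{h}}_j,\vec{s}\right)\right)\right]\right),
\qquad \vec{s} = \frac{1}{N}\sum_{i=1}^{N}\vec{h}_i
```

This objective is maximized; in practice one minimizes its negative, which is the binary cross-entropy between the discriminator scores and the 1/0 pair labels.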
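In addition, s is the embedding of the whole image data made up of all graph nodes, i.e., the average feature vector of the whole image data; for example, with 100 graph nodes, s is the average 768-dimensional vector of those 100 nodes and characterizes the whole image data. X denotes the feature vectors of the graph nodes; for a molecular graph made of atom nodes with chemical bonds as edges, X denotes the features of the atom nodes. A denotes the adjacency matrix of the image data, indicating which graph nodes are connected and which are not, and thus characterizes the topology of the image data. As shown in Table 1 below, in the molecular graph setting, X denotes the features of the atom nodes, which include the following.
Table 1: Initial node feature vectors of atoms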
[Table 1: provided as image PCTCN2022089688-appb-000002 in the original publication]
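Correspondingly, training the graph convolutional network on the data pairs to obtain the image classification model in step 203 includes: when the graph convolutional network is trained on the first and second data pairs, the second data pairs shuffle the feature matrix of the graph nodes, and learning evaluation of the shuffled graph convolutional network is performed based on the loss function; if the learning evaluation meets the preset model training accuracy, the model training of the graph convolutional network is completed and the image classification model is obtained.
The model training process of the graph convolutional network is as follows:
1. Using the original drug molecular structure image training samples, apply feature perturbation to each sample to obtain a perturbed fake-graph sample for each image data sample. Each data pair formed by an original graph node and the original whole image data is labeled 1 and serves as a first data pair (positive sample data); each data pair formed by an original graph node and the perturbed fake image data is labeled 0 and serves as a second data pair (negative sample data);
2. Build image data at multiple scales by dropping nodes proportionally, and construct contrastive learning between graph nodes and whole-image data at different granularities, forming rich multi-scale data pairs;
3. During training, randomly draw batches of a fixed size from the processed sample data; each batch contains data pairs for both positive and negative samples and is fed into the graph convolutional neural network for contrastive training to discriminate positive from negative. The training objective of the graph neural network is for the model to learn whether a data pair is an original graph node-whole image data pair or a perturbed graph node-perturbed whole image data pair, and thereby learn to understand and represent graph data;
4. Use the loss function to decide whether training of the graph convolutional neural network has finished; training is complete, and the image classification model is obtained, when the learning evaluation meets the preset model training accuracy.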
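Steps 3 and 4 can be illustrated with a deliberately small sketch. It assumes a bilinear discriminator D(h, s) = sigmoid(hᵀWs), as in Deep Graph Infomax, and, to stay dependency-free, it updates only W while the node embeddings stay fixed; a real implementation would also backpropagate through the graph convolutional encoder (for example with an autograd framework). The stopping threshold `target_loss` stands in for the unspecified "preset model training accuracy"; all names and hyperparameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Pair = Tuple[Vector, Vector, int]   # (node embedding h, graph summary s, label)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def discriminate(W: Matrix, h: Vector, s: Vector) -> float:
    """Bilinear discriminator D(h, s) = sigmoid(h^T W s)."""
    return sigmoid(sum(h[a] * W[a][b] * s[b] for a in range(len(h)) for b in range(len(s))))


def bce_loss(W: Matrix, pairs: Sequence[Pair]) -> float:
    """Binary cross-entropy over all pairs (the negated DGI-style objective)."""
    eps = 1e-12
    total = 0.0
    for h, s, y in pairs:
        p = discriminate(W, h, s)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(pairs)


def train_discriminator(pairs: Sequence[Pair], dim: int, lr: float = 0.5, batch_size: int = 4,
                        target_loss: float = 0.1, max_steps: int = 2000, seed: int = 0):
    rng = random.Random(seed)
    W = [[0.0] * dim for _ in range(dim)]
    for step in range(1, max_steps + 1):
        # Step 3: a random mini-batch mixing positive and negative pairs.
        batch = rng.sample(list(pairs), min(batch_size, len(pairs)))
        grad = [[0.0] * dim for _ in range(dim)]
        for h, s, y in batch:
            err = discriminate(W, h, s) - y   # d(BCE)/dz for a sigmoid output
            for a in range(dim):
                for b in range(dim):
                    grad[a][b] += err * h[a] * s[b] / len(batch)
        for a in range(dim):
            for b in range(dim):
                W[a][b] -= lr * grad[a][b]
        # Step 4: stop once the loss criterion is met.
        if bce_loss(W, pairs) < target_loss:
            return W, step
    return W, max_steps


s = [1.0, 0.0]
pairs = [([1.0, 0.0], s, 1), ([0.9, 0.1], s, 1), ([0.0, 1.0], s, 0), ([0.1, 0.9], s, 0)]
W, steps = train_discriminator(pairs, dim=2)
```

Training only W keeps the gradient analytic; the same loss and stopping rule carry over unchanged when the encoder parameters are also trained.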
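It should be noted that, as shown in Figure 3, image data is represented in the following steps: 1. Starting from the input features or randomly initialized entity features, perform an aggregation operation (aggregate), which aggregates the information of neighboring graph nodes into the current node, and then combine the node's own information with the aggregated information to update its features; 2. Each graph node then knows its own features and the information from other graph nodes. For a molecular graph, one molecule is one piece of image data, and molecular graph representation learning sums, or otherwise accumulates, the features of all graph nodes to obtain the vector feature of the whole molecular graph.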
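The aggregate-update-readout steps can be sketched as one message-passing layer followed by a sum readout. The sketch assumes mean aggregation over a node and its neighbors with a ReLU nonlinearity; this is a common simplification, not the symmetric normalization of the original GCN formulation, and not necessarily the structure shown in Figure 3. Function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(P: Matrix, Q: Matrix) -> Matrix:
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer(A: List[List[int]], H: Matrix, W: Matrix) -> Matrix:
    """One message-passing layer: mean over self and neighbors, linear map, then ReLU."""
    n = len(A)
    aggregated = []
    for i in range(n):
        group = [j for j in range(n) if A[i][j]] + [i]   # neighbors plus the node itself
        aggregated.append([sum(H[j][d] for j in group) / len(group) for d in range(len(H[i]))])
    Z = matmul(aggregated, W)
    return [[max(0.0, z) for z in row] for row in Z]   # nonlinear transformation


def sum_readout(H: Matrix) -> List[float]:
    """Whole-molecule vector: element-wise sum over all node features."""
    return [sum(column) for column in zip(*H)]


A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]   # C-C-O chain
H1 = gcn_layer(A, [[1.0], [2.0], [3.0]], [[1.0]])
molecule_vector = sum_readout(H1)
```

Stacking k such layers lets each node see neighbors up to k hops away, which is the multi-layer behavior described in the text.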
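In one embodiment of this application, for further definition and explanation, before analyzing the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence in step 103, the method further includes: acquiring molecular composition data of at least one drug, and constructing the drug molecule feature distribution sequence of the target drug based on molecular chemical bonds, molecular chemical property information, and disease-resistance attributes.
To push information effectively and accurately based on the drug molecule classification result, the drug molecule feature distribution sequence must be constructed before drug features are analyzed with it. Molecular composition data of at least one drug is acquired; since the purpose is to build the sequence, this data only needs to include molecular chemical bonds, molecular chemical property information, and disease-resistance data, from which the drug molecule feature distribution sequence of at least one drug is constructed. Molecular chemical bonds are the chemical bonds of all atoms in the drug molecule; molecular chemical property information is the chemical properties associated with particular molecular substructures, such as those of a phenol ring or a benzyl group; and a disease-resistance attribute indicates whether a drug molecule has a therapeutic or alleviating effect on a disease. For example, drug molecule s relieves blood viscosity in hypertensive patients, i.e., it acts against that condition.
It should be noted that the target drug whose information is pushed in this embodiment has most likely not yet been tested, or would require long-term experiments to validate, whereas every drug in the constructed drug molecule feature distribution sequence has already been validated. The drug feature information is therefore obtained by comparing the drug molecule classification result with a sequence built from already-analyzed molecular composition data.
Correspondingly, analyzing the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence in step 103 includes: comparing the drug molecule classification result with the drug molecule feature distribution sequence item by item on molecular and atomic chemical bonds, and determining the molecular chemical property information and disease-resistance attribute with the largest chemical-bond similarity in the sequence as the drug feature information of the target drug.
To compare the drug molecule feature distribution sequence with the drug molecule classification result, and since the classification result represents the classes of different atom-chemical bond combinations, the chemical bond classes are compared item by item and a similarity is computed; the molecular chemical property information and disease-resistance attribute that correspond to the maximum similarity in the sequence are taken as the drug feature information of the target drug.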
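The text does not define the similarity measure. A minimal sketch, assuming the classification result and each sequence entry are represented as sets of atom-bond classes (reusing the atom a-bond 1, atom d-bond 3, atom h-bond 3 example) and that Jaccard similarity is used, picks the entry with the highest score. The `SequenceEntry` schema, the names, and the similarity choice are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class SequenceEntry:
    """One validated drug in the drug molecule feature distribution sequence (hypothetical schema)."""
    name: str
    bonds: FrozenSet[str]          # atom-bond classes, e.g. "a-1" for atom a, bond 1
    chemical_properties: str
    disease_resistance: str


def jaccard(x: FrozenSet[str], y: FrozenSet[str]) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def resolve_drug_features(classification: FrozenSet[str],
                          sequence: Sequence[SequenceEntry]) -> Tuple[str, str, float]:
    """Return the properties and resistance attribute of the most bond-similar entry."""
    best = max(sequence, key=lambda entry: jaccard(classification, entry.bonds))
    return best.chemical_properties, best.disease_resistance, jaccard(classification, best.bonds)


sequence = [
    SequenceEntry("drug X", frozenset({"a-1", "d-3"}), "properties of X", "resistance of X"),
    SequenceEntry("drug Y", frozenset({"b-2"}), "properties of Y", "resistance of Y"),
]
result = resolve_drug_features(frozenset({"a-1", "d-3", "h-3"}), sequence)
```

Returning the score alongside the features lets a caller reject weak matches, which is one way the manual-matching fallback could be triggered.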
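In one embodiment of this application, for further definition and explanation, after matching the drug feature information against the disease feature information of the target disease in step 103, the method further includes: acquiring a drug knowledge graph; and searching the drug knowledge graph for drug feature combination information and drug feature risk information that match the drug feature information.
As a fast and intelligent way to recommend medical information, once the drug feature information is determined to match the disease feature information of the target disease, the target drug acts against the target disease, i.e., it treats, inhibits, or alleviates it. To improve the operator's information-processing efficiency, a drug knowledge graph is acquired in order to obtain the drug feature combination information and drug feature risk information that match the drug feature information. The drug knowledge graph stores the associated combinations of different drug feature information and the risk information corresponding to those combinations. The drug feature combination information is the feature information of other drugs that can be used in combination with the target drug; for example, if the feature information of drug 1 is s and that of drug 2 is e, the combined feature information of drug 1 and drug 2 may be s+e or f. The drug feature risk information is the risk content for human use arising from the molecular chemical bonds, molecular chemical property information, and disease-resistance attributes of the target drug; for example, for the chemical property information d in drug 2, the usage risk may be a reduced red blood cell count. This is not specifically limited in the embodiments of this application.
It should be noted that, to improve matching efficiency, the intelligent medical system pre-stores or generates drug feature combination information and drug feature risk information for different drugs, so that in this embodiment the drug knowledge graph can be called directly to match them.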
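Because the combination and risk information is pre-stored, the lookup reduces to indexed retrieval. The sketch below uses two dictionaries in place of a real graph database and reuses the s/e/f and d examples from the text; the `DrugKnowledgeGraph` class and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

Combination = Tuple[str, List[str]]   # (partner feature, possible combined features)


@dataclass
class DrugKnowledgeGraph:
    """In-memory stand-in for the drug knowledge graph (hypothetical structure)."""
    combinations: Dict[str, List[Combination]] = field(default_factory=dict)
    risks: Dict[str, List[str]] = field(default_factory=dict)   # feature -> risks for human use

    def lookup(self, features: Iterable[str]) -> Tuple[List[Combination], List[str]]:
        """Collect the combination and risk entries stored for each drug feature."""
        combos: List[Combination] = []
        risks: List[str] = []
        for feature in features:
            combos.extend(self.combinations.get(feature, []))
            risks.extend(self.risks.get(feature, []))
        return combos, risks


# Examples from the text: s combined with e gives s+e or f; property d carries a red-blood-cell risk.
kg = DrugKnowledgeGraph(
    combinations={"s": [("e", ["s+e", "f"])]},
    risks={"d": ["reduced red blood cell count"]},
)
combos, risks = kg.lookup(["s", "d"])
```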
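In one embodiment of this application, for further definition and explanation, the method further includes: if the drug feature information does not match the disease feature information of the target disease, outputting the drug molecule classification result to indicate that it should be matched manually.
In a practical scenario, so that the operator can still obtain the drug molecule classification result when the drug feature information does not match the disease feature information of the target disease, the classification result is output directly on a mismatch for manual experiments or matching, for example by directly displaying the drug molecule classification result containing the chemical bonds; this is not specifically limited in the embodiments of this application.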
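The overall decision in steps 103 and 104, together with the fallback just described, can be sketched as follows. The patent does not define when drug feature information "matches" disease feature information; the set-intersection rule here is an assumption, and `lookup` stands for any knowledge-graph query that returns combination and risk information. All names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Lookup = Callable[[FrozenSet[str]], Tuple[list, List[str]]]


def push_information(drug_features: FrozenSet[str], disease_features: FrozenSet[str],
                     classification: List[str], lookup: Lookup) -> Dict[str, object]:
    """Steps 103-104 plus the manual-matching fallback.

    Assumption: a match means at least one drug feature (e.g. a disease-resistance
    attribute) also appears among the disease features.
    """
    if drug_features & disease_features:
        combinations, risks = lookup(drug_features)
        return {"matched": True, "combinations": combinations, "risks": risks}
    # No match: hand the raw classification result to an operator for manual matching.
    return {"matched": False, "for_manual_matching": classification}


# Usage with a stub knowledge-graph lookup (hypothetical feature labels).
stub = lambda feats: (["s+e"], ["reduced red blood cell count"])
hit = push_information(frozenset({"anti-X"}), frozenset({"anti-X", "fever"}), ["a-1"], stub)
miss = push_information(frozenset({"anti-Y"}), frozenset({"fever"}), ["a-1", "d-3"], stub)
```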
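The embodiments of this application provide an information push method based on drug molecular image classification. Compared with the prior art, the embodiments acquire drug molecular structure image data of a target drug; classify it with a trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training; analyze the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence and match it against the disease feature information of a target disease; and, if they match, output the drug feature combination information and drug feature risk information that match the drug feature information. This identifies drug features with intelligent algorithms and pushes information by matching drug features with diseases, which greatly improves the efficiency of matching diseases based on drug features in intelligent medical care, and thereby improves the efficiency and accuracy of information push in intelligent medical systems.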
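Further, as an implementation of the method shown in Figure 1, an embodiment of this application provides an information push device based on drug molecular image classification. As shown in Figure 4, the device includes:
an acquisition module 31, configured to acquire drug molecular structure image data of a target drug;
a processing module 32, configured to classify the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
an analysis module 33, configured to analyze the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and to match the drug feature information against the disease feature information of a target disease;
an output module 34, configured to output drug feature combination information and drug feature risk information that match the drug feature information if the drug feature information matches the disease feature information of the target disease.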
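Further, the device also includes a training module, wherein
the acquisition module is configured to acquire drug molecular structure image training sample data and construct a graph convolutional network;
the processing module is configured to perform feature perturbation on the drug molecular structure image training sample data to obtain perturbed pseudo-feature training sample data as negative sample data, and to use the unperturbed drug molecular structure image training sample data as positive sample data;
the training module is configured to construct data pairs from the positive sample data and the negative sample data, respectively, with graph nodes, and to train the graph convolutional network on the data pairs to obtain the image classification model, where the data pairs of the negative sample data shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged.
Further, the construction module includes:
a screening unit, configured to screen the graph nodes of the drug molecular structure image data according to a preset ratio to obtain multi-scale drug molecular structure image data;
a construction unit, configured to combine the positive sample data with graph nodes of the multi-scale drug molecular structure image data to construct a first data pair, and to combine the negative sample data with graph nodes of the multi-scale graph convolutional neural network to construct a second data pair, where the label of the first data pair is 1 and the label of the second data pair is 0.
Further,
the construction module is further configured to construct a loss function with a discriminator based on the numbers of positive and negative samples;
the training module includes:
a processing unit, configured to shuffle the feature matrix of the graph nodes by means of the second data pair when training the graph convolutional network on the first and second data pairs, and to perform learning evaluation of the shuffled graph convolutional network based on the loss function;
a training unit, configured to complete the model training of the graph convolutional network and obtain the image classification model if the learning evaluation meets the preset model training accuracy.
Further,
the acquisition module is further configured to acquire molecular composition data of at least one drug, and to construct the drug molecule feature distribution sequence of the target drug based on molecular chemical bonds, molecular chemical property information, and disease-resistance attributes;
the analysis module is specifically configured to compare the drug molecule classification result with the drug molecule feature distribution sequence item by item on molecular and atomic chemical bonds, and to determine the molecular chemical property information and disease-resistance attribute with the largest chemical-bond similarity in the sequence as the drug feature information of the target drug.
Further, the device also includes a search module, wherein
the acquisition module is further configured to acquire a drug knowledge graph, which stores the associated combinations of different drug feature information and the risk information corresponding to those combinations;
the search module is configured to search the drug knowledge graph for drug feature combination information and drug feature risk information that match the drug feature information.
Further, the output module is also configured to output the drug molecule classification result if the drug feature information does not match the disease feature information of the target disease, so as to indicate that the classification result should be matched manually.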
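The embodiments of this application provide an information push device based on drug molecular image classification. Compared with the prior art, the embodiments acquire drug molecular structure image data of a target drug; classify it with a trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training; analyze the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence and match it against the disease feature information of a target disease; and, if they match, output the drug feature combination information and drug feature risk information that match the drug feature information. This identifies drug features with intelligent algorithms and pushes information by matching drug features with diseases, which greatly improves the efficiency of matching diseases based on drug features in intelligent medical care, and thereby improves the efficiency and accuracy of information push in intelligent medical systems.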
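According to one embodiment of this application, a computer-readable storage medium is provided, storing at least one executable instruction that can execute the information push method based on drug molecular image classification in any of the above method embodiments. The computer-readable storage medium may be non-volatile or volatile.
Figure 5 shows a schematic structural diagram of a computer device provided according to an embodiment of this application; the specific embodiments of this application do not limit the specific implementation of the computer device.
As shown in Figure 5, the computer device may include: a processor 402, a communications interface 404, a memory 406, and a communication bus 408.
The processor 402, the communications interface 404, and the memory 406 communicate with each other via the communication bus 408.
The communications interface 404 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 402 is configured to execute a program 410, and may specifically execute the relevant steps of the above embodiments of the information push method based on drug molecular image classification.
Specifically, the program 410 may include program code, and the program code includes computer operation instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of this application. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM and may also include non-volatile memory, such as at least one disk storage device.
The program 410 may specifically cause the processor 402 to perform the following operations:
acquiring drug molecular structure image data of a target drug;
classifying the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, where the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
analyzing the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and matching the drug feature information against the disease feature information of a target disease;
if the drug feature information matches the disease feature information of the target disease, outputting drug feature combination information and drug feature risk information that match the drug feature information.
Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases the steps shown or described may be performed in an order different from that given here, or they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, this application is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of this application and are not intended to limit it. For those skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall fall within its scope of protection.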

Claims (20)

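  1. An information push method based on drug molecular image classification, including:
    acquiring drug molecular structure image data of a target drug;
    classifying the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, wherein the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
    analyzing the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and matching the drug feature information against the disease feature information of a target disease;
    if the drug feature information matches the disease feature information of the target disease, outputting drug feature combination information and drug feature risk information that match the drug feature information.
  2. The method according to claim 1, wherein before classifying the drug molecular structure image data with the trained image classification model to obtain the drug molecule classification result, the method further includes:
    acquiring drug molecular structure image training sample data, and constructing a graph convolutional network;
    performing feature perturbation on the drug molecular structure image training sample data to obtain perturbed pseudo-feature training sample data as negative sample data, and using the unperturbed drug molecular structure image training sample data as positive sample data;
    constructing data pairs from the positive sample data and the negative sample data, respectively, with graph nodes, and training the graph convolutional network on the data pairs to obtain the image classification model.
  3. The method according to claim 2, wherein constructing data pairs from the positive sample data and the negative sample data, respectively, with graph nodes includes:
    screening the graph nodes of the drug molecular structure image data according to a preset ratio to obtain multi-scale drug molecular structure image data;
    combining the positive sample data with graph nodes of the multi-scale drug molecular structure image data to construct a first data pair, and combining the negative sample data with graph nodes of the multi-scale graph convolutional neural network to construct a second data pair, wherein the label of the first data pair is 1 and the label of the second data pair is 0.
  4. The method according to claim 3, wherein before training the graph convolutional network on the data pairs to obtain the image classification model, the method further includes:
    constructing a loss function with a discriminator based on the numbers of positive and negative samples;
    training the graph convolutional network on the data pairs to obtain the image classification model includes:
    when training the graph convolutional network on the first data pair and the second data pair, shuffling the feature matrix of the graph nodes by means of the second data pair, and performing learning evaluation of the shuffled graph convolutional network based on the loss function;
    if the learning evaluation meets a preset model training accuracy, completing the model training of the graph convolutional network to obtain the image classification model.
  5. The method according to claim 1, wherein before analyzing the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence, the method further includes:
    acquiring molecular composition data of at least one drug, and constructing the drug molecule feature distribution sequence of the target drug based on molecular chemical bonds, molecular chemical property information, and disease-resistance attributes;
    analyzing the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence includes:
    comparing the drug molecule classification result with the drug molecule feature distribution sequence item by item on molecular and atomic chemical bonds, and determining the molecular chemical property information and disease-resistance attribute with the largest chemical-bond similarity in the drug molecule feature distribution sequence as the drug feature information of the target drug.
  6. The method according to claim 1, wherein after matching the drug feature information against the disease feature information of the target disease, the method further includes:
    acquiring a drug knowledge graph, which stores the associated combinations of different drug feature information and the risk information corresponding to the associated combinations between different drug feature information;
    searching the drug knowledge graph for drug feature combination information and drug feature risk information that match the drug feature information.
  7. The method according to any one of claims 1-6, wherein the method further includes:
    if the drug feature information does not match the disease feature information of the target disease, outputting the drug molecule classification result to indicate that the drug molecule classification result should be matched manually.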
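  8. An information push device based on drug molecular image classification, including:
    an acquisition module, configured to acquire drug molecular structure image data of a target drug;
    a processing module, configured to classify the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, wherein the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
    a matching module, configured to analyze the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and to match the drug feature information against the disease feature information of a target disease;
    an output module, configured to output drug feature combination information and drug feature risk information that match the drug feature information if the drug feature information matches the disease feature information of the target disease.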
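  9. A computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement an information push method based on drug molecular image classification, including:
    acquiring drug molecular structure image data of a target drug;
    classifying the drug molecular structure image data with a trained image classification model to obtain a drug molecule classification result, wherein the image classification model is obtained by model training with positive sample data and negative sample data constructed from training samples, and the negative sample data is used to shuffle the feature matrix of the graph nodes while the network connection structure remains unchanged during model training;
    analyzing the drug feature information of the drug molecule classification result based on a drug molecule feature distribution sequence, and matching the drug feature information against the disease feature information of a target disease;
    if the drug feature information matches the disease feature information of the target disease, outputting drug feature combination information and drug feature risk information that match the drug feature information.
  10. The computer-readable storage medium according to claim 9, wherein, when executed by the processor, the computer-readable instructions further implement, before classifying the drug molecular structure image data with the trained image classification model to obtain the drug molecule classification result:
    acquiring drug molecular structure image training sample data, and constructing a graph convolutional network;
    performing feature perturbation on the drug molecular structure image training sample data to obtain perturbed pseudo-feature training sample data as negative sample data, and using the unperturbed drug molecular structure image training sample data as positive sample data;
    constructing data pairs from the positive sample data and the negative sample data, respectively, with graph nodes, and training the graph convolutional network on the data pairs to obtain the image classification model.
  11. The computer-readable storage medium according to claim 10, wherein, when executed by the processor, the computer-readable instructions implement constructing data pairs from the positive sample data and the negative sample data, respectively, with graph nodes by:
    screening the graph nodes of the drug molecular structure image data according to a preset ratio to obtain multi-scale drug molecular structure image data;
    combining the positive sample data with graph nodes of the multi-scale drug molecular structure image data to construct a first data pair, and combining the negative sample data with graph nodes of the multi-scale graph convolutional neural network to construct a second data pair, wherein the label of the first data pair is 1 and the label of the second data pair is 0.
  12. The computer-readable storage medium according to claim 11, wherein, when executed by the processor, the computer-readable instructions further implement, before training the graph convolutional network on the data pairs to obtain the image classification model:
    constructing a loss function with a discriminator based on the numbers of positive and negative samples;
    training the graph convolutional network on the data pairs to obtain the image classification model includes:
    when training the graph convolutional network on the first data pair and the second data pair, shuffling the feature matrix of the graph nodes by means of the second data pair, and performing learning evaluation of the shuffled graph convolutional network based on the loss function;
    if the learning evaluation meets a preset model training accuracy, completing the model training of the graph convolutional network to obtain the image classification model.
  13. The computer-readable storage medium according to claim 9, wherein, when executed by the processor, the computer-readable instructions further implement, before analyzing the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence:
    acquiring molecular composition data of at least one drug, and constructing the drug molecule feature distribution sequence of the target drug based on molecular chemical bonds, molecular chemical property information, and disease-resistance attributes;
    analyzing the drug feature information of the drug molecule classification result based on the drug molecule feature distribution sequence includes:
    comparing the drug molecule classification result with the drug molecule feature distribution sequence item by item on molecular and atomic chemical bonds, and determining the molecular chemical property information and disease-resistance attribute with the largest chemical-bond similarity in the drug molecule feature distribution sequence as the drug feature information of the target drug.
  14. The computer-readable storage medium according to claim 9, wherein, when executed by the processor, the computer-readable instructions further implement, after matching the drug feature information against the disease feature information of the target disease:
    acquiring a drug knowledge graph, which stores the associated combinations of different drug feature information and the risk information corresponding to the associated combinations between different drug feature information;
    searching the drug knowledge graph for drug feature combination information and drug feature risk information that match the drug feature information.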
  15. 一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,其中,所述计算机可读指令被处理器执行时实现基于药物分子图像分类的信息推送方法,包括:
    获取目标药物的药物分子结构图像数据;
    基于训练后的图像分类模型对所述药物分子结构图像数据进行分类处理,得到药物分子分类结果,所述图像分类模型为基于训练样本构建正样本数据、负样本数据进行模型训练得到的,其中,所述负样本数据用于在模型训练过程中网络连接结构不变时对图节点的特征矩阵进行打乱处理;
    基于药物分子特征分布序列解析所述药物分子分类结果的药物特征信息,并基于所述药物特征信息与目标病症的病症特征信息进行匹配;
    若所述药物特征信息与目标病症的病症特征信息匹配,则输出与所述药物特征信息匹配的药物特征组合信息、药物特征风险信息。
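  16. The computer device according to claim 15, wherein, when the computer-readable instructions are executed by the processor, before classifying the drug molecular structure image data based on the trained image classification model to obtain the drug molecule classification result, the method further comprises:
    acquiring drug molecular structure image training sample data, and constructing a graph convolutional network;
    performing feature perturbation on the drug molecular structure image training sample data to obtain drug molecular structure image training sample data carrying perturbed pseudo-features as negative sample data, and using the drug molecular structure image training sample data without feature perturbation as positive sample data;
    constructing data pairs from the positive sample data and the negative sample data, respectively, together with graph nodes, and training the graph convolutional network on the data pairs to obtain the image classification model.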
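Claim 16 "constructs a graph convolutional network" but does not specify the layer. The sketch below assumes the widely used Kipf and Welling propagation rule, H' = ReLU(D^-1/2 (A + I) D^-1/2 · H · W). It is written in plain Python and assumes non-negative adjacency entries, so every node has degree at least 1 once self-loops are added. It illustrates one layer only; the patent's actual architecture is not disclosed here.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalized_adjacency(adj), matmul(h, w))
    return [[max(0.0, x) for x in row] for row in propagated]
```

Consider two bonded atoms with scalar features 1 and 3 and W = [[1]]. Each atom ends up with the average of its own and its neighbor's feature, which is 2.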
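  17. The computer device according to claim 16, wherein, when the computer-readable instructions are executed by the processor, constructing the data pairs from the positive sample data and the negative sample data, respectively, together with graph nodes comprises:
    selecting graph nodes of the drug molecular structure image data according to preset ratios to obtain multi-scale drug molecular structure image data;
    combining the positive sample data with the graph nodes of the multi-scale drug molecular structure image data to construct first data pairs, and combining the negative sample data with the graph nodes of the multi-scale graph convolutional neural network to construct second data pairs, wherein the label of the first data pairs is 1 and the label of the second data pairs is 0.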
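  18. The computer device according to claim 17, wherein, when the computer-readable instructions are executed by the processor, before training the graph convolutional network on the data pairs to obtain the image classification model, the method further comprises:
    constructing a loss function based on a discriminator in combination with the numbers of samples in the positive sample data and the negative sample data;
    and training the graph convolutional network on the data pairs to obtain the image classification model comprises:
    when training the graph convolutional network on the first data pairs and the second data pairs, shuffling the feature matrix of the graph nodes via the second data pairs, and performing learning evaluation on the shuffled graph convolutional network based on the loss function;
    if the learning evaluation meets a preset model training accuracy, completing the model training of the graph convolutional network to obtain the image classification model.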
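Claim 18 ends training once the "learning evaluation meets a preset model training accuracy". The sketch below assumes accuracy is measured on the labeled data pairs. A pair counts as correct when the discriminator score, thresholded at 0.5, agrees with its 1/0 label. The training step and evaluation are passed in as callables. The `max_epochs` cap is a hypothetical safeguard not mentioned in the claims.

```python
from typing import Callable, Sequence, Tuple


def pair_accuracy(scores: Sequence[float], labels: Sequence[int],
                  threshold: float = 0.5) -> float:
    """Fraction of data pairs whose thresholded score agrees with the 1/0 label."""
    correct = sum(int((s >= threshold) == bool(y)) for s, y in zip(scores, labels))
    return correct / len(labels)


def train_until_accuracy(step: Callable[[], object],
                         evaluate: Callable[[], float],
                         target_accuracy: float,
                         max_epochs: int = 100) -> Tuple[int, float]:
    """Run training epochs until evaluate() reaches target_accuracy.

    Returns (epochs_run, final_accuracy); stops early at max_epochs.
    """
    accuracy = evaluate()
    epoch = 0
    while accuracy < target_accuracy and epoch < max_epochs:
        step()          # one optimization pass over the data pairs
        epoch += 1
        accuracy = evaluate()
    return epoch, accuracy
```

Keeping the loop independent of the model makes the stopping rule easy to test. It also lets the same loop drive any optimizer that minimizes the discriminator loss.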
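  19. The computer device according to claim 15, wherein, when the computer-readable instructions are executed by the processor, before parsing the drug feature information from the drug molecule classification result based on the drug molecular feature distribution sequence, the method further comprises:
    acquiring molecular composition data of at least one drug, and constructing the drug molecular feature distribution sequence of the target drug based on molecular chemical bonds, molecular chemical property information, and disease-countering attributes;
    and parsing the drug feature information from the drug molecule classification result based on the drug molecular feature distribution sequence comprises:
    comparing the drug molecule classification result with the drug molecular feature distribution sequence item by item in terms of molecular and atomic chemical bonds, and determining, from the drug molecular feature distribution sequence, the molecular chemical property information and disease-countering attributes with the greatest chemical bond similarity as the drug feature information of the target drug.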
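  20. The computer device according to claim 15, wherein, when the computer-readable instructions are executed by the processor, after matching the drug feature information against the disease feature information of the target disease, the method further comprises:
    acquiring a drug knowledge graph, the drug knowledge graph storing associated combination content of different pieces of drug feature information and risk information corresponding to the associated combinations between different pieces of drug feature information;
    looking up, in the drug knowledge graph, the drug feature combination information and drug feature risk information that match the drug feature information.
PCT/CN2022/089688 2022-01-11 2022-04-27 Information pushing method and apparatus based on drug molecule image classification WO2023134060A1 (zh)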

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210028280.1 2022-01-11
CN202210028280.1A CN114358202B (zh) 2022-01-11 2022-01-11 Information pushing method and apparatus based on drug molecule image classification

Publications (1)

Publication Number Publication Date
WO2023134060A1 (zh) 2023-07-20

Family ID: 81108273

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089688 WO2023134060A1 (zh) 2022-01-11 2022-04-27 Information pushing method and apparatus based on drug molecule image classification

Country Status (2)

Country Link
CN (1) CN114358202B (zh)
WO (1) WO2023134060A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
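CN114358202B (zh) * 2022-01-11 2024-10-15 平安科技(深圳)有限公司 Information pushing method and apparatus based on drug molecule image classification
CN115132295B (zh) * 2022-04-21 2024-05-24 腾讯科技(深圳)有限公司 Molecule classification method, apparatus, device, and computer-readable storage medium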

Citations (5)

Publication number Priority date Publication date Assignee Title
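CN110998739A (zh) * 2017-08-08 2020-04-10 国际商业机器公司 Prediction of adverse drug reactions
CN111933225A (zh) * 2020-09-27 2020-11-13 平安科技(深圳)有限公司 Drug classification method and apparatus, terminal device, and storage medium
CN112308227A (zh) * 2020-11-02 2021-02-02 平安科技(深圳)有限公司 Neural network architecture search method and apparatus, terminal device, and storage medium
CN113707264A (zh) * 2021-08-31 2021-11-26 平安科技(深圳)有限公司 Machine-learning-based drug recommendation method, apparatus, device, and medium
CN114358202A (zh) * 2022-01-11 2022-04-15 平安科技(深圳)有限公司 Information pushing method and apparatus based on drug molecule image classification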

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
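CN109935341B (zh) * 2019-04-09 2021-04-13 北京深度制耀科技有限公司 Method and apparatus for predicting new indications of drugs
KR20210125310A (ko) * 2020-04-08 2021-10-18 주식회사 셀바스에이아이 Drug similarity evaluation method and device using the same
CN113140267B (zh) * 2021-03-25 2024-03-29 北京化工大学 Directed molecule generation method based on a graph neural network
CN113707236B (zh) * 2021-08-30 2024-05-14 平安科技(深圳)有限公司 Graph-neural-network-based method, apparatus, and device for predicting properties of small drug molecules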

Also Published As

Publication number Publication date
CN114358202A (zh) 2022-04-15
CN114358202B (zh) 2024-10-15

Similar Documents

Publication Publication Date Title
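WO2021121129A1 (zh) Duplicate case detection method, apparatus, device, and storage medium
WO2023134060A1 (zh) Information pushing method and apparatus based on drug molecule image classification
CN109801705A (zh) Treatment recommendation method, system, apparatus, and storage medium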
Haritha et al. COVID detection from chest X-rays with DeepLearning: CheXNet
WO2016205286A1 (en) Automatic entity resolution with rules detection and generation system
Xu et al. Research on ResNet101 network chemical reagent label image classification based on transfer learning
Cheng et al. Classification of long sequential data using circular dilated convolutional neural networks
WO2023134061A1 (zh) Artificial-intelligence-based method and apparatus for determining drug feature information
de Sousa Costa et al. Classification of malignant and benign lung nodules using taxonomic diversity index and phylogenetic distance
WO2021151358A1 (zh) Explanation-model-based triage information recommendation method, apparatus, device, and medium
Xie et al. Optic disc and cup image segmentation utilizing contour-based transformation and sequence labeling networks
Iparraguirre-Villanueva et al. Convolutional neural networks with transfer learning for pneumonia detection
Kundu et al. Vision transformer based deep learning model for monkeypox detection
Florindo et al. VisGraphNet: A complex network interpretation of convolutional neural features
Mabrouk et al. Ensemble Federated Learning: An approach for collaborative pneumonia diagnosis
Soundrapandiyan et al. AI-based wavelet and stacked deep learning architecture for detecting coronavirus (COVID-19) from chest X-ray images
Sen et al. A transfer learning based approach for lung inflammation detection
Lin et al. The design of error-correcting output codes based deep forest for the micro-expression recognition
Alghieth Skin Disease Detection for Kids at School Using Deep Learning Techniques.
Nawshad et al. Attention based residual network for effective detection of covid-19 and viral pneumonia
Gururaj et al. Fundus image features extraction for exudate mining in coordination with content based image retrieval: A study
Zhou et al. Audit to Forget: A Unified Method to Revoke Patients' Private Data in Intelligent Healthcare
Yuan et al. Meta-learning causal feature selection for stable prediction
Maquen-Niño et al. Classification Model Using Transfer Learning for the Detection of Pneumonia in Chest X-Ray Images.
Yang et al. Artificial intelligence in biomedical research

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE